Lock-Free Data Structures in C

Goal: Build lock-free stack and queue using C11 atomics. Understand why locks are problematic for concurrency, how CAS (compare-and-swap) enables lock-freedom, and implement real lock-free data structures.

Based on: jfalkner/lockfree, Lock-free Data Structures by Siddharth S2710

Prerequisites: C pointers, struct, basic concurrency concepts, understand what atomics are


Why Lock-Free?

Locks (mutexes) have problems:

  • Contention: Threads block each other, killing parallelism
  • Deadlocks: Circular wait dependencies = stuck forever
  • Priority inversion: Low-priority thread holds lock, high-priority can’t run
  • Scheduling: OS decides when blocked threads run — unpredictable latency

Lock-free: at least one thread makes progress every step (no thread can be blocked forever)

Wait-free: every thread completes in bounded steps (stronger guarantee)


The Fundamental Operation: CAS

Compare-and-Swap (CAS) is the building block:

#include <stdatomic.h>
 
// CAS: if *ptr == old_val, set *ptr = new_val, return true
// Otherwise, return false and *ptr unchanged
bool cas(int *ptr, int old_val, int new_val) {
    return atomic_compare_exchange_strong(ptr, old_val, new_val);
}

In C11 atomics:

atomic_int x = ATOMIC_VAR_INIT(0);
int old = atomic_load(&x);
 
// Try to change x from 0 to 42
if (atomic_compare_exchange_strong(&x, &old, 42)) {
    // Success: x is now 42
} else {
    // Failure: x was modified by another thread, old now has current value
}

Key insight: CAS is a load-modify-store operation that happens atomically. If another thread modifies the value between load and store, CAS fails.


Lock-Free Stack (Treiber Stack)

The simplest lock-free data structure. Uses linked nodes, CAS to push/pop.

#include <stdatomic.h>
#include <stdlib.h>
#include <stdio.h>
 
typedef struct Node {
    void *data;
    struct Node *next;
} Node;
 
typedef struct {
    _Atomic(Node*) head;
} LockFreeStack;
 
// Initialize stack
void stack_init(LockFreeStack *s) {
    atomic_store(&s->head, NULL);
}
 
// Push: create node, point to current head, CAS to install
// Returns true on success
bool stack_push(LockFreeStack *s, void *data) {
    Node *node = malloc(sizeof(Node));
    if (!node) return false;
    
    node->data = data;
    
    // Current head
    Node *old_head = atomic_load(&s->head);
    
    // Point new node to current head
    node->next = old_head;
    
    // Try to install new node as head
    // If head changed (another thread pushed), retry with new head
    while (!atomic_compare_exchange_strong(&s->head, &old_head, node)) {
        // CAS failed, old_head now has current head value
        node->next = old_head;
        // Loop will retry with updated old_head
    }
    
    return true;
}
 
// Pop: try to CAS head from old to old->next
void *stack_pop(LockFreeStack *s) {
    Node *old_head = atomic_load(&s->head);
    
    while (old_head != NULL) {
        if (atomic_compare_exchange_strong(&s->head, &old_head, old_head->next)) {
            void *data = old_head->data;
            free(old_head);
            return data;
        }
        // CAS failed, another thread modified head — retry
    }
    
    return NULL; // Empty stack
}

ABA Problem: CAS only compares values, not pointers. If node A is at head, popped, then a new node with same address as A is pushed, CAS sees “A” and succeeds — even though the linked structure changed.

Solution: Use a tagged pointer (add a counter to the pointer):

typedef struct {
    Node *ptr;
    unsigned long tag;
} TaggedPtr;
 
typedef struct {
    _Atomic TaggedPtr head;
} LockFreeStackTagged;

Lock-Free Queue (Michael-Scott)

More complex than stack. Uses dummy head node for cleaner implementation.

#include <stdatomic.h>
#include <stdlib.h>
 
typedef struct Node {
    void *data;
    struct Node *next;
} Node;
 
typedef struct {
    _Atomic(Node*) head;
    _Atomic(Node*) tail;
} LockFreeQueue;
 
void queue_init(LockFreeQueue *q) {
    // Create dummy node — head always points to dummy
    Node *dummy = malloc(sizeof(Node));
    dummy->next = NULL;
    
    atomic_store(&q->head, dummy);
    atomic_store(&q->tail, dummy);
}
 
bool enqueue(LockFreeQueue *q, void *data) {
    Node *node = malloc(sizeof(Node));
    if (!node) return false;
    node->data = data;
    node->next = NULL;
    
    Node *tail = atomic_load(&q->tail);
    Node *next = atomic_load(&tail->next);
    
    while (1) {
        // If tail is behind, advance it
        if (next != NULL) {
            atomic_compare_exchange_strong(&q->tail, &tail, next);
            tail = atomic_load(&q->tail);
            next = atomic_load(&tail->next);
            continue;
        }
        
        // Try to append node at tail
        if (atomic_compare_exchange_strong(&tail->next, &next, node)) {
            // Success: node is appended
            // Advance tail to new node
            atomic_compare_exchange_strong(&q->tail, &tail, node);
            return true;
        }
        
        // CAS failed, retry
    }
}
 
void *dequeue(LockFreeQueue *q) {
    Node *head = atomic_load(&q->head);
    Node *tail = atomic_load(&q->tail);
    Node *next = atomic_load(&head->next);
    
    while (head != NULL) {
        // Empty queue: head == tail and no next
        if (head == tail && next == NULL) {
            return NULL;
        }
        
        // Dummy node at head — skip it
        if (head == tail && next != NULL) {
            atomic_compare_exchange_strong(&q->tail, &tail, next);
            head = atomic_load(&q->head);
            next = atomic_load(&head->next);
            continue;
        }
        
        // Try to advance head
        if (next != NULL) {
            void *data = next->data;
            if (atomic_compare_exchange_strong(&q->head, &head, next)) {
                free(head); // Only free old head (dummy)
                return data;
            }
        }
        
        // CAS failed, retry
        head = atomic_load(&q->head);
        tail = atomic_load(&q->tail);
        next = head ? atomic_load(&head->next) : NULL;
    }
    
    return NULL;
}

Key insight: The Michael-Scott queue uses a dummy node so both enqueue and dequeue only modify head or tail (not arbitrary nodes in the middle). This simplifies reasoning about correctness.


Memory Reclamation

Lock-free doesn’t mean memory-safe. You still need to free nodes.

Problem: A thread might be pointing to a node that another thread just freed.

Solutions:

  1. Hazard pointers: Each thread publishes which nodes it’s “looking at.” No node with hazard pointers is freed until hazard count drops to zero.
typedef struct {
    void *ptr;
    struct HazardPointer *next;
} HazardPointer;
 
_Thread_local HazardPointer *hazards = NULL; // Per-thread hazard list
 
// Before reading node, publish hazard
void publish_hazard(void **slot, Node *node) {
    // Set slot to node
    *slot = node;
    // Memory barrier to ensure node data is visible
    atomic_thread_fence(memory_order_acquire);
}
 
// After done, clear hazard
void clear_hazard(void **slot) {
    *slot = NULL;
}
  1. Reference counting: Each node has atomic refcount. When refcount hits 0, safe to free.

  2. Epoch-based reclamation: Divide time into epochs. Threads only free nodes after an epoch ends and no other thread is in that epoch.


Testing Lock-Free Structures

#include <pthread.h>
#include <assert.h>
 
LockFreeStack stack;
#define NUM_THREADS 4
#define OPS_PER_THREAD 10000
 
void *producer(void *arg) {
    for (int i = 0; i < OPS_PER_THREAD; i++) {
        int *val = malloc(sizeof(int));
        *val = i;
        stack_push(&stack, val);
    }
    return NULL;
}
 
void *consumer(void *arg) {
    for (int i = 0; i < OPS_PER_THREAD; i++) {
        int *val = stack_pop(&stack);
        // val should not be NULL unless stack is empty
    }
    return NULL;
}
 
int main() {
    stack_init(&stack);
    
    pthread_t threads[NUM_THREADS * 2];
    
    // Half producers, half consumers
    for (int i = 0; i < NUM_THREADS; i++) {
        pthread_create(&threads[i], NULL, producer, NULL);
        pthread_create(&threads[NUM_THREADS + i], NULL, consumer, NULL);
    }
    
    for (int i = 0; i < NUM_THREADS * 2; i++) {
        pthread_join(threads[i], NULL);
    }
    
    printf("Test complete\n");
    return 0;
}

Compile with: gcc -std=c11 -pthread lockfree.c -o lockfree


When to Use Lock-Free

Good for:

  • High-contention data structures (many threads, frequent access)
  • Real-time systems (predictable latency, no blocking)
  • CAS-heavy workloads (low contention = fast)

Bad for:

  • Low contention (locks are simpler, faster)
  • Complex data structures (CAS becomes painful)
  • When simplicity matters (locks are easier to reason about)

Lock-free is harder to implement correctly. Start with locks, optimize later if profiling shows contention.


Exercises

  1. Implement a lock-free deque — double-ended queue with push_front, push_back, pop_front, pop_back
  2. Add hazard pointers to the stack to handle memory reclamation
  3. Implement a bounded MPMC queue — multi-producer multi-consumer with fixed size
  4. Compare performance — implement both locked and lock-free versions, benchmark with varying thread counts

See Also