Lock-Free Data Structures in C
Goal: Build lock-free stack and queue using C11 atomics. Understand why locks are problematic for concurrency, how CAS (compare-and-swap) enables lock-freedom, and implement real lock-free data structures.
Based on: jfalkner/lockfree, Lock-free Data Structures by Siddharth S2710
Prerequisites: C pointers, struct, basic concurrency concepts, understand what atomics are
Why Lock-Free?
Locks (mutexes) have problems:
- Contention: Threads block each other, killing parallelism
- Deadlocks: Circular wait dependencies = stuck forever
- Priority inversion: Low-priority thread holds lock, high-priority can’t run
- Scheduling: OS decides when blocked threads run — unpredictable latency
Lock-free: at least one thread makes progress every step (no thread can be blocked forever)
Wait-free: every thread completes in bounded steps (stronger guarantee)
The Fundamental Operation: CAS
Compare-and-Swap (CAS) is the building block:
#include <stdatomic.h>
// CAS: if *ptr == old_val, set *ptr = new_val, return true
// Otherwise, return false and *ptr unchanged
bool cas(int *ptr, int old_val, int new_val) {
return atomic_compare_exchange_strong(ptr, old_val, new_val);
}In C11 atomics:
atomic_int x = ATOMIC_VAR_INIT(0);
int old = atomic_load(&x);
// Try to change x from 0 to 42
if (atomic_compare_exchange_strong(&x, &old, 42)) {
// Success: x is now 42
} else {
// Failure: x was modified by another thread, old now has current value
}Key insight: CAS is a load-modify-store operation that happens atomically. If another thread modifies the value between load and store, CAS fails.
Lock-Free Stack (Treiber Stack)
The simplest lock-free data structure. Uses linked nodes, CAS to push/pop.
#include <stdatomic.h>
#include <stdlib.h>
#include <stdio.h>
typedef struct Node {
void *data;
struct Node *next;
} Node;
typedef struct {
_Atomic(Node*) head;
} LockFreeStack;
// Initialize stack
void stack_init(LockFreeStack *s) {
atomic_store(&s->head, NULL);
}
// Push: create node, point to current head, CAS to install
// Returns true on success
bool stack_push(LockFreeStack *s, void *data) {
Node *node = malloc(sizeof(Node));
if (!node) return false;
node->data = data;
// Current head
Node *old_head = atomic_load(&s->head);
// Point new node to current head
node->next = old_head;
// Try to install new node as head
// If head changed (another thread pushed), retry with new head
while (!atomic_compare_exchange_strong(&s->head, &old_head, node)) {
// CAS failed, old_head now has current head value
node->next = old_head;
// Loop will retry with updated old_head
}
return true;
}
// Pop: try to CAS head from old to old->next
void *stack_pop(LockFreeStack *s) {
Node *old_head = atomic_load(&s->head);
while (old_head != NULL) {
if (atomic_compare_exchange_strong(&s->head, &old_head, old_head->next)) {
void *data = old_head->data;
free(old_head);
return data;
}
// CAS failed, another thread modified head — retry
}
return NULL; // Empty stack
}ABA Problem: CAS only compares values, not pointers. If node A is at head, popped, then a new node with same address as A is pushed, CAS sees “A” and succeeds — even though the linked structure changed.
Solution: Use a tagged pointer (add a counter to the pointer):
typedef struct {
Node *ptr;
unsigned long tag;
} TaggedPtr;
typedef struct {
_Atomic TaggedPtr head;
} LockFreeStackTagged;Lock-Free Queue (Michael-Scott)
More complex than stack. Uses dummy head node for cleaner implementation.
#include <stdatomic.h>
#include <stdlib.h>
typedef struct Node {
void *data;
struct Node *next;
} Node;
typedef struct {
_Atomic(Node*) head;
_Atomic(Node*) tail;
} LockFreeQueue;
void queue_init(LockFreeQueue *q) {
// Create dummy node — head always points to dummy
Node *dummy = malloc(sizeof(Node));
dummy->next = NULL;
atomic_store(&q->head, dummy);
atomic_store(&q->tail, dummy);
}
bool enqueue(LockFreeQueue *q, void *data) {
Node *node = malloc(sizeof(Node));
if (!node) return false;
node->data = data;
node->next = NULL;
Node *tail = atomic_load(&q->tail);
Node *next = atomic_load(&tail->next);
while (1) {
// If tail is behind, advance it
if (next != NULL) {
atomic_compare_exchange_strong(&q->tail, &tail, next);
tail = atomic_load(&q->tail);
next = atomic_load(&tail->next);
continue;
}
// Try to append node at tail
if (atomic_compare_exchange_strong(&tail->next, &next, node)) {
// Success: node is appended
// Advance tail to new node
atomic_compare_exchange_strong(&q->tail, &tail, node);
return true;
}
// CAS failed, retry
}
}
void *dequeue(LockFreeQueue *q) {
Node *head = atomic_load(&q->head);
Node *tail = atomic_load(&q->tail);
Node *next = atomic_load(&head->next);
while (head != NULL) {
// Empty queue: head == tail and no next
if (head == tail && next == NULL) {
return NULL;
}
// Dummy node at head — skip it
if (head == tail && next != NULL) {
atomic_compare_exchange_strong(&q->tail, &tail, next);
head = atomic_load(&q->head);
next = atomic_load(&head->next);
continue;
}
// Try to advance head
if (next != NULL) {
void *data = next->data;
if (atomic_compare_exchange_strong(&q->head, &head, next)) {
free(head); // Only free old head (dummy)
return data;
}
}
// CAS failed, retry
head = atomic_load(&q->head);
tail = atomic_load(&q->tail);
next = head ? atomic_load(&head->next) : NULL;
}
return NULL;
}Key insight: The Michael-Scott queue uses a dummy node so both enqueue and dequeue only modify head or tail (not arbitrary nodes in the middle). This simplifies reasoning about correctness.
Memory Reclamation
Lock-free doesn’t mean memory-safe. You still need to free nodes.
Problem: A thread might be pointing to a node that another thread just freed.
Solutions:
- Hazard pointers: Each thread publishes which nodes it’s “looking at.” No node with hazard pointers is freed until hazard count drops to zero.
typedef struct {
void *ptr;
struct HazardPointer *next;
} HazardPointer;
_Thread_local HazardPointer *hazards = NULL; // Per-thread hazard list
// Before reading node, publish hazard
void publish_hazard(void **slot, Node *node) {
// Set slot to node
*slot = node;
// Memory barrier to ensure node data is visible
atomic_thread_fence(memory_order_acquire);
}
// After done, clear hazard
void clear_hazard(void **slot) {
*slot = NULL;
}-
Reference counting: Each node has atomic refcount. When refcount hits 0, safe to free.
-
Epoch-based reclamation: Divide time into epochs. Threads only free nodes after an epoch ends and no other thread is in that epoch.
Testing Lock-Free Structures
#include <pthread.h>
#include <assert.h>
LockFreeStack stack;
#define NUM_THREADS 4
#define OPS_PER_THREAD 10000
void *producer(void *arg) {
for (int i = 0; i < OPS_PER_THREAD; i++) {
int *val = malloc(sizeof(int));
*val = i;
stack_push(&stack, val);
}
return NULL;
}
void *consumer(void *arg) {
for (int i = 0; i < OPS_PER_THREAD; i++) {
int *val = stack_pop(&stack);
// val should not be NULL unless stack is empty
}
return NULL;
}
int main() {
stack_init(&stack);
pthread_t threads[NUM_THREADS * 2];
// Half producers, half consumers
for (int i = 0; i < NUM_THREADS; i++) {
pthread_create(&threads[i], NULL, producer, NULL);
pthread_create(&threads[NUM_THREADS + i], NULL, consumer, NULL);
}
for (int i = 0; i < NUM_THREADS * 2; i++) {
pthread_join(threads[i], NULL);
}
printf("Test complete\n");
return 0;
}Compile with: gcc -std=c11 -pthread lockfree.c -o lockfree
When to Use Lock-Free
Good for:
- High-contention data structures (many threads, frequent access)
- Real-time systems (predictable latency, no blocking)
- CAS-heavy workloads (low contention = fast)
Bad for:
- Low contention (locks are simpler, faster)
- Complex data structures (CAS becomes painful)
- When simplicity matters (locks are easier to reason about)
Lock-free is harder to implement correctly. Start with locks, optimize later if profiling shows contention.
Exercises
- Implement a lock-free deque — double-ended queue with push_front, push_back, pop_front, pop_back
- Add hazard pointers to the stack to handle memory reclamation
- Implement a bounded MPMC queue — multi-producer multi-consumer with fixed size
- Compare performance — implement both locked and lock-free versions, benchmark with varying thread counts
See Also
- 06 - Build a Thread Pool in C — uses locks, contrasts with lock-free approach
- Concurrency and Synchronization — covers mutexes, condition variables for comparison
- jfalkner/lockfree — thread-safe, lock-free data structures in C with atomics
- Lock-free Data Structures — C++ implementations of common lock-free patterns