Build a Thread Pool in C

Goal: Implement a fixed-size thread pool with a work queue using pthreads, mutexes, and condition variables. Submit tasks and process them concurrently.

Prerequisites: Concurrency and Synchronization, Processes and Threads, Memory Allocation


The Problem

Creating a new thread per task is expensive (stack allocation, kernel overhead). A thread pool creates N worker threads once, then feeds them work through a shared queue. This is how web servers, databases, and game engines handle concurrent work.

Main thread:                    Worker threads:
  submit(task_a) ──→ [queue] ──→ thread 0: executes task_a
  submit(task_b) ──→ [queue] ──→ thread 1: executes task_b
  submit(task_c) ──→ [queue] ──→ thread 0: executes task_c (after task_a finishes)

Step 1: Data Structures

// threadpool.h
#ifndef THREADPOOL_H
#define THREADPOOL_H
 
#include <pthread.h>
#include <stdbool.h>
 
typedef void (*task_fn)(void *arg);
 
typedef struct task {
    task_fn fn;
    void *arg;
    struct task *next;
} task;
 
typedef struct {
    pthread_t *threads;
    int nthreads;
    task *head;           // queue front (dequeue here)
    task *tail;           // queue back (enqueue here)
    pthread_mutex_t lock;
    pthread_cond_t  notify;
    bool shutdown;
} threadpool;
 
threadpool *threadpool_create(int nthreads);
void        threadpool_submit(threadpool *pool, task_fn fn, void *arg);
void        threadpool_destroy(threadpool *pool);
 
#endif

Step 2: Worker Thread Function

Each worker loops: lock → wait for work → dequeue → unlock → execute.

// threadpool.c
#include "threadpool.h"
#include <stdlib.h>
#include <stdio.h>
 
static void *worker(void *arg) {
    threadpool *pool = arg;
 
    while (1) {
        pthread_mutex_lock(&pool->lock);
 
        // Wait until there's work or shutdown
        while (pool->head == NULL && !pool->shutdown)
            pthread_cond_wait(&pool->notify, &pool->lock);
 
        if (pool->shutdown && pool->head == NULL) {
            pthread_mutex_unlock(&pool->lock);
            break;
        }
 
        // Dequeue a task
        task *t = pool->head;
        pool->head = t->next;
        if (pool->head == NULL)
            pool->tail = NULL;
 
        pthread_mutex_unlock(&pool->lock);
 
        // Execute outside the lock — don't hold the lock during work!
        t->fn(t->arg);
        free(t);
    }
 
    return NULL;
}

Why while and not if

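pthread_cond_wait can return spuriously, without any signal having been sent. Even after a real signal, another worker may have dequeued the task before this one reacquires the lock. The while loop re-checks the condition after every wakeup, which is the standard condition-variable pattern.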


Step 3: Create and Destroy

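threadpool *threadpool_create(int nthreads) {
    if (nthreads <= 0)
        return NULL;
    threadpool *pool = calloc(1, sizeof(threadpool));   // zeroes head, tail, shutdown
    if (pool == NULL)
        return NULL;
    pool->threads = malloc(nthreads * sizeof(pthread_t));
    if (pool->threads == NULL) {
        free(pool);
        return NULL;
    }
    pthread_mutex_init(&pool->lock, NULL);
    pthread_cond_init(&pool->notify, NULL);
 
    // Count only the threads that actually started, so destroy joins exactly those
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&pool->threads[i], NULL, worker, pool) != 0)
            break;
        pool->nthreads++;
    }
 
    return pool;
}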
 
void threadpool_destroy(threadpool *pool) {
    pthread_mutex_lock(&pool->lock);
    pool->shutdown = true;
    pthread_cond_broadcast(&pool->notify);   // wake ALL waiting workers
    pthread_mutex_unlock(&pool->lock);
 
    for (int i = 0; i < pool->nthreads; i++)
        pthread_join(pool->threads[i], NULL);
 
    // Workers drain the queue before exiting, so this is normally a no-op safety net
    task *t = pool->head;
    while (t) {
        task *next = t->next;
        free(t);
        t = next;
    }
 
    pthread_mutex_destroy(&pool->lock);
    pthread_cond_destroy(&pool->notify);
    free(pool->threads);
    free(pool);
}

Why broadcast, not signal

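pthread_cond_signal wakes at least one waiting thread. On shutdown, every worker must wake up and check the shutdown flag; a worker left asleep never exits, and pthread_join on it would block forever. pthread_cond_broadcast wakes them all.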


Step 4: Submit Work

void threadpool_submit(threadpool *pool, task_fn fn, void *arg) {
    task *t = malloc(sizeof(task));
    if (t == NULL) {
        perror("threadpool_submit: malloc");   // task is dropped; caller still owns arg
        return;
    }
    t->fn = fn;
    t->arg = arg;
    t->next = NULL;
 
    // Append to the tail of the queue under the lock
    pthread_mutex_lock(&pool->lock);
    if (pool->tail)
        pool->tail->next = t;
    else
        pool->head = t;
    pool->tail = t;
    pthread_cond_signal(&pool->notify);   // wake one idle worker
    pthread_mutex_unlock(&pool->lock);
}

Step 5: Test It

// main.c
#include "threadpool.h"
#include <stdio.h>
#include <stdlib.h>   // malloc, free
#include <unistd.h>
 
void compute(void *arg) {
    int id = *(int *)arg;
    // pthread_t is opaque; this cast works on Linux but is not portable
    unsigned long tid = (unsigned long)pthread_self() % 1000;
    printf("[thread %lu] task %d start\n", tid, id);
    usleep(100000);   // simulate 100ms of work
    printf("[thread %lu] task %d done\n", tid, id);
    free(arg);
}
 
int main(void) {
    threadpool *pool = threadpool_create(4);
 
    // Submit 20 tasks to 4 workers
    for (int i = 0; i < 20; i++) {
        int *id = malloc(sizeof(int));
        *id = i;
        threadpool_submit(pool, compute, id);
    }
 
    sleep(1);   // crude pause; threadpool_destroy also drains the queue before joining
    threadpool_destroy(pool);
    printf("Pool destroyed, all done.\n");
    return 0;
}

gcc -Wall -Wextra -g -pthread -o pool main.c threadpool.c
./pool
# You'll see 4 tasks running concurrently, then the next 4, etc.
 
valgrind --tool=helgrind ./pool   # check for data races

Exercises

  1. Wait for completion: Add threadpool_wait(pool) that blocks until all submitted tasks finish. Hint: track pending task count with a counter + condition variable.

  2. Futures/results: Modify submit to return a future handle. Add future_get(future) that blocks until the task completes and returns the result.

  3. Dynamic sizing: Add threadpool_resize(pool, new_size) that adds or removes worker threads while the pool is running.

  4. Benchmark: Submit 100,000 lightweight tasks (increment a shared atomic counter). Compare 1, 2, 4, 8 threads. Graph throughput vs thread count. Where does it plateau?


Next: 07 - Bare Metal Blinky on STM32 — cross the hardware boundary.