Build a Hash Table in C
Goal: Implement a hash table with separate chaining from scratch in C. Support insert, lookup, and delete. Test with string keys.
Prerequisites: Hash Tables, C Language Essentials, Pointers and Memory, Memory Allocation
The Problem
We need a data structure that maps string keys to string values with O(1) average-case lookup. The approach: hash the key to an array index, handle collisions with linked lists (separate chaining).
key "alice" → hash("alice") → 738294 → 738294 % 16 → index 6
table[6] → ("alice", "engineer") → ("bob", "designer") → NULL
Step 1: Define the Data Structures
// hashtable.h
#ifndef HASHTABLE_H
#define HASHTABLE_H
#include <stddef.h>
typedef struct ht_entry {
char *key;
char *value;
struct ht_entry *next; // chaining for collisions
} ht_entry;
typedef struct {
ht_entry **buckets; // array of linked list heads
size_t capacity; // number of buckets
size_t size; // number of entries
} ht;
ht *ht_create(size_t capacity);
void ht_free(ht *table);
void ht_set(ht *table, const char *key, const char *value);
char *ht_get(ht *table, const char *key);
void ht_delete(ht *table, const char *key);
#endifKey insight: buckets is a pointer to an array of pointers. Each bucket is a linked list head (NULL if empty).
Step 2: Hash Function
A good hash function distributes keys uniformly across buckets. We’ll use DJB2 — simple, fast, good enough:
// hashtable.c
#include "hashtable.h"
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
static size_t hash(const char *key, size_t capacity) {
size_t h = 5381;
int c;
while ((c = *key++))
h = ((h << 5) + h) + c; // h * 33 + c
return h % capacity;
}Why DJB2 works
The magic number 33 (via h * 33 + c = h << 5 + h + c) gives good avalanche behavior — small changes in input produce large changes in output. The modulo maps the hash into bucket range.
When it breaks
Any hash function can degenerate if all keys hash to the same bucket. That’s why we need resizing (covered in exercises).
Step 3: Create and Free
ht *ht_create(size_t capacity) {
ht *table = malloc(sizeof(ht));
table->capacity = capacity;
table->size = 0;
table->buckets = calloc(capacity, sizeof(ht_entry *)); // all NULL
return table;
}
static void free_entry(ht_entry *entry) {
free(entry->key);
free(entry->value);
free(entry);
}
void ht_free(ht *table) {
for (size_t i = 0; i < table->capacity; i++) {
ht_entry *e = table->buckets[i];
while (e) {
ht_entry *next = e->next;
free_entry(e);
e = next;
}
}
free(table->buckets);
free(table);
}Note: calloc zero-initializes memory, so all bucket pointers start as NULL. We strdup keys and values (see set) so the table owns its own copies.
Step 4: Set (Insert or Update)
void ht_set(ht *table, const char *key, const char *value) {
size_t idx = hash(key, table->capacity);
ht_entry *e = table->buckets[idx];
// Check if key already exists — update it
while (e) {
if (strcmp(e->key, key) == 0) {
free(e->value);
e->value = strdup(value);
return;
}
e = e->next;
}
// Key not found — insert at head of chain
ht_entry *new = malloc(sizeof(ht_entry));
new->key = strdup(key);
new->value = strdup(value);
new->next = table->buckets[idx]; // prepend to chain
table->buckets[idx] = new;
table->size++;
}Inserting at the head of the chain is O(1). strdup allocates and copies the string so we own it.
Step 5: Get (Lookup)
char *ht_get(ht *table, const char *key) {
size_t idx = hash(key, table->capacity);
ht_entry *e = table->buckets[idx];
while (e) {
if (strcmp(e->key, key) == 0)
return e->value;
e = e->next;
}
return NULL; // not found
}Walk the chain at the bucket. Average chain length = size / capacity (the load factor).
Step 6: Delete
void ht_delete(ht *table, const char *key) {
size_t idx = hash(key, table->capacity);
ht_entry *e = table->buckets[idx];
ht_entry *prev = NULL;
while (e) {
if (strcmp(e->key, key) == 0) {
if (prev)
prev->next = e->next;
else
table->buckets[idx] = e->next;
free_entry(e);
table->size--;
return;
}
prev = e;
e = e->next;
}
}Standard linked list removal. Need prev to unlink the node.
Step 7: Test It
// main.c
#include "hashtable.h"
#include <stdio.h>
#include <assert.h>
int main(void) {
ht *table = ht_create(16);
ht_set(table, "alice", "engineer");
ht_set(table, "bob", "designer");
ht_set(table, "carol", "manager");
assert(strcmp(ht_get(table, "alice"), "engineer") == 0);
assert(strcmp(ht_get(table, "bob"), "designer") == 0);
assert(ht_get(table, "dave") == NULL);
// Update existing key
ht_set(table, "alice", "CTO");
assert(strcmp(ht_get(table, "alice"), "CTO") == 0);
// Delete
ht_delete(table, "bob");
assert(ht_get(table, "bob") == NULL);
printf("All tests passed! (size=%zu)\n", table->size);
ht_free(table);
return 0;
}Build and run
gcc -Wall -Wextra -g -o hashtable main.c hashtable.c
./hashtable
# All tests passed! (size=2)
# Check for leaks
valgrind --leak-check=full ./hashtable
# Should report: All heap blocks were freed -- no leaks are possibleVerify: Load Distribution
// Add this to main.c to see bucket distribution
void ht_dump(ht *table) {
for (size_t i = 0; i < table->capacity; i++) {
int count = 0;
for (ht_entry *e = table->buckets[i]; e; e = e->next)
count++;
if (count > 0)
printf("bucket[%zu]: %d entries\n", i, count);
}
}With 16 buckets and 3 entries, most buckets are empty and the populated ones have 1 entry each. This is good — short chains mean fast lookups.
Exercises
-
Resizing: When
size / capacity > 0.75, double the capacity and rehash all entries. Add this toht_set. Test by inserting 1000 entries into a table that starts with capacity 4. -
Iterator: Implement
ht_entry *ht_next(ht *table, size_t *bucket, ht_entry **current)that iterates over all entries. Use it to print all key-value pairs. -
Open addressing: Replace separate chaining with linear probing. What changes in set/get/delete? What happens when the table is nearly full?
-
Benchmark: Insert 100,000 random strings into your hash table. Time it against a sorted array with binary search. When does the hash table win?
Next: 02 - Implement Merge Sort in C — divide and conquer with O(n log n) guaranteed.