Hash Tables

A hash table maps keys to values using a hash function. The hash function converts a key into an array index, giving O(1) average-case lookup, insert, and delete.

Why It Matters

Hash tables are the single most useful data structure in practice. Python dict, JavaScript Object/Map, Go map, Java HashMap — they’re everywhere. Understanding how they work helps you use them correctly and debug performance issues.

How It Works

key → hash_function(key) → index → array[index] = value

Example:
  "alice" → hash("alice") → 738294 → 738294 % 8 → 6
  table[6] = "alice's data"

Operations

OperationAverageWorst Case
GetO(1)O(n)
SetO(1)O(n)
DeleteO(1)O(n)
ContainsO(1)O(n)

Worst case happens when all keys hash to the same bucket (hash collision). Good hash functions and resizing keep this rare.

Collision Resolution

StrategyHow It Works
ChainingEach bucket holds a linked list of entries
Open addressingOn collision, probe the next empty slot
Robin HoodOpen addressing + steal from rich buckets

Python dicts use open addressing with a custom probing strategy.

Code Example

# Python dict IS a hash table
ages = {"alice": 30, "bob": 25, "carol": 35}
 
# O(1) lookup
print(ages["alice"])     # 30
print(ages.get("dave"))  # None (no KeyError)
 
# O(1) insert
ages["dave"] = 28
 
# O(1) delete
del ages["bob"]
 
# O(1) membership check
print("carol" in ages)   # True
 
# Common pattern: counting frequencies
def count_chars(s: str) -> dict:
    freq = {}
    for ch in s:
        freq[ch] = freq.get(ch, 0) + 1
    return freq
 
assert count_chars("hello") == {'h': 1, 'e': 1, 'l': 2, 'o': 1}
 
# Two-sum using hash table: O(n) vs O(n^2) brute force
def two_sum(nums: list[int], target: int) -> tuple:
    seen = {}
    for i, n in enumerate(nums):
        complement = target - n
        if complement in seen:
            return (seen[complement], i)
        seen[n] = i
    return ()
 
assert two_sum([2, 7, 11, 15], 9) == (0, 1)

Load Factor and Resizing

Load factor = number of entries / number of buckets. When it exceeds a threshold (Python uses ~0.66), the table doubles in size and rehashes all entries. This is O(n) but happens rarely, so inserts are O(1) amortized.

Sets

A set is just a hash table with only keys (no values). Same O(1) operations.

seen = set()
seen.add(5)
seen.add(5)      # no effect, already present
print(len(seen)) # 1
print(5 in seen)  # True, O(1)