Hash Tables
A hash table maps keys to values using a hash function. The hash function converts a key into an array index, giving O(1) average-case lookup, insert, and delete.
Why It Matters
Hash tables are the single most useful data structure in practice. Python dict, JavaScript Object/Map, Go map, Java HashMap — they’re everywhere. Understanding how they work helps you use them correctly and debug performance issues.
How It Works
key → hash_function(key) → index → array[index] = value
Example:
"alice" → hash("alice") → 738294 → 738294 % 8 → 6
table[6] = "alice's data"
Operations
| Operation | Average | Worst Case |
|---|---|---|
| Get | O(1) | O(n) |
| Set | O(1) | O(n) |
| Delete | O(1) | O(n) |
| Contains | O(1) | O(n) |
Worst case happens when all keys hash to the same bucket (hash collision). Good hash functions and resizing keep this rare.
Collision Resolution
| Strategy | How It Works |
|---|---|
| Chaining | Each bucket holds a linked list of entries |
| Open addressing | On collision, probe the next empty slot |
| Robin Hood | Open addressing + steal from rich buckets |
Python dicts use open addressing with a custom probing strategy.
Code Example
# Python dict IS a hash table
ages = {"alice": 30, "bob": 25, "carol": 35}
# O(1) lookup
print(ages["alice"]) # 30
print(ages.get("dave")) # None (no KeyError)
# O(1) insert
ages["dave"] = 28
# O(1) delete
del ages["bob"]
# O(1) membership check
print("carol" in ages) # True
# Common pattern: counting frequencies
def count_chars(s: str) -> dict:
freq = {}
for ch in s:
freq[ch] = freq.get(ch, 0) + 1
return freq
assert count_chars("hello") == {'h': 1, 'e': 1, 'l': 2, 'o': 1}
# Two-sum using hash table: O(n) vs O(n^2) brute force
def two_sum(nums: list[int], target: int) -> tuple:
seen = {}
for i, n in enumerate(nums):
complement = target - n
if complement in seen:
return (seen[complement], i)
seen[n] = i
return ()
assert two_sum([2, 7, 11, 15], 9) == (0, 1)Load Factor and Resizing
Load factor = number of entries / number of buckets. When it exceeds a threshold (Python uses ~0.66), the table doubles in size and rehashes all entries. This is O(n) but happens rarely, so inserts are O(1) amortized.
Sets
A set is just a hash table with only keys (no values). Same O(1) operations.
seen = set()
seen.add(5)
seen.add(5) # no effect, already present
print(len(seen)) # 1
print(5 in seen) # True, O(1)Related
- Arrays — underlying storage for the table
- Linked Lists — used in chaining collision resolution
- Big O Notation — amortized analysis
- Trees and Binary Search Trees — ordered alternative (O(log n) but sorted)