Indexing
How databases use index structures to make queries fast — B-trees, hash indexes, and their tradeoffs.
Core Idea
Without an index, a database must scan every row (O(n)). An index trades write performance and storage for dramatically faster reads (O(log n) for B-tree). Understanding indexes is essential for anyone writing code that touches a database.
The Default Index: B-tree
Most relational databases use B-trees for ordinary indexes because they preserve sorted order and support range queries.
B-tree indexes are good at:
- Equality lookups:
WHERE id = 42 - Range lookups:
WHERE created_at >= ... - Ordered scans:
ORDER BY created_at - Prefix matches on composite indexes
This is why indexing connects directly to Binary Search and Trees and Binary Search Trees: the database is using a balanced tree so it can avoid scanning the whole table.
Other Common Index Types
| Index | Good for | Weakness |
|---|---|---|
| B-tree | General-purpose OLTP queries | Extra write cost |
| Hash | Exact-match lookups | No ordered/range scans |
| Full-text | Search over large text fields | Specialized ranking semantics |
| Bitmap | Low-cardinality analytics workloads | Poor fit for frequent writes |
Composite Indexes
Order matters. An index on (country, city) helps:
WHERE country = 'EE'WHERE country = 'EE' AND city = 'Tartu'
It does not help much for WHERE city = 'Tartu' by itself, because the leftmost prefix is missing.
Selectivity
Indexes pay off when they rule out most rows. A highly selective column like email is a great index candidate. A low-selectivity column like is_active is often a poor one unless combined with another column.
Costs People Forget
- Every insert/update/delete must maintain the index
- Each extra index consumes storage and memory
- Too many overlapping indexes confuse schema ownership and slow writes
You should be able to explain what query each index exists for.
Practical Rule
Use EXPLAIN before and after adding an index. If you are not measuring query plans, you are guessing.
CREATE INDEX idx_orders_customer_created
ON orders (customer_id, created_at);That index is good when the application asks, “show me one customer’s recent orders.” It is wasteful if no real query uses that access pattern.
Links
- Binary Search — search in ordered structures
- Trees and Binary Search Trees — the data structure intuition behind B-trees