Indexing

How databases use index structures to make queries fast — B-trees, hash indexes, and their tradeoffs.

Core Idea

Without an index, a database must scan every row (O(n)). An index trades write performance and storage for dramatically faster reads (O(log n) for B-tree). Understanding indexes is essential for anyone writing code that touches a database.

The Default Index: B-tree

Most relational databases use B-trees for ordinary indexes because they preserve sorted order and support range queries.

B-tree indexes are good at:

  • Equality lookups: WHERE id = 42
  • Range lookups: WHERE created_at >= ...
  • Ordered scans: ORDER BY created_at
  • Prefix matches on composite indexes

This is why indexing connects directly to Binary Search and Trees and Binary Search Trees: the database is using a balanced tree so it can avoid scanning the whole table.

Other Common Index Types

IndexGood forWeakness
B-treeGeneral-purpose OLTP queriesExtra write cost
HashExact-match lookupsNo ordered/range scans
Full-textSearch over large text fieldsSpecialized ranking semantics
BitmapLow-cardinality analytics workloadsPoor fit for frequent writes

Composite Indexes

Order matters. An index on (country, city) helps:

  • WHERE country = 'EE'
  • WHERE country = 'EE' AND city = 'Tartu'

It does not help much for WHERE city = 'Tartu' by itself, because the leftmost prefix is missing.

Selectivity

Indexes pay off when they rule out most rows. A highly selective column like email is a great index candidate. A low-selectivity column like is_active is often a poor one unless combined with another column.

Costs People Forget

  • Every insert/update/delete must maintain the index
  • Each extra index consumes storage and memory
  • Too many overlapping indexes confuse schema ownership and slow writes

You should be able to explain what query each index exists for.

Practical Rule

Use EXPLAIN before and after adding an index. If you are not measuring query plans, you are guessing.

CREATE INDEX idx_orders_customer_created
ON orders (customer_id, created_at);

That index is good when the application asks, “show me one customer’s recent orders.” It is wasteful if no real query uses that access pattern.