Rate Limiting

Controlling the rate of requests or operations to prevent abuse and ensure fair resource usage.

Core Idea

Rate limiting protects systems from overload by capping how many requests a client can make in a time window. Implementations include token bucket, sliding window, and leaky bucket algorithms.

Common Algorithms

Algorithm               Best for                   Strength                      Weakness
Fixed window            Simple APIs                Easy to implement             Burst at window edges
Sliding window log      Precise fairness           Accurate                      Expensive at scale
Sliding window counter  Public APIs                Good accuracy/cost tradeoff   Approximation
Token bucket            User-facing traffic        Allows short bursts           Slightly harder to reason about
Leaky bucket            Smoothing background work  Stable output rate            Less flexible for bursts
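
To make the "approximation" row concrete, here is a minimal sketch of a sliding window counter: the previous fixed window's count is weighted by how much of it still overlaps the sliding window. The class and field names are illustrative, not from any particular library.

```python
from time import monotonic

class SlidingWindowCounter:
    """Approximate sliding window: weight the previous fixed window's
    count by the fraction of it still inside the sliding window."""

    def __init__(self, limit: int, window_sec: float):
        self.limit = limit
        self.window = window_sec
        self.prev_count = 0
        self.curr_count = 0
        self.curr_start = monotonic()

    def allow(self) -> bool:
        now = monotonic()
        elapsed = now - self.curr_start
        if elapsed >= self.window:
            # Roll the window forward; if more than one full window
            # passed, the previous count is stale and drops to zero.
            self.prev_count = self.curr_count if elapsed < 2 * self.window else 0
            self.curr_count = 0
            self.curr_start += (elapsed // self.window) * self.window
            elapsed = now - self.curr_start
        # Fraction of the previous window still overlapping.
        overlap = max(0.0, 1.0 - elapsed / self.window)
        estimated = self.prev_count * overlap + self.curr_count
        if estimated >= self.limit:
            return False
        self.curr_count += 1
        return True
```

The approximation assumes requests in the previous window were evenly spread, which is why the table calls it a tradeoff rather than exact.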

Token bucket is the default choice for most production APIs. Tokens refill at a steady rate and each request spends one (or more, for expensive operations). A capacity larger than one second's refill permits short bursts, while the refill rate still caps the sustained average, so a client cannot keep bursting indefinitely.

from time import monotonic

class TokenBucket:
    """Single-process token bucket; not thread-safe without a lock."""

    def __init__(self, rate_per_sec: float, capacity: int):
        self.rate = rate_per_sec   # tokens added per second
        self.capacity = capacity   # maximum burst size
        self.tokens = capacity     # start full so early bursts pass
        self.updated_at = monotonic()

    def allow(self, cost: int = 1) -> bool:
        # Lazily refill based on the time elapsed since the last call.
        now = monotonic()
        elapsed = now - self.updated_at
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated_at = now
        if self.tokens < cost:
            return False
        self.tokens -= cost
        return True

What You Rate Limit

  • Requests per IP or API key
  • Login attempts per account and per subnet
  • Expensive work units, not just HTTP requests
  • Background jobs pulled from a queue
  • Internal dependencies where overload cascades downstream

The important design choice is the key. Rate limiting by IP is cheap but unfair behind NAT. Rate limiting by account is fairer but does not protect anonymous endpoints. Many systems combine both.

Distributed Rate Limiting

Single-process counters work only when traffic hits one node. Once the service scales horizontally, you need shared state or consistent routing.

Common approaches:

  1. Store counters in Redis with expiry.
  2. Use a token bucket in each node plus a global coarse limit.
  3. Apply rate limiting at the edge gateway and keep stricter per-user controls in the app.
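
Approach 1 is commonly built on Redis INCR with a TTL. The counter logic can be sketched as below, with the shared store simulated by an in-process dict; in production the increment and expiry must be atomic on the real store (for example INCR plus EXPIRE in a pipeline or a Lua script).

```python
import time

# Simulated shared store: key -> (count, expiry_timestamp).
# Stands in for Redis; INCR + EXPIRE would replace these operations.
store: dict[str, tuple[int, float]] = {}

def incr_with_expiry(key: str, ttl: float) -> int:
    """Increment a counter that resets itself after `ttl` seconds."""
    now = time.monotonic()
    count, expires = store.get(key, (0, now + ttl))
    if now >= expires:
        count, expires = 0, now + ttl  # old window expired
    count += 1
    store[key] = (count, expires)
    return count

def allow(key: str, limit: int, window_sec: float) -> bool:
    # One counter per key; the expiry clears it automatically,
    # so no background cleanup job is needed.
    return incr_with_expiry(key, window_sec) <= limit
```

This gives fixed-window semantics across all nodes at the cost of one round trip to the store per request.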

Decide failure behavior explicitly: if the limiter backend fails open, abuse gets through during the outage; if it fails closed, healthy traffic is dropped. Neither default is right for every endpoint, so do not leave the choice to whatever exception the client library happens to raise.
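
That decision can be encoded as an explicit flag rather than an accident of error handling. A minimal sketch, with `fail_open` as a hypothetical per-endpoint configuration value:

```python
def check_limit(limiter_call, fail_open: bool) -> bool:
    """Wrap a limiter backend call so outage behavior is a deliberate
    policy. `limiter_call` returns True when the request is allowed."""
    try:
        return limiter_call()
    except Exception:
        # Backend unreachable: admit traffic (fail open) or shed it
        # (fail closed), chosen per endpoint.
        return fail_open
```

A login form might fail closed while a public read endpoint fails open, matching what each one is protecting.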

Practical Rules

  • Return 429 Too Many Requests with Retry-After when the caller can back off.
  • Separate read limits, write limits, and auth limits.
  • Rate limit expensive endpoints more aggressively than cheap ones.
  • Track allow and deny counts per limiter key so you can see whether thresholds are sensible.
  • If overload is already happening, combine rate limiting with queues, timeouts, and backpressure.
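
For the first rule, the Retry-After value can be derived from the limiter state itself rather than hard-coded. A sketch assuming a token-bucket shape like the class above; the `tokens` and `rate_per_sec` parameters mirror that design and are not a fixed API:

```python
import math

def retry_after_seconds(tokens: float, rate_per_sec: float, cost: int = 1) -> int:
    """Seconds until enough tokens accumulate to cover `cost`,
    rounded up so the header never promises too early a retry."""
    deficit = cost - tokens
    if deficit <= 0:
        return 0
    return math.ceil(deficit / rate_per_sec)

# A 429 response would then carry:
#   429 Too Many Requests
#   Retry-After: <retry_after_seconds(bucket.tokens, bucket.rate)>
```

Rounding up matters: a client that retries exactly on time should find a token waiting, not be rejected again.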

Engineering Checklist

  • What exactly is being protected: CPU, DB, third-party API, login form?
  • Is burstiness acceptable or should traffic be smoothed?
  • Can one bad tenant starve everyone else?
  • What happens if the limiter store is unavailable?

Related Topics

  • Heaps — priority scheduling and fairness tradeoffs
  • Message Queues — backpressure after admission control