Scheduling
The OS scheduler decides which process or thread runs on which CPU core, and for how long. It must balance throughput, latency, fairness, and responsiveness — often conflicting goals.
Why It Matters
Scheduling determines whether your interactive app feels snappy or laggy, whether a batch job finishes fast, and whether real-time deadlines are met. Understanding schedulers helps you reason about tail latency, thread pool sizing, and CPU affinity.
Scheduling Algorithms
| Algorithm | How It Works | Preemptive? | Tradeoff |
|---|---|---|---|
| FIFO/FCFS | Run to completion in arrival order | No | Simple, but one CPU-hog blocks everyone |
| SJF | Shortest job first | No | Optimal avg wait, but must predict job length |
| Round Robin | Fixed time slice, rotate queue | Yes | Fair, but context switch overhead |
| Priority | Highest priority runs first | Yes | Responsive, but can starve low priority |
| MLFQ | Multiple RR queues, demote CPU-hogs | Yes | Good general purpose (BSD, Windows) |
| CFS | Virtual runtime tree, pick smallest | Yes | Fair, proportional share (Linux default) |
Round Robin Detail
Time slice = 10ms
Process A: ████────── (runs 10ms, preempted)
Process B: ────████── (runs 10ms, preempted)
Process C: ────────██ (runs 5ms, done)
Process A: ██████──── (continues)
Shorter time slices = better responsiveness, more context switch overhead. Typical values: 1-10ms.
MLFQ (Multi-Level Feedback Queue)
Queue 0 (highest priority, 1ms slice) ← new tasks start here
Queue 1 (medium priority, 4ms slice) ← demoted if uses full slice
Queue 2 (lowest priority, 16ms slice) ← CPU-bound tasks land here
Rules:
- New tasks enter highest queue
- If a task uses its full time slice, demote it
- If a task blocks before its slice expires, keep it in the same queue
- Periodically boost all tasks to highest queue (prevent starvation)
Interactive tasks (short CPU bursts, lots of IO) stay in high queues. CPU-bound tasks sink to low queues.
Context Switch Cost
A context switch saves/restores: registers, program counter, stack pointer, and flushes TLB entries. Direct cost is ~1-10μs, but the indirect cost is much higher:
- TLB flush: virtual→physical cache invalidated, causes TLB misses
- Cache pollution: new process fills L1/L2 cache, evicting the old process’s data
- Pipeline flush: CPU speculative execution state is lost
Rule of thumb: minimize context switches for throughput-sensitive workloads. Use thread pinning (taskset, pthread_setaffinity_np) for latency-sensitive code.
Linux CFS (Completely Fair Scheduler)
Linux’s default scheduler since 2.6.23. Each task tracks vruntime — how much CPU time it has received, weighted by priority.
vruntime += actual_runtime × (default_weight / task_weight)
- Tasks are stored in a red-black tree sorted by
vruntime - Scheduler always picks the leftmost node (smallest
vruntime) → O(log n) - Higher nice value → higher weight divisor →
vruntimegrows faster → gets less CPU
nice -n 10 ./background_job # lower priority (nice 10)
nice -n -5 ./latency_sensitive # higher priority (needs root for negative nice)
renice -n 5 -p PID # change running process priorityPriority Inversion
A high-priority task blocks on a lock held by a low-priority task, while a medium-priority task runs freely. The high-priority task is effectively running at low priority.
High (blocked on lock) ──────────────── BLOCKED
Medium (running) ████████████████ RUNNING
Low (holding lock) ──────────────── can't run to release lock!
Solutions: priority inheritance (temporarily boost lock-holder to waiter’s priority) or priority ceiling (lock has a fixed ceiling priority).
Real-Time Scheduling
Linux supports POSIX real-time classes above CFS:
| Class | Policy | Use Case |
|---|---|---|
SCHED_FIFO | Run until blocked or preempted by higher priority | Hard real-time |
SCHED_RR | FIFO + time slice rotation within same priority | Soft real-time |
SCHED_DEADLINE | Earliest deadline first | Periodic real-time tasks |
SCHED_OTHER | CFS (normal) | Everything else |
chrt -f 50 ./realtime_app # SCHED_FIFO at priority 50
chrt -r 30 ./soft_realtime # SCHED_RR at priority 30Related
- Processes and Threads — what gets scheduled
- Concurrency and Synchronization — priority inversion with locks
- RTOS Fundamentals — scheduling in embedded real-time systems