Overview
Process switching latency is the time an operating system needs to stop running one process and begin running another; it is the measurable delay associated with a context switch. Although a single switch typically completes in microseconds on modern hardware, the cost is paid on every switch, so it directly affects responsiveness, throughput, and real-time guarantees.
Key components
The total latency is made up of several steps that the kernel performs when switching tasks. Typical components include:
- Saving the outgoing process state (registers, program counter, flags).
- Loading the incoming process state and updating scheduler bookkeeping.
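- Memory-management work: activating the incoming process's page tables and invalidating or reloading TLB entries.
- Cache effects: the incoming process may incur instruction and data cache misses because its working set was evicted while it was not running.
- CPU-time accounting and the user/kernel mode transitions required to perform the switch.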
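The save, load, and memory-management steps above can be sketched as a toy model. This is a simplified illustration under stated assumptions, not how any real kernel lays out its data: `PCB`, `CPU`, and `context_switch` are hypothetical names, the "registers" are a plain dictionary, and the TLB is a dictionary that is cleared whenever the address space changes (that is, assuming no TLB tagging). Cache effects and accounting are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class PCB:
    """Hypothetical, simplified process control block (not a real kernel structure)."""
    pid: int
    page_table: str                                  # identifies the address space
    registers: dict = field(default_factory=dict)    # saved general-purpose registers
    pc: int = 0                                      # saved program counter
    flags: int = 0                                   # saved status flags
    times_scheduled: int = 0                         # scheduler bookkeeping


@dataclass
class CPU:
    """Toy CPU: live register state, the active page table, and a TLB."""
    active_page_table: str
    registers: dict = field(default_factory=dict)
    pc: int = 0
    flags: int = 0
    tlb: dict = field(default_factory=dict)          # virtual page -> physical frame


def context_switch(cpu: CPU, outgoing: PCB, incoming: PCB) -> bool:
    """Switch from outgoing to incoming; return True if the TLB was flushed."""
    # 1. Save the outgoing process state.
    outgoing.registers = dict(cpu.registers)
    outgoing.pc = cpu.pc
    outgoing.flags = cpu.flags

    # 2. Load the incoming state and update scheduler bookkeeping.
    cpu.registers = dict(incoming.registers)
    cpu.pc = incoming.pc
    cpu.flags = incoming.flags
    incoming.times_scheduled += 1

    # 3. Memory-management work is needed only when the address space changes.
    #    Without TLB tagging, the old translations must be discarded.
    if cpu.active_page_table != incoming.page_table:
        cpu.active_page_table = incoming.page_table
        cpu.tlb.clear()
        return True
    return False
```

In this model, switching between two PCBs with different `page_table` values returns `True` (the TLB is flushed), while switching between two threads that share one `page_table` returns `False`. That difference is the saving the shared-address-space strategy relies on.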
Factors that influence latency
Latency depends on both software and hardware. Software factors include kernel design (monolithic vs. microkernel), context save/restore code paths, and whether threads or full processes are being switched. Hardware factors include CPU cache sizes, translation lookaside buffer (TLB) behavior, support for simultaneous multithreading, and the cost of privilege transitions.
Measurement and categories
Measurements distinguish voluntary switches (a task yields or blocks) from involuntary preemptions. Benchmarks often time repeated yields or blocking hand-offs between two tasks, or use kernel tracing tools, to estimate the cost per switch. In real systems, a per-switch figure measured this way does not capture indirect penalties, such as the higher cache miss rate the resumed task experiences afterward.
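A common benchmark of this kind is a pipe "ping-pong": two processes pass a byte back and forth, so each side blocks (a voluntary switch) until the other replies. The sketch below assumes a Unix-like system (it uses `os.fork`); `pipe_pingpong_ns` is a hypothetical helper name. The measured time also includes two pipe system calls per hand-off and Python interpreter overhead, so treat the result as an upper bound rather than the pure switch cost. On a multicore machine the two processes may run on different CPUs, in which case each blocked read switches to the idle task rather than directly to the peer. The standard `resource.getrusage` fields `ru_nvcsw` and `ru_nivcsw` report voluntary and involuntary switch counts respectively.

```python
import os
import resource
import time


def pipe_pingpong_ns(rounds: int = 10_000) -> tuple[float, int]:
    """Estimate per-switch cost with a pipe round trip between two processes.

    Returns (approximate ns per hand-off, voluntary switches seen by the parent).
    Unix only, because it relies on os.fork.
    """
    to_child_r, to_child_w = os.pipe()
    to_parent_r, to_parent_w = os.pipe()

    pid = os.fork()
    if pid == 0:
        # Child: wait for a byte, then echo it back.
        os.close(to_child_w)
        os.close(to_parent_r)
        for _ in range(rounds):
            os.read(to_child_r, 1)
            os.write(to_parent_w, b"x")
        os._exit(0)  # exit immediately; skip the parent's cleanup handlers

    os.close(to_child_r)
    os.close(to_parent_w)
    before = resource.getrusage(resource.RUSAGE_SELF).ru_nvcsw

    start = time.perf_counter_ns()
    for _ in range(rounds):
        os.write(to_child_w, b"x")
        os.read(to_parent_r, 1)  # blocks until the child replies
    elapsed = time.perf_counter_ns() - start

    voluntary = resource.getrusage(resource.RUSAGE_SELF).ru_nvcsw - before
    os.waitpid(pid, 0)
    os.close(to_child_w)
    os.close(to_parent_r)

    # Each round trip involves at least two hand-offs: parent -> child -> parent.
    return elapsed / (2 * rounds), voluntary
```

Repeating the run several times and taking the minimum is a reasonable way to reduce noise from unrelated system activity.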
Reducing latency
Common strategies to lower switching cost include:
- Using threads or lightweight tasks that share an address space, which avoids a full MMU switch.
- Optimizing kernel paths and minimizing register/state save areas.
- Hardware features: larger caches, TLB tagging (for example, x86 PCIDs or ARM ASIDs), which lets TLB entries survive an address-space switch, and hardware thread support.
- Scheduler policies that reduce unnecessary preemption and keep related work on the same core.
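Two of these strategies, sharing an address space and keeping related work on one core, can be combined in a small experiment. The sketch below runs the same pipe ping-pong between two threads of one process and pins them to a single CPU, so every hand-off is a real switch on that core rather than a cross-CPU wakeup. It assumes Linux, where `os.sched_setaffinity(0, ...)` restricts the calling thread and threads it later creates inherit that mask. `thread_pingpong_ns` and `run_pinned_to_one_cpu` are hypothetical helper names, and the result still includes pipe and interpreter overhead. Running the same loop between two processes instead would additionally include the address-space switch.

```python
import os
import threading
import time


def thread_pingpong_ns(rounds: int = 10_000) -> float:
    """Pipe ping-pong between two threads of one process (shared address space)."""
    ping_r, ping_w = os.pipe()
    pong_r, pong_w = os.pipe()

    def echo() -> None:
        # Worker thread: wait for a byte, then send one back.
        for _ in range(rounds):
            os.read(ping_r, 1)
            os.write(pong_w, b"x")

    worker = threading.Thread(target=echo)
    worker.start()
    start = time.perf_counter_ns()
    for _ in range(rounds):
        os.write(ping_w, b"x")
        os.read(pong_r, 1)  # blocks until the worker replies
    elapsed = time.perf_counter_ns() - start
    worker.join()
    for fd in (ping_r, ping_w, pong_r, pong_w):
        os.close(fd)
    # At least two hand-offs per round trip.
    return elapsed / (2 * rounds)


def run_pinned_to_one_cpu(fn, *args):
    """Run fn with the calling thread, and threads it creates, on a single CPU.

    Linux only; the original affinity mask is restored afterward.
    """
    original = os.sched_getaffinity(0)
    os.sched_setaffinity(0, {min(original)})
    try:
        return fn(*args)
    finally:
        os.sched_setaffinity(0, original)
```

Comparing `run_pinned_to_one_cpu(thread_pingpong_ns)` with an unpinned `thread_pingpong_ns()` gives a rough sense of how core placement affects the hand-off cost on a given machine.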
Importance and distinctions
Process switching latency matters for desktop responsiveness, interactive services, and especially for real-time systems, where worst-case delays are critical. It is distinct from interrupt latency (the time to begin handling an interrupt) and from syscall overhead (the cost of entering and exiting kernel mode). Understanding these differences helps engineers choose among processes, kernel threads, user-level threads, and event-driven designs based on latency and isolation trade-offs.
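The distinction between syscall overhead and switching latency can be made concrete by timing a trivial system call that enters and leaves the kernel without switching tasks. The sketch below assumes `os.getppid()` enters the kernel on every call on Linux; `syscall_ns` and `loop_baseline_ns` are hypothetical helper names. Because Python adds its own per-call cost, a pure-Python no-op loop is timed as a rough baseline to subtract.

```python
import os
import time


def syscall_ns(iterations: int = 100_000) -> float:
    """Average cost of a trivial system call (getppid); no task switch occurs."""
    start = time.perf_counter_ns()
    for _ in range(iterations):
        os.getppid()
    return (time.perf_counter_ns() - start) / iterations


def loop_baseline_ns(iterations: int = 100_000) -> float:
    """Same loop calling a pure-Python no-op, to estimate interpreter overhead."""
    def noop() -> int:
        return 0

    start = time.perf_counter_ns()
    for _ in range(iterations):
        noop()
    return (time.perf_counter_ns() - start) / iterations


if __name__ == "__main__":
    estimate = syscall_ns() - loop_baseline_ns()
    print(f"approximate syscall overhead: {estimate:.0f} ns per call")
```

A blocking ping-pong between two tasks pays this kernel entry and exit cost and, in addition, the save/restore, scheduling, and cache work of an actual switch.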