Overview
Cache coherence refers to the mechanisms that keep multiple cached copies of the same memory location consistent with one another. In systems where several processors, cores, or devices maintain local caches of shared memory, a write performed by one participant must either be propagated to the other cached copies or invalidate them, so that subsequent reads do not return stale or conflicting values. Cache coherence is a specific instance of the broader concept of memory coherence and is distinct from memory consistency models, which govern the order in which memory operations to different locations become visible.
How cache coherence is achieved
Coherence is implemented by protocols that detect and resolve conflicts between cached copies. Two high‑level approaches dominate:
- Snooping (bus‑based): Caches monitor a shared communication medium for transactions that affect blocks they hold. When a write occurs, snooping controllers either invalidate other copies or broadcast the updated data.
- Directory‑based: A directory records which caches have a copy of each block. When a write or a read miss occurs, the directory coordinates invalidations, updates, or data transfers. Because requests go only to the caches involved, this approach scales better to large systems that lack a single broadcast medium.
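The directory approach above can be sketched as a small bookkeeping structure. This is a minimal illustration, not a description of any real hardware: the class name Directory, its methods, and the integer cache IDs are all hypothetical, and data transfer, acknowledgements, and transient states are left out.

```python
class Directory:
    """Toy directory (hypothetical): maps each block address to the
    set of cache IDs that currently hold a copy."""

    def __init__(self):
        self.sharers = {}  # block address -> set of cache IDs

    def read_miss(self, cache_id, block):
        # Record the requester as a sharer of the block.
        self.sharers.setdefault(block, set()).add(cache_id)

    def write(self, cache_id, block):
        # Only caches listed as sharers need an invalidation; no broadcast.
        to_invalidate = sorted(self.sharers.get(block, set()) - {cache_id})
        self.sharers[block] = {cache_id}  # the writer becomes the sole holder
        return to_invalidate


directory = Directory()
directory.read_miss(0, 0x40)
directory.read_miss(2, 0x40)
invalidated = directory.write(0, 0x40)  # cache 2 is the only other sharer
```

The point of the sketch is that the write consults the sharer set instead of contacting every cache, which is where the scalability advantage comes from.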
Two basic policies determine what happens on a write:
- Write‑invalidate: Other caches’ copies are invalidated so the writing cache becomes the sole owner and can modify freely.
- Write‑update (write‑broadcast): Updated data is sent to other caches so they can update their copies immediately.
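The difference between the two write policies can be illustrated with a toy model in which each cache is a dictionary from address to value and a missing key stands for an invalid copy. The function name apply_write and the policy strings are assumptions made for this sketch only.

```python
def apply_write(caches, writer, addr, value, policy):
    """Apply one write to a list of per-cache dicts (addr -> value).

    A missing key models an invalid (absent) copy.
    """
    caches[writer][addr] = value
    for i, cache in enumerate(caches):
        if i == writer or addr not in cache:
            continue  # only caches that hold a copy are affected
        if policy == "invalidate":
            del cache[addr]      # other copies are dropped
        elif policy == "update":
            cache[addr] = value  # other copies receive the new value
        else:
            raise ValueError(f"unknown policy: {policy}")


inv = [{0x10: 1}, {0x10: 1}, {}]
apply_write(inv, writer=0, addr=0x10, value=7, policy="invalidate")

upd = [{0x10: 1}, {0x10: 1}, {}]
apply_write(upd, writer=0, addr=0x10, value=7, policy="update")
```

After the invalidating write, only the writer holds the block. After the updating write, the second cache still holds a copy and it carries the new value. The third cache never held the block and is untouched in both cases.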
Practical coherence protocols extend these ideas with per‑line states such as Modified, Exclusive, Shared, and Invalid (MESI). Variants add an Owned state (MOESI) or a Forward state (MESIF) so that a designated cache can supply a line to other requesters.
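A simplified view of the MESI states for a single line in one cache can be written as a transition function. This is a sketch under stated assumptions: the event names and the others_have_copy flag are invented for illustration, and real protocols add bus signals, write‑backs, and transient states that are omitted here.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


def next_state(state, event, others_have_copy=False):
    """Return the next MESI state of one cache line.

    event is one of: "local_read", "local_write",
    "remote_read", "remote_write" (hypothetical names).
    others_have_copy only matters for a local read from INVALID.
    """
    if event == "local_read":
        if state is State.INVALID:
            return State.SHARED if others_have_copy else State.EXCLUSIVE
        return state  # M, E, and S all satisfy reads locally
    if event == "local_write":
        return State.MODIFIED  # from S or I, other copies are invalidated
    if event == "remote_read":
        # M and E lose exclusivity; a Modified line supplies its data first.
        return State.INVALID if state is State.INVALID else State.SHARED
    if event == "remote_write":
        return State.INVALID
    raise ValueError(f"unknown event: {event}")


trace = [State.INVALID]
for ev in ["local_read", "local_write", "remote_read", "remote_write"]:
    trace.append(next_state(trace[-1], ev))
```

The trace walks one line through Invalid, Exclusive, Modified, Shared, and back to Invalid, which covers the most common path through the protocol.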
Typical problems and design trade‑offs
Without coherence, two caches can diverge: a processor may read stale data while another has performed a more recent write. Common issues include lost updates and stale reads. Implementing coherence introduces trade‑offs in latency, bandwidth, hardware complexity, and scalability. Broadcast‑based snooping is simple and fast on small multiprocessor buses but scales poorly as core counts rise, because every cache must observe every coherence transaction. Directory schemes reduce broadcast traffic at the cost of additional metadata and more complex handling of directory state.
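The scalability trade‑off can be made concrete with a deliberately crude count of how many recipients one write involves. Both functions below are hypothetical and ignore acknowledgements, data transfers, and directory lookup latency; they only contrast "every other cache" with "the directory plus the actual sharers".

```python
def snoop_deliveries(num_caches):
    # A broadcast is observed by every other cache, whether or not it holds the block.
    return num_caches - 1


def directory_deliveries(other_sharers):
    # One request to the directory, then one invalidation per other sharer.
    return 1 + other_sharers


small = (snoop_deliveries(4), directory_deliveries(3))   # small, widely shared
large = (snoop_deliveries(64), directory_deliveries(2))  # large, lightly shared
```

Under these assumptions the broadcast is no worse on a 4‑cache system where everyone shares the block (3 versus 4), but on a 64‑cache system with two other sharers the directory involves 3 recipients instead of 63.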
Notable phenomena and mitigations
False sharing is a recurring performance pitfall: unrelated variables placed on the same cache line cause unnecessary invalidations and transfers when updated by different processors. Designers mitigate false sharing with padding, alignment strategies, or by reorganizing data structures. Coherence traffic can also be reduced by software techniques such as minimizing shared writes, using read‑only data patterns, or employing message passing instead of shared memory.
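The padding mitigation can be illustrated by inspecting structure layouts. The sketch below uses Python's ctypes module only to compute field offsets; it assumes a 64‑byte cache line and a structure that starts on a line boundary, neither of which is guaranteed on real hardware. The names Packed, Padded, and line_index are made up for this example.

```python
import ctypes

CACHE_LINE = 64  # assumed line size in bytes; real hardware varies


class Packed(ctypes.Structure):
    # Two counters updated by different threads, laid out back to back.
    _fields_ = [("a", ctypes.c_uint64), ("b", ctypes.c_uint64)]


class Padded(ctypes.Structure):
    # Padding after a pushes b onto the next cache line.
    _fields_ = [
        ("a", ctypes.c_uint64),
        ("_pad", ctypes.c_ubyte * (CACHE_LINE - ctypes.sizeof(ctypes.c_uint64))),
        ("b", ctypes.c_uint64),
    ]


def line_index(struct_type, field):
    # Cache line, relative to the start of the struct, that contains the field.
    return getattr(struct_type, field).offset // CACHE_LINE


packed_lines = (line_index(Packed, "a"), line_index(Packed, "b"))
padded_lines = (line_index(Padded, "a"), line_index(Padded, "b"))
```

In the packed layout both counters fall on the same line, so writes to one invalidate cached copies of the other. In the padded layout they fall on different lines, at the cost of 56 bytes of unused space per structure.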
History, scope and applications
Cache coherence became an important problem as multiprocessor systems and multicore CPUs gave each processor or core its own local cache. Modern desktop, server, and mobile processors typically implement hardware coherence within a chip to make shared‑memory programming simpler and safer. Beyond CPU caches, coherence‑like concerns appear in distributed shared memory systems, some GPU designs, and multi‑level cache hierarchies.
Distinctions and why it matters
Cache coherence ensures that all observers agree on the order of values written to any single memory location, while memory consistency models specify ordering constraints across multiple locations and operations. Both aspects shape the correctness and performance of concurrent programs. Understanding coherence lets system architects and programmers reason about correctness, diagnose performance problems such as excessive coherence traffic, and choose appropriate hardware or software patterns for scalable parallel systems.