Overview

A cache in computing is a small, fast storage layer that retains copies of data originally kept in slower, larger storage. By keeping frequently accessed or recently computed items closer to the processor or application, a cache reduces the average time and cost to obtain information. A cache can hold results of computations, disk blocks, web responses, translated memory pages or other data that are expensive or slow to fetch from their authoritative source. When a requested item is present in the cache, the access is called a cache hit. When it is absent, the access is called a cache miss: the system must fetch the item from the slower source, and it often places a copy into the cache so that later requests can be served quickly.
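
The hit and miss paths described above can be sketched in a few lines of Python. This is a minimal illustration rather than a production design: the class name ReadThroughCache and the function slow_square are hypothetical stand-ins, and the cache here is unbounded, so it never evicts anything.

```python
class ReadThroughCache:
    """Tiny read-through cache in front of a slower lookup function."""

    def __init__(self, fetch):
        self._fetch = fetch   # assumed slow source, e.g. a disk or network read
        self._store = {}      # cached copies, keyed by request
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._store:
            # Cache hit: serve the local copy.
            self.hits += 1
            return self._store[key]
        # Cache miss: go to the authoritative source and keep a copy.
        self.misses += 1
        value = self._fetch(key)
        self._store[key] = value
        return value


def slow_square(n):
    # Hypothetical stand-in for an expensive fetch or computation.
    return n * n


cache = ReadThroughCache(slow_square)
results = [cache.get(k) for k in (3, 4, 3, 3, 4)]
print(results, cache.hits, cache.misses)
```

The first request for each key misses and calls the slow function; every repeat is served from the stored copy.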

Key characteristics and components

Caches are defined by a few basic properties: capacity (size), granularity (the size of the stored unit, such as bytes or blocks), lookup method (how the cache locates an item), replacement policy (how it chooses what to evict when full), and write policy (how and when modified data are propagated back to the original store). Typical caches operate on the principle of locality: temporal locality (recently used items are likely to be used again) and spatial locality (items near recently used data are likely to be used soon). These patterns make caching effective in many layers of computing.
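
The interaction between granularity and spatial locality can be illustrated with a small simulation. The sketch below assumes a block size of four items and an unbounded cache that loads a whole block on each miss; the names BLOCK_SIZE and count_misses are invented for this example.

```python
BLOCK_SIZE = 4  # assumed granularity: items loaded together on one miss


def count_misses(addresses, block_size=BLOCK_SIZE):
    """Count misses for an unbounded cache that loads whole blocks."""
    loaded_blocks = set()
    misses = 0
    for addr in addresses:
        block = addr // block_size
        if block not in loaded_blocks:
            misses += 1
            loaded_blocks.add(block)
    return misses


# Neighbouring addresses share a block, so most accesses hit.
sequential = list(range(16))
# Each access lands in a different block, so every access misses.
strided = [i * BLOCK_SIZE for i in range(16)]

sequential_misses = count_misses(sequential)
strided_misses = count_misses(strided)
print(sequential_misses, strided_misses)
```

Both access patterns touch sixteen items, but the sequential pattern misses only once per block while the strided pattern gains nothing from the neighbours that were loaded alongside each item.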

Common types of caches

  • CPU caches: Small, very fast caches located on or near the processor (L1, L2, L3 levels) that store instructions and data to reduce main memory accesses.
  • Disk and file-system caches: Memory buffers that cache disk blocks or file contents so that repeated reads do not require slow disk access.
  • Database caches: In-memory structures that keep frequently queried rows, pages or query results to accelerate database operations.
  • Web caches: Proxies, browsers and content delivery networks that store HTTP responses to reduce latency and bandwidth usage for subsequent clients.
  • Application-level caches: Framework or library-managed caches (in-memory key-value stores, memoization of function results) used to avoid recomputation or remote calls.
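
As an example of an application-level cache, memoization of function results is available in Python's standard library through functools.lru_cache. The sketch below is illustrative only: the function fib and the counter calls are chosen for the example, and the maxsize of 128 is an arbitrary assumption.

```python
from functools import lru_cache

calls = 0  # counts how often the function body actually runs


@lru_cache(maxsize=128)
def fib(n):
    """Fibonacci number, with results memoized by lru_cache."""
    global calls
    calls += 1
    return n if n < 2 else fib(n - 1) + fib(n - 2)


value = fib(30)
info = fib.cache_info()  # named tuple with hits, misses, maxsize, currsize
print(value, calls, info.hits, info.misses)
```

Without the decorator the function body would run more than a million times for this input; with it, each distinct argument is computed once and repeated calls are answered from the cache.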

Organization and policies

Lookup structures vary from simple hash tables in software to direct-mapped, set-associative or fully associative organizations in hardware caches. Replacement policies decide which entry to evict when space is needed; common strategies include least recently used (LRU), first-in-first-out (FIFO), least frequently used (LFU) and randomized eviction. Write policies determine when changes in the cache are written back to the backing store: write-through updates the backing store immediately on every write, while write-back defers the update until the modified entry is evicted or explicitly flushed. Systems also implement coherence and invalidation mechanisms when multiple caches can hold the same underlying data.
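
LRU replacement and a write-back policy can be combined in a short sketch. The code below is one possible arrangement under stated assumptions, not a reference implementation: the class name WriteBackLRUCache is hypothetical, a plain dictionary stands in for the slow backing store, and there is no explicit flush operation.

```python
from collections import OrderedDict


class WriteBackLRUCache:
    """LRU cache that writes modified entries back only on eviction."""

    def __init__(self, capacity, backing_store):
        self.capacity = capacity
        self.backing = backing_store   # dict standing in for the slow store
        self._items = OrderedDict()    # least recently used entry first
        self._dirty = set()            # keys modified since they were loaded

    def read(self, key):
        if key in self._items:
            self._items.move_to_end(key)   # mark as most recently used
            return self._items[key]
        value = self.backing[key]          # miss: load from the backing store
        self._insert(key, value)
        return value

    def write(self, key, value):
        # Write-back: update only the cached copy and remember it is dirty.
        self._insert(key, value)
        self._dirty.add(key)

    def _insert(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            old_key, old_value = self._items.popitem(last=False)  # evict LRU
            if old_key in self._dirty:
                self.backing[old_key] = old_value  # deferred write happens now
                self._dirty.discard(old_key)


backing = {"a": 1, "b": 2, "c": 3}
cache = WriteBackLRUCache(2, backing)
cache.write("a", 10)
before_eviction = backing["a"]   # backing store not yet updated
cache.read("b")
cache.read("a")                  # "a" becomes most recently used
cache.read("c")                  # evicts "b", which is clean
cache.read("b")                  # evicts "a", which is dirty
after_eviction = backing["a"]
print(before_eviction, after_eviction)
```

A write-through variant would instead assign to the backing store inside write, which keeps the two copies consistent at the cost of a slow write on every update.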

Performance and metrics

Cache effectiveness is measured by hit rate (fraction of accesses served by the cache), miss rate, and average access latency. Misses are often classified as compulsory (first-time access), capacity (cache too small), or conflict (mapping constraints cause eviction despite available space). Designers balance cache size, associativity, and lookup latency: larger caches can hold more items but may cost more or take longer to search, while smaller caches are faster but miss more often. Simulation and profiling are common ways to evaluate caching strategies for specific workloads.
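
The metrics above are commonly related by a simple formula: average access time equals the hit time plus the miss rate multiplied by the miss penalty. The following sketch applies it to made-up numbers; the counts and latencies are illustrative assumptions, not measurements.

```python
def hit_rate(hits, misses):
    """Fraction of accesses served by the cache."""
    total = hits + misses
    return hits / total if total else 0.0


def average_access_time(hit_time, miss_penalty, miss_rate):
    """hit_time + miss_rate * miss_penalty, with all times in the same unit."""
    return hit_time + miss_rate * miss_penalty


# Illustrative numbers only: 950 hits, 50 misses, a 1 ns hit time
# and a 100 ns penalty for going to the slower store.
rate = hit_rate(hits=950, misses=50)
amat = average_access_time(hit_time=1.0, miss_penalty=100.0, miss_rate=1 - rate)
print(rate, round(amat, 3))
```

With these assumed figures a 5% miss rate raises the average access time from 1 ns to roughly 6 ns, which shows why small changes in hit rate can matter when the miss penalty is large.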

History, rationale and uses

The idea of caching arises from the need to bridge performance gaps between fast and slow layers of a system. Memory hierarchies and caches grew in prominence as processor speeds outpaced memory and storage access times. Today, caching underpins many optimizations: speeding program execution, reducing network traffic, enabling scalable web services, and improving responsiveness in interactive applications. For example, memoization caches the outputs of function calls, while CDNs cache static web assets close to users to reduce latency.

Distinctions and notable considerations

Caches differ from buffers: a buffer temporarily holds data while it is being moved or processed and is typically managed explicitly by an application, whereas a cache transparently holds copies to speed repeated access. Caching can introduce complexity: stale data, consistency issues, and added memory pressure. Systems use time-to-live (TTL) limits, validation headers in HTTP (such as ETag and Last-Modified), explicit invalidation or coherence protocols to reduce inconsistencies. Security and privacy also matter: caches can leak sensitive information if not carefully controlled, and side-channel attacks sometimes exploit cache behavior.
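
Time-to-live expiry and explicit invalidation can be sketched as follows. The class TTLCache and its clock parameter are assumptions made for this example; the injectable clock exists only so that expiry can be demonstrated without waiting.

```python
import time


class TTLCache:
    """Cache whose entries expire a fixed time after they are stored."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self._clock = clock    # injectable so expiry can be simulated
        self._items = {}       # key -> (value, expiry time)

    def put(self, key, value):
        self._items[key] = (value, self._clock() + self.ttl)

    def get(self, key, default=None):
        entry = self._items.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[key]   # stale: drop it and report a miss
            return default
        return value

    def invalidate(self, key):
        # Explicit invalidation, e.g. after the source data changes.
        self._items.pop(key, None)


now = [0.0]  # simulated time in seconds
cache = TTLCache(ttl_seconds=30, clock=lambda: now[0])
cache.put("profile", "v1")
fresh = cache.get("profile")   # still within the TTL
now[0] = 31.0
stale = cache.get("profile")   # TTL has passed, so the entry is gone
print(fresh, stale)
```

A short TTL bounds how stale a served copy can be but lowers the hit rate; explicit invalidation avoids that trade-off when the system knows exactly when the source changed.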

Further reading and practical notes

Designers choose a caching approach by profiling typical access patterns and selecting size, granularity and policies that match the locality characteristics of the workload. For general background on caching principles, see introductory materials. For distinctions between buffers and caches, consult application design notes. To explore locality of reference and its impact on cache effectiveness, follow resources on locality. For advanced techniques, such as coherence protocols, eviction algorithms and cache-aware algorithms, see the more specialized technical literature.

Because caches are ubiquitous across hardware and software, understanding their trade-offs is essential for system performance tuning, capacity planning and correct program behavior.