Cache (computing)

In information technology, cache ([kæʃ], [kaʃ]) refers to a fast buffer memory that helps to avoid (repeated) accesses to a slow background medium or costly recalculations. Data that has already been loaded or generated once remains in the cache so that it can be retrieved from it more quickly if needed later. Also, data that is likely to be needed soon can be retrieved from the background medium in advance and made available in the cache for the time being (read-ahead).
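The basic idea can be illustrated with a minimal sketch. The class name `ReadThroughCache` and the function `slow_read` are hypothetical and chosen only for illustration; the dictionary stands in for the fast buffer memory, and `slow_read` stands in for the slow background medium.

```python
class ReadThroughCache:
    """Answers reads from a dict when possible, otherwise asks the slow source."""

    def __init__(self, backing_read):
        self._backing_read = backing_read  # hypothetical slow lookup function
        self._entries = {}                 # the fast buffer memory
        self.hits = 0
        self.misses = 0

    def read(self, key):
        if key in self._entries:           # already loaded once: serve from cache
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        value = self._backing_read(key)    # slow access to the background medium
        self._entries[key] = value         # keep it for later requests
        return value


backing_calls = []                         # records every slow access


def slow_read(key):
    backing_calls.append(key)
    return key * 2                         # stand-in for an expensive load or calculation


cache = ReadThroughCache(slow_read)
results = [cache.read(k) for k in (1, 2, 1, 1, 2)]
```

In this sketch, five reads reach the background medium only twice; the remaining three are answered from the dictionary.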

Caches can be hardware structures (for example, main memory chips) or software structures (for example, temporary files or reserved memory).

The word cache has its origin in the French cache, which means hiding place. The name reflects the fact that the cache, and the way it stands in for the addressed background medium, usually remains hidden from the user. Anyone using the background medium does not need to know the size or workings of the cache, because the cache is not addressed directly. The user "addresses" the background medium, but the cache "responds" instead, in exactly the same way the background medium would have responded, i.e. by supplying the data. Because this intermediate unit is invisible, the property is also referred to as transparency. In practice, the cache is a mirrored resource that can be accessed very quickly as a substitute for the original.

If other devices besides the one using the cache access the background medium, inconsistencies may occur. For all devices to see an identical data image, the changes held in the cache must be transferred to the background medium before the other device accesses it. Cache strategies such as write-through or write-back address this. In extreme cases, a complete "cache flush" must be performed.
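The difference between the two write strategies can be sketched as follows. This is a simplified illustration, not a description of any particular hardware: the class names are hypothetical, and a plain dictionary stands in for the background medium.

```python
class WriteThroughCache:
    """Every write goes to the cache and to the background medium at once."""

    def __init__(self, backing):
        self.backing = backing   # dict standing in for the background medium
        self.entries = {}

    def write(self, key, value):
        self.entries[key] = value
        self.backing[key] = value            # background medium is always current


class WriteBackCache:
    """Writes stay in the cache until they are flushed."""

    def __init__(self, backing):
        self.backing = backing
        self.entries = {}
        self.dirty = set()                   # keys changed but not yet written back

    def write(self, key, value):
        self.entries[key] = value
        self.dirty.add(key)

    def flush(self):
        for key in self.dirty:               # transfer pending changes
            self.backing[key] = self.entries[key]
        self.dirty.clear()


disk_a, disk_b = {}, {}
wt = WriteThroughCache(disk_a)
wb = WriteBackCache(disk_b)
wt.write("x", 1)
wb.write("x", 1)
before_flush = dict(disk_b)   # what another device would see before the flush
wb.flush()
```

With write-back, another device reading `disk_b` before the flush would not see the new value, which is the inconsistency described above.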

In addition, the cache may need to be informed that data on the background medium has changed and that its contents are no longer valid. If the cache logic does not ensure this, changes made in the meantime to the background medium or to the calculation program go unnoticed, and the cache keeps delivering outdated data. If changes are suspected, or to ensure that the current state is taken into account, the user must explicitly initiate a cache update.
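A minimal sketch of this situation, assuming a cache that is not notified automatically. The class `InvalidatingCache` and its `invalidate` method are hypothetical names used only for illustration.

```python
class InvalidatingCache:
    def __init__(self, backing):
        self.backing = backing   # dict standing in for the background medium
        self.entries = {}

    def read(self, key):
        if key not in self.entries:
            self.entries[key] = self.backing[key]
        return self.entries[key]

    def invalidate(self, key=None):
        """Drop one entry, or everything if no key is given."""
        if key is None:
            self.entries.clear()
        else:
            self.entries.pop(key, None)


store = {"config": "v1"}
cache = InvalidatingCache(store)
first = cache.read("config")
store["config"] = "v2"          # changed behind the cache's back
stale = cache.read("config")    # the cache still answers with the old value
cache.invalidate("config")      # explicit cache update
fresh = cache.read("config")
```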

Benefit

The goals of using a cache are to reduce the access time and/or to reduce the number of accesses to a slow background medium. This means in particular that the use of caches is only worthwhile where the access time has a significant influence on the overall performance. While this is the case, for example, with the processor cache of most (scalar) microprocessors, it does not apply to vector computers, where the access time plays a subordinate role. Caches are therefore usually not used there, since they are of little or no benefit.

Another important effect of using caches is the reduction of the data transfer rate required on the connection to the background medium (see e.g. memory hierarchy); the background medium can therefore be "connected more slowly", which can result in lower costs, for example. Because the majority of requests can often be answered by the cache ("cache hit", see below), the number of accesses and therefore the necessary transmission bandwidth decreases. For example, a modern microprocessor without a cache would be slowed down even if the main memory had a very short access time: without the cache, the number of accesses to the main memory, and with it the demand on memory bandwidth, would increase greatly, and not enough memory bandwidth would be available.

With CPUs, the use of caches can thus help to reduce the von Neumann bottleneck of the von Neumann architecture. On average, the execution speed of programs can be increased enormously as a result.
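The effect on access time and on traffic to the background medium can be estimated with a simple model. The sketch below assumes that every access pays the cache access time and that a miss additionally pays the background access time; the function names and all numbers are illustrative assumptions, not measurements.

```python
def average_access_time(hit_rate, cache_time, backing_time):
    # Assumed model: every access pays cache_time; a miss also pays backing_time.
    return cache_time + (1.0 - hit_rate) * backing_time


def backing_accesses(total_accesses, hit_rate):
    # Only misses reach the background medium.
    return total_accesses * (1.0 - hit_rate)


# Illustrative values in arbitrary time units.
with_cache = average_access_time(hit_rate=0.75, cache_time=1.0, backing_time=100.0)
without_cache = 100.0   # every access goes straight to the background medium
traffic = backing_accesses(total_accesses=1000, hit_rate=0.75)
```

Under these assumptions, a hit rate of 75% cuts both the average access time and the number of accesses to the background medium to roughly a quarter.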

A disadvantage of caches is the poorly predictable time behavior, since the execution time of an access is not always constant due to cache misses. If the data is not in the cache, the accessor must wait until it has been loaded from the slow background medium. With processors, this often happens when accessing data that has not yet been used or when loading the next program instruction during (long) jumps.

Cache Hierarchy

Since it is technically complex and thus usually not economically sensible to build a cache that is both large and fast, one can use several caches - e.g. a small fast cache and a significantly larger but somewhat slower cache (which is still much faster than the background memory to be cached). This allows the competing goals of low access time and large cache size to be achieved together. The size matters because a larger cache generally achieves a higher hit rate.

If several caches exist, they form a cache hierarchy, which is part of the memory hierarchy. The individual caches are numbered according to their hierarchy level, i.e. Level 1 to Level n, or L1, L2, etc. for short. The lower the number, the closer the cache is to the fast "user"; the lowest number therefore indicates the cache with the fastest access time, which is searched first. If the L1 cache does not contain the required data, the (usually slightly slower, but larger) L2 cache is searched, and so on. This continues until the data is either found in one cache level (a "cache hit", see below) or all caches have been searched without success (a "cache miss", see below). In the latter case, the slow background memory must be accessed.
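The search order through the levels can be sketched as a simple loop. The function `lookup` and the example contents are hypothetical; the sketch only shows the search itself and leaves out copying data into faster levels and displacing cache lines.

```python
def lookup(levels, backing, key):
    """Search the levels from fastest to slowest; fall back to background memory.

    levels: list of (name, dict) pairs, ordered L1, L2, ...
    Returns the value and the name of the place where it was found.
    """
    for name, entries in levels:
        if key in entries:                       # cache hit at this level
            return entries[key], name
    return backing[key], "background memory"     # cache miss in every level


levels = [
    ("L1", {"a": 1}),
    ("L2", {"a": 1, "b": 2}),
    ("L3", {"a": 1, "b": 2, "c": 3}),
]
backing = {"a": 1, "b": 2, "c": 3, "d": 4}
found = [lookup(levels, backing, k) for k in ("a", "b", "c", "d")]
```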

If a cache hit occurs, for example, in the L3 cache, the requested data is delivered to the accessor and at the same time transferred to the L1 cache; for this, a cache line must give way there, which "sinks" into the L2 cache.

With an inclusive cache, each cache level is transparent in itself, i.e. a cache line that is in the L1 cache is also present in the L2 and L3 caches. If the cache line is "displaced" from the L1 cache (overwritten with data from another address), nothing else needs to be done - it is still present in the L2 cache (provided no write-back or similar is necessary).

In an exclusive cache, a cache line of an address exists only once in all cache levels. A cache line for address A in the L1 cache does not also exist in the L2 or L3 cache. If it is displaced from the L1 cache, it can either be discarded completely, or must be explicitly copied to the L2 cache. There, too, a (different) cache line is displaced to make room for the sinking one. This other cache line now sinks into the L3 cache, where a third cache line has to give way.

Exclusive cache hierarchies generate significantly more data traffic between the caches. In return, as many cache lines can be kept available as the sum of the L1, L2, and L3 cache size, whereas with the inclusive cache only the L3 cache size is decisive.
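The "sinking" of cache lines in an exclusive hierarchy can be sketched with two levels. This is a simplified model under stated assumptions: the class `ExclusiveTwoLevel` is hypothetical, the sizes are arbitrary, and displacement of the least recently used line is an assumed replacement policy that the text above does not prescribe.

```python
from collections import OrderedDict


class ExclusiveTwoLevel:
    """Two-level exclusive hierarchy: a key lives in L1 or in L2, never in both."""

    def __init__(self, l1_size, l2_size, backing):
        self.l1 = OrderedDict()
        self.l2 = OrderedDict()
        self.l1_size = l1_size
        self.l2_size = l2_size
        self.backing = backing   # dict standing in for the background memory

    def _insert_l1(self, key, value):
        self.l1[key] = value
        if len(self.l1) > self.l1_size:
            old_key, old_value = self.l1.popitem(last=False)  # displaced from L1
            self.l2[old_key] = old_value                      # sinks into L2
            if len(self.l2) > self.l2_size:
                self.l2.popitem(last=False)                   # leaves the hierarchy

    def read(self, key):
        if key in self.l1:
            self.l1.move_to_end(key)          # mark as recently used
            return self.l1[key], "L1"
        if key in self.l2:
            value = self.l2.pop(key)          # exclusive: remove from L2 when moving up
            self._insert_l1(key, value)
            return value, "L2"
        value = self.backing[key]             # miss in both levels
        self._insert_l1(key, value)
        return value, "miss"


backing = {k: k * 10 for k in range(10)}
hierarchy = ExclusiveTwoLevel(l1_size=2, l2_size=2, backing=backing)
for k in (0, 1, 2, 3):
    hierarchy.read(k)
cached_keys = len(hierarchy.l1) + len(hierarchy.l2)
hit = hierarchy.read(0)
```

After four distinct reads, the sketch holds four different lines, the sum of both level sizes, whereas an inclusive design with the same sizes would be limited by its largest level.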

In hardware, modern CPUs in particular have two or three cache levels; other devices usually have only one cache level. In software, usually only one cache level is used, a prominent exception being web browsers, which use two levels (main memory and hard disk drive).

Questions and Answers

Q: What is caching?

A: Caching is a term used in computer science that refers to the practice of storing copies of data that is used often in order to access it faster than re-fetching or re-calculating the original data.

Q: How does caching work?

A: Caching works by using two kinds of storage media, one which is usually quite large but slow to access, and another which can be accessed much faster but generally smaller. The idea behind caching is to use the fast medium to store copies of data so that accessing the original data takes less time or is less expensive.

Q: What is a buffer?

A: A buffer is similar to a cache in that it stores copies of data for quicker access. With a buffer, however, the client accessing the data knows that the buffer exists, and the buffer is managed by an application, whereas with a cache, clients need not be aware that there is a cache.

Q: What does locality of reference mean?

A: Locality of reference means that when an application accesses certain blocks of structured data, it is also likely to access other blocks close to those originally accessed. This helps caches work well, as they are typically small compared to all available data.
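The effect of locality can be illustrated with a small simulation. The sketch assumes a cache that loads whole blocks of neighbouring addresses and displaces the least recently used block; the function `hit_rate`, the block size, and the capacity are hypothetical values chosen only for illustration.

```python
from collections import OrderedDict


def hit_rate(addresses, block_size=8, capacity_blocks=4):
    """Fraction of accesses answered by a small block-based cache."""
    cached = OrderedDict()               # block number -> present
    hits = 0
    for addr in addresses:
        block = addr // block_size       # neighbouring addresses share a block
        if block in cached:
            hits += 1
            cached.move_to_end(block)    # mark as recently used
        else:
            cached[block] = True
            if len(cached) > capacity_blocks:
                cached.popitem(last=False)   # displace least recently used block
    return hits / len(addresses)


sequential = list(range(64))                    # neighbouring addresses in order
scattered = [(i * 8) % 64 for i in range(64)]   # jumps to a different block each time

local_rate = hit_rate(sequential)
scattered_rate = hit_rate(scattered)
```

In this model the sequential pattern is answered from the cache for most accesses, while the scattered pattern, which touches more blocks than the cache can hold before returning to any of them, gets no hits at all.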

Q: Why do bigger caches take longer to look up entries?

A: Bigger caches take longer because they contain more stored information and therefore require more time for lookups. They are also more expensive as they require more resources for storage.

Q: How can locality help make caches work better?

A: Locality helps make caches work better because when applications access certain blocks of structured data, they are likely also going to need other nearby blocks which can then be quickly retrieved from the cache instead of having to fetch them from elsewhere or recalculate them again.