In computer engineering, directory-based cache coherence is a type of cache coherence mechanism, where directories are used to manage caches in place of bus snooping. Bus snooping methods scale poorly due to the use of broadcasting. These methods can be used to target both performance and scalability of directory systems.
In the full bit vector format, for each possible cache line in memory, a bit is used to track whether every individual processor has that line stored in its cache. The full bit vector format is the simplest structure to implement, but the least scalable. The SGI Origin 2000 uses a combination of full bit vector and coarse bit vector depending on the number of processors.
Each directory entry must have 1 bit stored per processor per cache line, along with bits for tracking the state of the directory. This leads to the total size required being (number of processors)×number of cache lines, having a storage overhead ratio of (number of processors)/(cache block size×8).
It can be observed that directory overhead scales linearly with the number of processors. While this may be fine for a small number of processors, when implemented in large systems the size requirements for the directory becomes excessive. For example, with a block size of 32 bytes and 1024 processors, the storage overhead ratio becomes 1024/(32×8) = 400%.
The coarse bit vector format has a similar structure to the full bit vector format, though rather than tracking one bit per processor for every cache line, the directory groups several processors into nodes, storing whether a cache line is stored in a node rather than a line. This improves size requirements at the expense of bus traffic saving (processors per node)×(total lines) bits of space. Thus the ratio overhead is the same, just replacing number of processors with number of processor groups. When a bus request is made for a cache line that one processor in the group has, the directory broadcasts the signal into every processor in the node rather than just the caches that contain it, leading to unnecessary traffic to nodes that do not have the data cached.
Cette page est générée automatiquement et peut contenir des informations qui ne sont pas correctes, complètes, à jour ou pertinentes par rapport à votre recherche. Il en va de même pour toutes les autres pages de ce site. Veillez à vérifier les informations auprès des sources officielles de l'EPFL.
Multiprocessors are a core component in all types of computing infrastructure, from phones to datacenters. This course will build on the prerequisites of processor design and concurrency to introduce
Multiprocessors are now the defacto building blocks for all computer systems. This course will build upon the basic concepts offered in Computer Architecture I to cover the architecture and organizati
Un cache de processeur est une antémémoire matérielle utilisée par l'unité centrale de traitement (CPU) d'un ordinateur pour réduire le coût moyen (temps ou énergie) de l’accès aux données de la mémoire principale. Un cache de processeur est une mémoire plus petite et plus rapide, située au plus près d'une unité centrale de traitement (ou d'un cœur de microprocesseur), qui stocke des copies des données à partir d'emplacements de la mémoire principale qui sont fréquemment utilisés avant leurs transmissions aux registres du processeur.
Queries to detect isomorphic subgraphs are important in graph-based data management. While the problem of subgraph isomorphism search has received considerable attention for the static setting of a single query, or a batch thereof, existing approaches do n ...
ASSOC COMPUTING MACHINERY2021
Explore la cohérence du cache dans les systèmes multiprocesseurs, y compris les protocoles et les défis.
Couvre l'évolution et les défis des multiprocesseurs, en mettant l'accent sur l'efficacité énergétique, la programmation parallèle, la cohérence du cache et le rôle des GPU.
Couvre la cohérence du cache, partage des données dans les caches multiprocesseurs, les répertoires et les protocoles de cohérence.
Effective caching is crucial for performance of modern-day computing systems. A key optimization problem arising in caching – which item to evict to make room for a new item – cannot be optimally solved without knowing the future. There are many classical ...
Disaggregated memory can address resource provisioning inefficiencies in current datacenters. Multiple software runtimes for disaggregated memory have been proposed in an attempt to make disaggregated memory practical. These systems rely on the virtual mem ...