Exploiting choice in resizable cache design to optimize deep-submicron processor energy-delay
Related publications (41)
Improving cache performance requires understanding cache behavior. However, measuring cache performance for one or two data input sets provides little insight into how cache behavior varies across all data input sets. This paper uses our recently published ...
This report details the design of two new concurrent data structures, a hash table, called CLHT, and a binary search tree (BST), called BST-TK. Both designs are based on asynchronized concurrency (ASCY), a paradigm consisting of four complementary programm ...
When it comes to performance, embedded systems share many problems with their higher-end counterparts. The growing gap between top processor frequency and memory access speed, the memory wall, is one such problem. Driven, in part, by low energy consumption ...
Improving cache performance requires understanding cache behavior. However, measuring cache performance for one or two data input sets provides little insight into how cache behavior varies across all data input sets and all cache configurations. This pape ...
Conventional directory coherence operates at the finest granularity possible, that of a cache block. While simple, this organization fails to exploit frequent application behavior: at any given point in time, large, continuous chunks of memory are often ac ...
Instruction-cache misses account for up to 40% of execution time in online transaction processing (OLTP) database workloads. In contrast to data cache misses, instruction misses cannot be overlapped with out-of-order execution. Chip design limitations do ...
We revisit the idea of using small line buffers in front of caches. We propose ReCast, a tiny tag set cache that filters a significant number of tag probes to the L2 tag array, thus reducing power. The key contribution in ReCast is S-Shift, a simple indexin ...
Increases in on-chip communication delay and the large working sets of server and scientific workloads complicate the design of the on-chip last-level cache for multicore processors. The large working sets favor a shared cache design that maximizes the ag ...
We introduce a novel multi-resource allocator to dynamically allocate resources for database servers running on virtual storage. Multi-resource allocation involves proportioning the database and storage server caches, and the storage bandwidth between appl ...
This paper introduces Way Stealing, a simple architectural modification to a cache-based processor to increase data bandwidth to and from application-specific Instruction Set Extensions (ISEs). Way Stealing provides more bandwidth to the ISE-logic than the ...
IEEE Service Center, 445 Hoes Lane, PO Box 1331, Piscataway, NJ 08855-1331, USA, 2009