Near-optimal precharging in high-performance nanoscale CMOS caches

High-performance caches statically pull up the bitlines in all cache subarrays to optimize cache access latency. Unfortunately, this architecture wastes a significant amount of energy in nanoscale CMOS implementations because of high leakage and bitline discharge in the unaccessed subarrays. Recent research advocates bitline isolation, which uses bitline precharge devices to control the precharging of individual subarrays. In this paper, we carefully evaluate the energy and performance trade-offs of bitline isolation, and propose a technique that exploits nearly its full potential to eliminate discharge and reduce overall energy in level-one caches. Cycle-accurate and circuit simulation results for a wide-issue superscalar processor indicate that: 1) in future CMOS technologies (e.g., 70 nm and beyond), cache architectures that exploit bitline isolation can eliminate up to 90% of the bitline discharge; 2) on-demand precharging (i.e., decoding the address and subsequently precharging the accessed subarrays) is not viable in level-one caches because precharging increases the cache access latency; and 3) our proposal, gated precharging, which exploits subarray reference locality and precharges only the recently accessed subarrays, eliminates nearly all bitline discharge in nanoscale CMOS caches with only a 1% performance degradation.
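
The idea of precharging only recently accessed subarrays can be illustrated with a small behavioral sketch. This is not the paper's circuit or its simulator: the class name, the `idle_threshold` parameter, the use of access counts instead of cycles as the time base, and the rule that an access to an isolated subarray counts as "delayed" are all assumptions made for illustration. The sketch only counts how many subarray-steps stay precharged, as a rough proxy for bitline discharge, and how many accesses find their subarray isolated.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class GatedPrechargeModel:
    """Toy model: a subarray stays precharged only while it was accessed recently."""
    num_subarrays: int
    idle_threshold: int  # hypothetical: steps of idleness tolerated before isolation
    now: int = 0
    delayed_accesses: int = 0            # accesses that found their subarray isolated
    precharged_subarray_steps: int = 0   # proxy for energy spent keeping bitlines pulled up
    last_access: List[Optional[int]] = field(init=False)

    def __post_init__(self) -> None:
        # None means "never accessed", i.e. the subarray starts isolated.
        self.last_access = [None] * self.num_subarrays

    def is_precharged(self, subarray: int) -> bool:
        t = self.last_access[subarray]
        return t is not None and self.now - t <= self.idle_threshold

    def access(self, subarray: int) -> bool:
        """Advance one step and access `subarray`; return True if it was precharged."""
        self.now += 1
        # Count the subarrays whose bitlines are held high during this step.
        self.precharged_subarray_steps += sum(
            self.is_precharged(s) for s in range(self.num_subarrays)
        )
        ready = self.is_precharged(subarray)
        if not ready:
            # Assumption: an isolated subarray must be precharged first, delaying the access.
            self.delayed_accesses += 1
        self.last_access[subarray] = self.now
        return ready


def simulate(trace: List[int], num_subarrays: int, idle_threshold: int) -> Tuple[int, int]:
    """Run a trace of subarray indices; return (delayed accesses, precharged subarray-steps)."""
    model = GatedPrechargeModel(num_subarrays, idle_threshold)
    for subarray in trace:
        model.access(subarray)
    return model.delayed_accesses, model.precharged_subarray_steps


if __name__ == "__main__":
    trace = [0, 0, 1, 0, 1, 1, 3, 0]  # made-up subarray reference stream
    delayed, gated = simulate(trace, num_subarrays=4, idle_threshold=2)
    static = 4 * len(trace)  # static pull-up keeps every subarray precharged every step
    print(f"delayed accesses: {delayed}, precharged subarray-steps: {gated} of {static}")
```

In this toy model, a larger `idle_threshold` reduces delayed accesses but keeps more subarrays precharged, which mirrors the energy and performance trade-off the abstract describes.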

Request, Coalesce, Serve, and Forget: Miss-Optimized Memory Systems for Bandwidth-Bound Cache-Unfriendly Applications on FPGAs

Columnar Storage Optimization and Caching for Data Lakes

Micro-architectural Analysis of Database Workloads
