This paper proposes SC++lite, a sequentially consistent system that relaxes memory order speculatively to bridge the performance gap among memory consistency models. Prior proposals to speculatively relax memory order require large custom on-chip storage to maintain a history of speculative processor and memory state while memory order is relaxed. SC++lite instead uses the memory hierarchy to store the speculative history, providing a scalable path for speculative SC systems across a wide range of applications and system latencies. We use cycle-accurate simulation of shared-memory multiprocessors to show that SC++lite can fully relax memory order while virtually obviating the need for custom on-chip storage. Moreover, while demand for storage increases significantly with larger memory latencies, SC++lite's ability to relax memory order remains insensitive to memory latency. In a system with 16 processors, SC++lite improves performance over a base SC system by 28% with only 2 KB of custom storage. In contrast, speculative SC systems with custom storage require 51 KB of storage to improve performance by 31% over a base SC system.
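The trade-off stated above can be made concrete with a small calculation. The sketch below uses only the four figures quoted in the abstract; the derived ratios are not reported by the authors, and the comparison assumes both speedups are measured against the same base SC system and configuration. All names are illustrative.

```python
# Figures quoted in the abstract.
SCPP_LITE_STORAGE_KB = 2   # custom storage used by SC++lite
SCPP_LITE_GAIN_PCT = 28    # speedup over base SC with SC++lite
CUSTOM_STORAGE_KB = 51     # custom storage used by prior speculative SC
CUSTOM_GAIN_PCT = 31       # speedup over base SC with custom storage


def storage_reduction(custom_kb: float, lite_kb: float) -> float:
    """How many times less custom storage the lite design needs."""
    return custom_kb / lite_kb


def gain_retained(lite_gain_pct: float, custom_gain_pct: float) -> float:
    """Fraction of the custom-storage speedup that the lite design keeps."""
    return lite_gain_pct / custom_gain_pct


reduction = storage_reduction(CUSTOM_STORAGE_KB, SCPP_LITE_STORAGE_KB)
retained = gain_retained(SCPP_LITE_GAIN_PCT, CUSTOM_GAIN_PCT)

print(f"custom storage reduced by a factor of {reduction:.1f}")
print(f"fraction of speedup retained: {retained:.1%}")
```

Under these assumptions, SC++lite keeps roughly nine tenths of the speedup while using about one twenty-fifth of the custom storage.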
Aleksandra Radenovic, Andras Kis, Mukesh Kumar Tripathi, Zhenyu Wang, Asmund Kjellegaard Ottesen, Yanfei Zhao, Guilherme Migliato Marega, Hyungoo Ji