Building Chips Faster: Hardware-Compiler Co-Design for Accelerated RTL Simulation

Sahand Kashani
2023
EPFL thesis

Abstract

The demise of Moore's Law and Dennard scaling has resulted in diminishing performance gains for general-purpose processors and has prompted a surge in academic and commercial interest in hardware accelerators. Specialized hardware has already redefined the computing landscape by enabling the emergence of disruptive, large-scale applications that would not have been possible with CPUs alone. RTL simulators play a key role in enabling the accelerated computing revolution: they are to hardware engineers what debuggers and runtime systems are to software engineers. Without RTL simulators, no hardware accelerator could be functionally designed. As accelerators increase in size and complexity, the hardware design industry will increasingly need faster RTL simulators to permit chip design in reasonable time.

Since the advent of multicore computers, parallelism has been the preferred approach to improving software performance. RTL simulation seems to offer many opportunities to follow such a path: accelerators are written in hardware description languages that contain parallel constructs for describing independent hardware components that run in parallel and synchronize only at clock edges. Unfortunately, there is a mismatch between RTL simulation and today's multicore systems: tasks in RTL simulation tend to be very small, resulting in fine-grain parallelism. This fine-grain parallelism contrasts with the coarse-grain parallel workloads for which modern multicore systems are built, and it leads to simulator designs that achieve only weak parallel performance scaling. This thesis argues that we need computing architectures that can achieve strong scaling to truly speed up RTL simulation through parallelism. A strong scaling architecture is one that can make effective use of additional cores without having to increase the total workload size. This enables even small or moderate-size designs to exploit parallelism to run quickly.

This thesis contributes Manticore, a co-designed manycore architecture and compiler for RTL simulation that achieves strong parallel performance scaling. Manticore combines a bulk-synchronous parallel execution model with static scheduling to eliminate the runtime overheads of synchronization among hundreds of cores, simplify core design, and significantly increase the parallelism possible on a single chip. Our modest FPGA prototype of Manticore greatly increases parallel RTL simulation rate compared to a state-of-the-art software simulator running on top-of-the-line desktop and server x86 processors. The ideas underlying Manticore's design present a first step towards fast, scale-out RTL simulation.
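The strong-scaling argument in the abstract can be illustrated with a toy cost model. The sketch below is not taken from the thesis: the function names, the even split of work, and every number are hypothetical, chosen only to show why a fixed per-cycle synchronization cost caps the speedup of a fixed-size, fine-grain workload, and why removing that runtime cost (as static scheduling aims to do) lets speedup keep growing with core count.

```python
def cycle_time(work: float, cores: int, sync_cost: float) -> float:
    """Toy model of the time to simulate one clock cycle.

    Assumes (hypothetically) that the work splits evenly across cores and
    that every multi-core cycle pays one fixed synchronization cost.
    """
    if cores < 1:
        raise ValueError("cores must be >= 1")
    barrier = sync_cost if cores > 1 else 0.0
    return work / cores + barrier


def speedup(work: float, cores: int, sync_cost: float) -> float:
    """Speedup over one core for a fixed total workload (strong scaling)."""
    return cycle_time(work, 1, sync_cost) / cycle_time(work, cores, sync_cost)


if __name__ == "__main__":
    work = 1000.0  # hypothetical work per simulated cycle, arbitrary units
    for cores in (1, 10, 100):
        with_sync = speedup(work, cores, sync_cost=100.0)
        no_sync = speedup(work, cores, sync_cost=0.0)
        print(f"{cores:4d} cores: sync cost 100 -> {with_sync:6.2f}x, "
              f"sync cost 0 -> {no_sync:6.2f}x")
```

In this model, once the per-core share of work falls below the synchronization cost, adding cores yields almost no further speedup; with zero runtime synchronization cost the speedup tracks the core count. Real simulators also face load imbalance and communication costs that this sketch ignores.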

Official source

https://infoscience.epfl.ch/record/304487?ln=fr

Highly Parallel RTL Simulation

DBFS: Dynamic Bitwidth-Frequency Scaling for Efficient Software-defined SIMD

X-Attack 2.0: The Risk of Power Wasters and Satisfiability Don’t-Care Hardware Trojans to Shared Cloud FPGAs
