Building Chips Faster: Hardware-Compiler Co-Design for Accelerated RTL Simulation

Sahand Kashani
2023
EPFL thesis

Abstract

The demise of Moore's Law and Dennard scaling has resulted in diminishing performance gains for general-purpose processors, and so has prompted a surge in academic and commercial interest in hardware accelerators. Specialized hardware has already redefined the computing landscape by enabling the emergence of disruptive, large-scale applications that would not have been possible with CPUs alone.

RTL simulators play a key role in enabling the accelerated computing revolution: they are to hardware engineers what debuggers and runtime systems are to software engineers. Without RTL simulators, no hardware accelerator could be functionally designed. As accelerators increase in size and complexity, the hardware design industry will increasingly need faster RTL simulators to permit chip design in reasonable time.

Since the advent of multicore computers, parallelism has been the preferred approach to improving software performance. RTL simulation seems to offer many opportunities to follow such a path: accelerators are written in hardware description languages that contain parallel constructs for describing independent hardware components that run in parallel and synchronize only at clock edges. Unfortunately, there is a mismatch between RTL simulation and today's multicore systems: tasks in RTL simulation tend to be very small, resulting in fine-grain parallelism. This fine-grain parallelism contrasts with the coarse-grain parallel workloads for which modern multicore systems are built, which leads to simulator designs that can achieve only weak parallel performance scaling.

This thesis argues that we need computing architectures that can achieve strong scaling to truly speed up RTL simulation through parallelism. A strong scaling architecture is one that can make effective use of additional cores without having to increase the total workload size. This enables even small or moderate-size designs to exploit parallelism to run quickly.

This thesis contributes Manticore, a co-designed manycore architecture and compiler for RTL simulation that achieves strong parallel performance scaling. Manticore combines a bulk-synchronous parallel execution model with static scheduling to eliminate the runtime overheads of synchronization among hundreds of cores, simplify core design, and significantly increase the parallelism possible on a single chip. Our modest FPGA prototype of Manticore greatly increases parallel RTL simulation rate compared to a state-of-the-art software simulator running on top-of-the-line desktop and server x86 processors. The ideas underlying Manticore's design present a first step towards fast, scale-out RTL simulation.
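The strong-scaling argument in the abstract can be illustrated with a toy model. The sketch below is not taken from the thesis: the function name `speedup`, the cost model (per-cycle work divided evenly across cores plus a fixed synchronization cost at each clock edge), and all numeric values are assumptions chosen only to show why per-cycle synchronization overhead limits how well a fixed-size workload uses additional cores.

```python
def speedup(work_per_cycle: float, cores: int, sync_cost: float) -> float:
    """Strong-scaling speedup for one simulated clock cycle (toy model).

    work_per_cycle: total compute time for one cycle on a single core.
                    It stays fixed as cores are added (strong scaling).
    sync_cost:      time spent synchronizing at the clock edge whenever
                    more than one core is used (hypothetical constant).
    """
    if cores == 1:
        return 1.0  # single core: no synchronization, baseline time
    parallel_time = work_per_cycle / cores + sync_cost
    return work_per_cycle / parallel_time


if __name__ == "__main__":
    work = 1000.0  # hypothetical time units of work per simulated cycle
    # Compare a costly runtime barrier with a nearly free one.
    for sync in (500.0, 5.0):
        row = [round(speedup(work, p, sync), 1) for p in (1, 8, 64, 256)]
        print(f"sync_cost={sync}: speedup at 1/8/64/256 cores = {row}")
```

In this model the speedup can never exceed `work_per_cycle / sync_cost`, so with small per-cycle tasks a large synchronization cost caps the benefit of extra cores almost immediately, while a small one lets the same fixed workload keep scaling. This is only meant to make the abstract's motivation for reducing synchronization overhead concrete, not to describe how Manticore is implemented.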

Official source

https://infoscience.epfl.ch/record/304487?ln=en



EdgeAI-Aware Design of In-Memory Computing Architectures

Contemporary Logic Synthesis: with an Application to AQFP Circuit Optimization

DBFS: Dynamic Bitwidth-Frequency Scaling for Efficient Software-defined SIMD
