A barrel processor is a CPU that switches between threads of execution on every cycle. This CPU design technique is also known as "interleaved" or "fine-grained" temporal multithreading. Unlike simultaneous multithreading in modern superscalar architectures, it generally does not allow execution of multiple instructions in one cycle.
Like preemptive multitasking, each thread of execution is assigned its own program counter and other hardware registers (each thread's architectural state). A barrel processor can guarantee that each thread will execute one instruction every n cycles, unlike a preemptive multitasking machine, that typically runs one thread of execution for tens of millions of cycles, while all other threads wait their turn.
A technique called C-slowing can automatically generate a corresponding barrel processor design from a single-tasking processor design. An n-way barrel processor generated this way acts much like n separate multiprocessing copies of the original single-tasking processor, each one running at roughly 1/n the original speed.
One of the earliest examples of a barrel processor was the I/O processing system in the CDC 6000 series supercomputers. These executed one instruction (or a portion of an instruction) from each of 10 different virtual processors (called peripheral processors) before returning to the first processor. From CDC 6000 series we read that "The peripheral processors are collectively implemented as a barrel processor. Each executes routines independently of the others. They are a loose predecessor of bus mastering or direct memory access."
One motivation for barrel processors was to reduce hardware costs. In the case of the CDC 6x00 PPUs, the digital logic of the processor was much faster than the core memory, so rather than having ten separate processors, there are ten separate core memory units for the PPUs, but they all share the single set of processor logic.
Another example is the Honeywell 800, which had 8 groups of registers, allowing up to 8 concurrent programs.
This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.
Multiprocessors are now the defacto building blocks for all computer systems. This course will build upon the basic concepts offered in Computer Architecture I to cover the architecture and organizati
Explores cache block placement, replication policies, and management strategies in Reactive NUCA for multi-core caches.
A superscalar processor is a CPU that implements a form of parallelism called instruction-level parallelism within a single processor. In contrast to a scalar processor, which can execute at most one single instruction per clock cycle, a superscalar processor can execute more than one instruction during a clock cycle by simultaneously dispatching multiple instructions to different execution units on the processor. It therefore allows more throughput (the number of instructions that can be executed in a unit of time) than would otherwise be possible at a given clock rate.
The capability to process high-resolution videos in real-time is becoming more important in a wide variety of applications such as autonomous vehicles, virtual reality or intelligent surveillance systems. The high-accuracy and complex video processing algo ...
EPFL2019
, ,
This article presents ESTIMA, an easy-to-use tool for extrapolating the scalability of in-memory applications. ESTIMA is designed to perform a simple yet important task: Given the performance of an application on a small machine with a handful of cores, ES ...
Database systems access memory either sequentially or randomly. Contrary to sequential access and despite the extensive efforts of
computer architects, compiler writers, and system builders, random access to data larger than the processor cache has been s ...