Temporal multithreading is one of the two main forms of multithreading that can be implemented on computer processor hardware, the other being simultaneous multithreading. The distinguishing difference between the two forms is the maximum number of concurrent threads that can execute in any given pipeline stage in a given cycle. In temporal multithreading the number is one, while in simultaneous multithreading the number is greater than one. Some authors use the term super-threading synonymously.
There are many possible variations of temporal multithreading, but most can be classified into two sub-forms:
Coarse-grained The main processor pipeline contains only one thread at a time. The processor must effectively perform a rapid context switch before executing a different thread. This fast context switch is sometimes referred to as a thread switch. There may or may not be additional penalty cycles when switching.
There are many possible variations of coarse-grained temporal multithreading, mainly concerning the algorithm that determines when thread switching occurs. This algorithm may be based on one or more of many different factors, including cycle counts, cache misses, and fairness.
Fine-grained (or interleaved) The main processor pipeline may contain multiple threads, with context switches effectively occurring between pipe stages (e.g., in the barrel processor). This form of multithreading can be more expensive than the coarse-grained forms because execution resources that span multiple pipe stages may have to deal with multiple threads. Also contributing to cost is the fact that this design cannot be optimized around the concept of a "background" thread — any of the concurrent threads implemented by the hardware might require its state to be read or written on any cycle.
In any of its forms, temporal multithreading is similar in many ways to simultaneous multithreading. As in the simultaneous process, the hardware must store a complete set of states per concurrent thread implemented.
This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.
A superscalar processor is a CPU that implements a form of parallelism called instruction-level parallelism within a single processor. In contrast to a scalar processor, which can execute at most one single instruction per clock cycle, a superscalar processor can execute more than one instruction during a clock cycle by simultaneously dispatching multiple instructions to different execution units on the processor. It therefore allows more throughput (the number of instructions that can be executed in a unit of time) than would otherwise be possible at a given clock rate.
Simultaneous multithreading (SMT) is a technique for improving the overall efficiency of superscalar CPUs with hardware multithreading. SMT permits multiple independent threads of execution to better use the resources provided by modern processor architectures. The term multithreading is ambiguous, because not only can multiple threads be executed simultaneously on one CPU core, but also multiple tasks (with different page tables, different task state segments, different protection rings, different I/O permissions, etc.
In computer science, a thread of execution is the smallest sequence of programmed instructions that can be managed independently by a scheduler, which is typically a part of the operating system. The implementation of threads and processes differs between operating systems. In Modern Operating Systems, Tanenbaum shows that many distinct models of process organization are possible. In many cases, a thread is a component of a process.
Beginning with a basic pipeline processor, student will learn to implement intriguing architectural techinques through a series of labs. The class will emphasize the implementation, debugging, and ana
D'une part, le cours aborde: (1) la notion d'algorithme et de représentation de l'information, (2) l'échantillonnage d'un signal et la compression de données et (3) des aspects
liés aux systèmes: ordi
The course studies techniques to exploit Instruction-Level Parallelism (ILP) statically and dynamically. It also addresses some aspects of the design of domain-specific accelerators. Finally, it explo
Related lectures (12)
Driven by the demand for real-time processing and the need to minimize latency in AI algorithms, edge computing has experienced remarkable progress. Decision-making AI applications stand out for their heavy reliance on data-centric operations, predominantl ...
EPFL2024
, ,
String sorting is an important part of database and MapReduce applications; however, it has not been studied as extensively as sorting of fixed-length keys. Handling variable-length keys in hardware is challenging and it is no surprise that no string sorte ...
IEEE COMPUTER SOC2020
, ,
String sorting is a fundamental kernel of string matching and database index construction; yet, it has not been studied as extensively as fixed-length keys sorting. Because processing variable-length keys in hardware is challenging, it is no surprise that ...
Explores parallelism in programming, emphasizing trade-offs between programmability and performance, and introduces shared memory parallel programming using OpenMP.