Concept

SIMT

Résumé
Single instruction, multiple threads (SIMT) is an execution model used in parallel computing where single instruction, multiple data (SIMD) is combined with multithreading. It is different from SPMD in that all instructions in all "threads" are executed in lock-step. The SIMT execution model has been implemented on several GPUs and is relevant for general-purpose computing on graphics processing units (GPGPU), e.g. some supercomputers combine CPUs with GPUs. The processors, say a number p of them, seem to execute many more than p tasks. This is achieved by each processor having multiple "threads" (or "work-items" or "Sequence of SIMD Lane operations"), which execute in lock-step, and are analogous to SIMD lanes. The simplest way to understand SIMT is to imagine a multi-core system, where each core has its own register file, its own ALUs (both SIMD and Scalar) and its own data cache, but that unlike a standard multi-core system which has multiple independent instruction caches and decoders, as well as multiple independent Program Counter registers, the instructions are synchronously broadcast to all SIMT cores from a single unit with a single instruction cache and a single instruction decoder which reads instructions using a single Program Counter. The key difference between SIMT and SIMD lanes is that each of the SIMT cores may have a completely different Stack Pointer (and thus perform computations on completely different data sets), whereas SIMD lanes are simply part of an ALU that knows nothing about memory per se. SIMT was introduced by Nvidia in the Tesla GPU microarchitecture with the G80 chip. ATI Technologies, now AMD, released a competing product slightly later on May 14, 2007, the TeraScale 1-based "R600" GPU chip. As access time of all the widespread RAM types (e.g. DDR SDRAM, GDDR SDRAM, XDR DRAM, etc.) is still relatively high, engineers came up with the idea to hide the latency that inevitably comes with each memory access. Strictly, the latency-hiding is a feature of the zero-overhead scheduling implemented by modern GPUs.
À propos de ce résultat
Cette page est générée automatiquement et peut contenir des informations qui ne sont pas correctes, complètes, à jour ou pertinentes par rapport à votre recherche. Il en va de même pour toutes les autres pages de ce site. Veillez à vérifier les informations auprès des sources officielles de l'EPFL.
Cours associés (3)
CS-307: Introduction to multiprocessor architecture
Multiprocessors are a core component in all types of computing infrastructure, from phones to datacenters. This course will build on the prerequisites of processor design and concurrency to introduce
CS-471: Advanced multiprocessor architecture
Multiprocessors are now the defacto building blocks for all computer systems. This course will build upon the basic concepts offered in Computer Architecture I to cover the architecture and organizati
PHYS-100: Advanced physics I (mechanics)
La Physique Générale I (avancée) couvre la mécanique du point et du solide indéformable. Apprendre la mécanique, c'est apprendre à mettre sous forme mathématique un phénomène physique, en modélisant l