Concept

Hopper (microarchitecture)

Hopper is a graphics processing unit (GPU) microarchitecture developed by Nvidia. It is designed for datacenters and is parallel to Ada Lovelace. Named for computer scientist and United States Navy rear admiral Grace Hopper, the Hopper architecture was leaked in November 2019 and officially revealed in March 2022. It improves upon its predecessors, the Turing and Ampere microarchitectures, featuring a new streaming multiprocessor and a faster memory subsystem. The Nvidia Hopper H100 GPU is implemented using the TSMC 4N process with 80 billion transistors. It consists of up to 144 streaming multiprocessors. In SXM5, the Nvidia Hopper H100 offers better performance than PCIe. The streaming multiprocessors for Hopper improve upon the Turing and Ampere microarchitectures, although the maximum number of concurrent warps per SM remains the same between the Ampere and Hopper architectures, 64. The Hopper architecture provides a Tensor Memory Accelerator (TMA), which supports bidirectional asynchronous memory transfer between shared memory and global memory. Under TMA, applications may transfer up to 5D tensors. When writing from shared memory to global memory, element wise reduction and bitwise operators may be used, avoiding registers and SM instructions while enabling users to write warp specialized codes. TMA is exposed through cuda::memcpy_async When parallelizing applications, developers can use thread block clusters. Thread blocks may perform atomics in the shared memory of other thread blocks within its cluster, otherwise known as distributed shared memory. Distributed shared memory may be used by an SM simultaneously with L2 cache; when used to communicate data between SMs, this can utilize the combined bandwidth of distributed shared memory and L2. The maximum portable cluster size is 8, although the Nvidia Hopper H100 can support a cluster size of 16 by using the cudaFuncAttributeNonPortableClusterSizeAllowed function, potentially at the cost of reduced number of active blocks.

À propos de ce résultat
Cette page est générée automatiquement et peut contenir des informations qui ne sont pas correctes, complètes, à jour ou pertinentes par rapport à votre recherche. Il en va de même pour toutes les autres pages de ce site. Veillez à vérifier les informations auprès des sources officielles de l'EPFL.
Cours associés (1)
CS-307: Introduction to multiprocessor architecture
Multiprocessors are a core component in all types of computing infrastructure, from phones to datacenters. This course will build on the prerequisites of processor design and concurrency to introduce
Séances de cours associées (17)
GPU : Architecture et programmation
Explore l'architecture des GPU, la programmation CUDA, le traitement d'image et leur importance dans l'informatique moderne, en mettant l'accent sur le démarrage précoce et l'exactitude de la programmation GPU.
Architecture multiprocesseur avancée
Couvre l'architecture multiprocesseur avancée, discutant de la logistique des cours, des composants, du classement et des tendances des systèmes informatiques modernes.
Modèles de cohérence de la mémoire: impact sur les performances et calcul parallèle
Couvre les modèles de cohérence de la mémoire, l'informatique parallèle, les sous-programmes atomiques, l'architecture GPU et le multithreading.
Afficher plus
Publications associées (34)

Massively parallel nodal discontinous Galerkin finite element method simulator for room acoustics

Jan Sickmann Hesthaven

We present a massively parallel and scalable nodal discontinuous Galerkin finite element method (DGFEM) solver for the time-domain linearized acoustic wave equations. The solver is implemented using the libParanumal finite element framework with extensions ...
London2023

Inter-actions parallel execution on GPU from high-level dataflow synthesis

Marco Mattavelli, Simone Casale Brunet, Aurélien François Gilbert Bloch

Recent GPU architectures make available numbers of parallel processing units that exceed by orders of magnitude the ones offered by CPU architectures. Whereas programs written using dataflow programming languages are well suited for programming heterogeneo ...
IEEE2022
Afficher plus

Graph Chatbot

Chattez avec Graph Search

Posez n’importe quelle question sur les cours, conférences, exercices, recherches, actualités, etc. de l’EPFL ou essayez les exemples de questions ci-dessous.

AVERTISSEMENT : Le chatbot Graph n'est pas programmé pour fournir des réponses explicites ou catégoriques à vos questions. Il transforme plutôt vos questions en demandes API qui sont distribuées aux différents services informatiques officiellement administrés par l'EPFL. Son but est uniquement de collecter et de recommander des références pertinentes à des contenus que vous pouvez explorer pour vous aider à répondre à vos questions.