Concept

POWER9

Summary
POWER9 is a family of superscalar, multithreading, multi-core microprocessors produced by IBM, based on the Power ISA. It was announced in August 2016. POWER9-based processors are manufactured using a 14 nm FinFET process, in 12- and 24-core versions, for scale-out and scale-up applications. Other variations are possible, since the POWER9 architecture is open for licensing and modification by OpenPOWER Foundation members. Summit, the fifth-fastest supercomputer in the world (based on the Top500 list as of November 2022), is based on POWER9, while also using Nvidia Tesla GPUs as accelerators.
The POWER9 core comes in two variants: a four-way multithreaded one called SMT4 and an eight-way one called SMT8. The SMT4 and SMT8 cores are similar, in that both consist of a number of so-called slices fed by common schedulers. A slice is a rudimentary 64-bit single-threaded processing core with a load-store unit (LSU), an integer unit (ALU) and a vector scalar unit (VSU, doing SIMD and floating point). A super-slice is the combination of two slices. An SMT4 core consists of a 32 KiB L1 instruction cache (1 KiB = 1024 bytes), a 32 KiB L1 data cache, an instruction fetch unit (IFU) and an instruction sequencing unit (ISU), which feeds two super-slices. An SMT8 core has two sets of L1 caches, IFUs and ISUs to feed four super-slices. The result is that the 12-core (SMT8) and 24-core (SMT4) versions of POWER9 each consist of the same number of slices (96 each, from 12 × 8 and 24 × 4) and the same amount of L1 cache.
A POWER9 core, whether SMT4 or SMT8, has a 12-stage pipeline (five stages shorter than its predecessor, the POWER8), but aims to retain a clock frequency of around 4 GHz. It is the first to incorporate elements of Power ISA v3.0, released in December 2015, including the VSX-3 instructions. The POWER9 design is modular, so that it can be used in more processor variants and licensed for manufacture on fabrication processes other than IBM's.
On chip are co-processors for compression and cryptography, as well as a large low-latency eDRAM L3 cache.
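The slice and L1 cache arithmetic in the summary can be checked with a short Python sketch. It is only an illustration: the names `CoreVariant` and `chip_totals` are hypothetical, and the pairing of 24 cores with SMT4 and 12 cores with SMT8 is an assumption inferred from the 96-slice total rather than something stated outright above.

```python
from dataclasses import dataclass

SLICES_PER_SUPER_SLICE = 2   # a super-slice combines two slices
L1I_KIB = 32                 # L1 instruction cache per L1 set
L1D_KIB = 32                 # L1 data cache per L1 set


@dataclass(frozen=True)
class CoreVariant:
    """Hypothetical model of one POWER9 core variant."""
    name: str
    threads: int        # hardware threads per core
    super_slices: int   # super-slices fed by the core's ISU(s)
    l1_sets: int        # number of L1 instruction + data cache pairs

    @property
    def slices(self) -> int:
        return self.super_slices * SLICES_PER_SUPER_SLICE

    @property
    def l1_kib(self) -> int:
        return self.l1_sets * (L1I_KIB + L1D_KIB)


SMT4 = CoreVariant("SMT4", threads=4, super_slices=2, l1_sets=1)
SMT8 = CoreVariant("SMT8", threads=8, super_slices=4, l1_sets=2)


def chip_totals(variant: CoreVariant, cores: int) -> dict:
    """Sum slices, hardware threads and L1 cache over all cores on a chip."""
    return {
        "slices": cores * variant.slices,
        "threads": cores * variant.threads,
        "l1_kib": cores * variant.l1_kib,
    }


if __name__ == "__main__":
    # Assumed pairing: 24-core chips use SMT4 cores, 12-core chips use SMT8.
    for variant, cores in ((SMT4, 24), (SMT8, 12)):
        print(cores, variant.name, chip_totals(variant, cores))
```

Under these assumptions both configurations come out at 96 slices, 96 hardware threads and 1536 KiB of L1 cache, which matches the claim that the two versions have the same slice count and the same amount of L1 cache.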
Related lectures (4)
Emerging Memory I
Explores the landscape of big data, memory importance in online services, challenges faced by memory systems, emerging DRAM technologies, and storage-class memory.
Scientific Computing in Neuroscience
Explores scientific computing in neuroscience, emphasizing the simulation of neurons and networks using tools like NEURON, NEST, and BRIAN.
Multi-threaded Processors
Covers the basics of multi-threaded processors, including design, performance impact, and pipeline utilization.
Related publications (15)

Sub-kHz-Linewidth External-Cavity Laser (ECL) With Si3N4 Resonator Used as a Tunable Pump for a Kerr Frequency Comb

Tobias Kippenberg, Junqiu Liu

Combining optical gain in direct-bandgap III-V materials with tunable optical feedback offered by advanced photonic integrated circuits is key to chip-scale external-cavity lasers (ECL), offering wideband tunability along with low optical linewidths. Exter ...
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 2023

Hardware-Software Co-Design of an RPC Processor

Arash Pourhabibi Zarandi

The booming popularity of online services has led to a major evolution in the way these services are built and deployed. To cope with such online data-intensive services, service providers deploy several massive-scale datacenters, also referred to as wareh ...
EPFL, 2021

Modified Joint Channel-and-Data Estimation for One-Bit Massive MIMO

Mahdi Amiri, Saeed Saadatnejad, Mohammadhossein Bahari

Centralized and cloud computing-based network architectures are the promising tracks of future communication systems where a large scale compute power can be virtualized for various algorithms. These architectures rely on high-performance communication lin ...
IEEE, 2021
Related concepts (7)
Multithreading (computer architecture)
In computer architecture, multithreading is the ability of a central processing unit (CPU) (or a single core in a multi-core processor) to provide multiple threads of execution concurrently, supported by the operating system. This approach differs from multiprocessing. In a multithreaded application, the threads share the resources of a single or multiple cores, which include the computing units, the CPU caches, and the translation lookaside buffer (TLB).
POWER8
POWER8 is a family of superscalar multi-core microprocessors based on the Power ISA, announced in August 2013 at the Hot Chips conference. The designs are available for licensing under the OpenPOWER Foundation, which is the first time for such availability of IBM's highest-end processors. Systems based on POWER8 became available from IBM in June 2014. Systems and POWER8 processor designs made by other OpenPOWER members were available in early 2015.
Multi-core processor
A multi-core processor is a microprocessor on a single integrated circuit with two or more separate processing units, called cores, each of which reads and executes program instructions. The instructions are ordinary CPU instructions (such as add, move data, and branch) but the single processor can run instructions on separate cores at the same time, increasing overall speed for programs that support multithreading or other parallel computing techniques.