Publication

How to Stop Under-Utilization and Love Multicores

Abstract

Hardware trends oblige software to overcome three major challenges to system scalability: (1) taking advantage of the implicit/vertical parallelism within a core that is enabled by aggressive microarchitectural features, (2) exploiting the explicit/horizontal parallelism provided by multicores, and (3) achieving predictably efficient execution despite the variability in communication latencies among cores on multisocket multicores. In this three-hour tutorial, we shed light on these three challenges and survey recent proposals to alleviate them. The first part of the tutorial describes the instruction- and data-level parallelism opportunities in a core, from both the hardware and the software side. In addition, it examines the sources of under-utilization in a modern processor and presents insights and hardware/software techniques that better exploit the microarchitectural resources of a processor by improving cache locality at the right level of the memory hierarchy. The second part focuses on the scalability bottlenecks of database applications on multicore and multisocket multicore architectures. It first presents a systematic way of eliminating such bottlenecks in online transaction processing workloads, based on minimizing unbounded communication, and shows several techniques that minimize bottlenecks in major components of database management systems. Then, it demonstrates the data- and work-sharing opportunities for analytical workloads and reviews advanced scheduling mechanisms that are aware of non-uniform memory accesses and alleviate bandwidth saturation.

Related concepts (36)
Multi-core processor
A multi-core processor is a microprocessor on a single integrated circuit with two or more separate processing units, called cores, each of which reads and executes program instructions. The instructions are ordinary CPU instructions (such as add, move data, and branch) but the single processor can run instructions on separate cores at the same time, increasing overall speed for programs that support multithreading or other parallel computing techniques.
Database
In computing, a database is an organized collection of data (also known as a data store) stored and accessed electronically through the use of a database management system. Small databases can be stored on a file system, while large databases are hosted on computer clusters or cloud storage. The design of databases spans formal techniques and practical considerations, including data modeling, efficient data representation and storage, query languages, security and privacy of sensitive data, and distributed computing issues, including supporting concurrent access and fault tolerance.
Cache hierarchy
Cache hierarchy, or multi-level caches, refers to a memory architecture that uses a hierarchy of memory stores based on varying access speeds to cache data. Highly requested data is cached in high-speed access memory stores, allowing swifter access by central processing unit (CPU) cores. Cache hierarchy is a form and part of memory hierarchy and can be considered a form of tiered storage. This design was intended to allow CPU cores to process faster despite the memory latency of main memory access.
Related publications (126)

Determining an optimum quantity of interleaved instruction streams of defined coroutines

Anastasia Ailamaki, Georgios Psaropoulos

A method for optimizing memory access for database operations is provided. The method may include performing a first database operation by at least executing a first instruction stream associated with the first database operation. The first database opera ...
2023

TiC-SAT: Tightly-coupled Systolic Accelerator for Transformers

David Atienza Alonso, Giovanni Ansaloni, Alireza Amirshahi, Joshua Alexander Harrison Klein

Transformer models have achieved impressive results in various AI scenarios, ranging from vision to natural language processing. However, their computational complexity and their vast number of parameters hinder their implementations on resource-constraine ...
2023

Analytical Engines With Context-Rich Processing: Towards Efficient Next-Generation Analytics

Anastasia Ailamaki, Viktor Sanca

As modern data pipelines continue to collect, produce, and store a variety of data formats, extracting and combining value from traditional and context-rich sources such as strings, text, video, audio, and logs becomes a manual process where such formats a ...
2023