Publication

HetCache: Synergising NVMe Storage and GPU acceleration for Memory-Efficient Analytics

Abstract

Accessing input data is a critical operation in data analytics: i) slow data access significantly degrades performance, and ii) storing everything in the fastest medium, i.e., memory, incurs high operational and hardware costs. Further, while GPUs offer increased analytical performance, equipping them with correspondingly fast memory requires even more expensive memory technologies than DRAM, making memory resources even more precious. Existing GPU-accelerated engines rely on CPU memory for bigger working sets, albeit at the expense of slower execution. Such a combination of both memory and compute disaggregation, however, invalidates the assumptions of existing caching mechanisms: i) the processing tier is highly heterogeneous, ii) data access bandwidth depends on the access method and compute unit, iii) with NVMe arrays, persistent storage can approach in-memory bandwidth, and iv) all these relative quantities depend on the current query and data placement. Thus, existing caching approaches waste interconnect bandwidth, cache inefficiently, and overall result in suboptimal execution times. This work proposes HetCache, a storage engine for analytical workloads that optimizes the data access paths and tunes data placement by co-optimizing for the combinations of different memories, compute devices, and queries. Specifically, we present how the increasingly complex storage hierarchy impacts analytical query processing in GPU-NVMe-accelerated servers. HetCache accelerates analytics on CPU-GPU servers for larger-than-memory datasets through proportional and access-path-aware data placement. Our prototype implementation of HetCache demonstrates a 1.14x-1.78x speedup of GPU-only execution on NVMe-resident data and achieves near in-system-memory performance for hybrid CPU-GPU execution, while substantially improving memory efficiency. Overall, HetCache turns the multi-memory-node nature of such heterogeneous servers from a burden into a performance booster.
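The abstract's key observation, that the fastest way to read data depends on the combination of memory node, access method, and consuming compute unit, can be made concrete with a toy cost model. The following is a minimal sketch under illustrative assumptions; the type names, path labels, and bandwidth figures are hypothetical and are not taken from the HetCache implementation.

#include <algorithm>
#include <cstdio>
#include <string>
#include <vector>

struct AccessPath {
    std::string name;       // hypothetical label for a (memory, method, device) combination
    double bandwidth_gbps;  // effective bandwidth, assumed measured offline
};

// Pick the path that minimizes scan time, i.e., maximizes effective bandwidth.
// A real placement policy would also weigh cache residency, interconnect
// contention, and which compute unit consumes the data; this only captures
// the idea that "fastest" is relative to the full access path.
const AccessPath& fastestPath(const std::vector<AccessPath>& paths) {
    return *std::max_element(paths.begin(), paths.end(),
        [](const AccessPath& a, const AccessPath& b) {
            return a.bandwidth_gbps < b.bandwidth_gbps;
        });
}

int main() {
    std::vector<AccessPath> paths = {
        {"CPU scans DRAM", 80.0},               // illustrative numbers only
        {"GPU scans DRAM over PCIe", 25.0},
        {"GPU scans NVMe array directly", 20.0},
        {"CPU scans NVMe array", 22.0},
    };
    const double gigabytes = 100.0;
    const AccessPath& best = fastestPath(paths);
    std::printf("best path: %s, est. scan time: %.1f s\n",
                best.name.c_str(), gigabytes / best.bandwidth_gbps);
    return 0;
}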

Related concepts (46)
Graphics processing unit
A graphics processing unit (GPU) is a specialized electronic circuit initially designed to accelerate computer graphics and image processing (either on a video card or embedded on motherboards, mobile phones, personal computers, workstations, and game consoles). After their initial design, GPUs were found to be useful for non-graphic calculations involving embarrassingly parallel problems due to their parallel structure. Other non-graphical uses include the training of neural networks and cryptocurrency mining.
CPU cache
A CPU cache is a hardware cache used by the central processing unit (CPU) of a computer to reduce the average cost (time or energy) to access data from the main memory. A cache is a smaller, faster memory, located closer to a processor core, which stores copies of the data from frequently used main memory locations. Most CPUs have a hierarchy of multiple cache levels (L1, L2, often L3, and rarely even L4), with different instruction-specific and data-specific caches at level 1.
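The cost gap that caches hide can be demonstrated with a short, self-contained experiment. The sketch below is a toy example, not code from any publication on this page: it sums the same matrix twice, where row-by-row traversal reuses each fetched cache line, while column-by-column traversal touches a new line on almost every access and typically runs several times slower.

#include <chrono>
#include <cstdio>
#include <vector>

int main() {
    const int n = 4096;
    std::vector<float> m(static_cast<size_t>(n) * n, 1.0f);

    auto time_sum = [&](bool row_major, const char* label) {
        auto start = std::chrono::steady_clock::now();
        float sum = 0.0f;
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                sum += row_major ? m[static_cast<size_t>(i) * n + j]
                                 : m[static_cast<size_t>(j) * n + i];
        auto ms = std::chrono::duration<double, std::milli>(
                      std::chrono::steady_clock::now() - start).count();
        // Printing the sum keeps the compiler from optimizing the loop away.
        std::printf("%-13s %.1f ms (sum=%.0f)\n", label, ms, sum);
    };

    time_sum(true, "row-major:");     // sequential accesses, cache-friendly
    time_sum(false, "column-major:"); // strided accesses, cache-hostile
    return 0;
}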
General-purpose computing on graphics processing units
General-purpose computing on graphics processing units (GPGPU, or less often GPGP) is the use of a graphics processing unit (GPU), which typically handles computation only for computer graphics, to perform computation in applications traditionally handled by the central processing unit (CPU). The use of multiple video cards in one computer, or large numbers of graphics chips, further parallelizes the already parallel nature of graphics processing.
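A minimal GPGPU example, assuming an NVIDIA CUDA toolchain: each element of a vector addition is mapped to one GPU thread, which is exactly the parallel structure described above. This is the standard introductory CUDA pattern, not code tied to any publication on this page.

#include <cstdio>
#include <cuda_runtime.h>

// One thread per element: the "embarrassingly parallel" mapping.
__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);  // unified memory keeps the sketch short
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(a, b, c, n);
    cudaDeviceSynchronize();

    std::printf("c[0] = %.1f (expected 3.0)\n", c[0]);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}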
Related publications (80)

EdgeAI-Aware Design of In-Memory Computing Architectures

Marco Antonio Rios

Driven by the demand for real-time processing and the need to minimize latency in AI algorithms, edge computing has experienced remarkable progress. Decision-making AI applications stand out for their heavy reliance on data-centric operations, predominantl ...
EPFL, 2024

Rebooting Virtual Memory with Midgard

Siddharth Gupta

Virtual Memory (VM) is a critical programming abstraction that is widely used in various modern computing platforms. With the rise of datacenter computing and birth of planet-scale online services, the semantic and capacity requirements from memory have ev ...
EPFL, 2023

Chaosity: Understanding Contemporary NUMA-architectures

Anastasia Ailamaki, Viktor Sanca, Hamish Mcniece Hill Nicholson, Andreea Nica, Syed Mohammad Aunn Raza

Modern hardware is increasingly complex, requiring increasing effort to understand in order to carefully engineer systems for optimal performance and effective utilization. Moreover, established design principles and assumptions are not portable to modern ...
2023