Bus snooping

Related publications (39)

Linebacker: Preserving Victim Cache Lines in Idle Register Files of GPUs

Modern GPUs suffer from cache contention due to the limited cache size that is shared across tens of concurrently running warps. To increase the per-warp cache size, prior techniques proposed warp throttling, which limits the number of active warps. Warp thr ...

Assoc Computing Machinery, 2019

Abstracting Multi-Core Topologies with MCTOP

Rachid Guerraoui, Vasileios Trigonakis, Georgios Chatzopoulos

Portability and efficiency are usually antagonists in multi-core computing. In order to develop efficient code, one needs to take into account the topology of the target multi-cores (e.g., for locality). This clearly hampers code portability. In this paper ...

ACM Press, 2017

Efficient Communication and Synchronization on Manycore Processors

Darko Petrovic

The increased number of cores integrated on a chip has brought about a number of challenges. Concerns about the scalability of cache coherence protocols have urged both researchers and practitioners to explore alternative programming models, where cache co ...

EPFL, 2015

Challenges of Memory Management on Modern NUMA Systems

Baptiste Joseph Eustache Lepers, Mohammad Dashti Rahmat Abadi

Memory access latency is hence non-uniform, because it depends on where the request originates and where it is destined to go. Such systems are referred to as non-uniform memory access (NUMA) systems. Current x86 NUMA systems are cache coherent (cal ...

Assoc Computing Machinery, 2015

On the Performance of Delegation over Cache-Coherent Shared Memory

André Schiper, Thomas Ropars, Darko Petrovic

Delegation is a thread synchronization technique where access to shared data is performed through a dedicated server thread. When a client thread requires shared data access, it makes a request to a server and waits for a response. This paper studies deleg ...

2015
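The delegation scheme summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: a single server thread exclusively owns a shared counter, and clients send (delta, reply-queue) requests over a Python queue.Queue and block until the reply arrives. The names delegation_server, delegate_add, and run_demo are hypothetical. Because queue.Queue is itself lock-based, the sketch shows the structure of delegation (only the server touches the shared data) rather than its performance over cache-coherent shared memory.

```python
import queue
import threading


def delegation_server(requests: queue.Queue) -> None:
    """Hypothetical server thread: the only thread that touches the shared counter."""
    counter = 0
    while True:
        req = requests.get()
        if req is None:  # shutdown sentinel
            break
        delta, reply = req
        counter += delta    # the "critical section" runs only on the server thread
        reply.put(counter)  # respond to the waiting client


def delegate_add(requests: queue.Queue, delta: int) -> int:
    """Client side: send a request to the server and wait for its response."""
    reply: queue.Queue = queue.Queue(maxsize=1)
    requests.put((delta, reply))
    return reply.get()


def run_demo(num_clients: int = 4, ops_per_client: int = 1000) -> int:
    """Start one server and several clients; return the final counter value."""
    requests: queue.Queue = queue.Queue()
    server = threading.Thread(target=delegation_server, args=(requests,))
    server.start()

    def client() -> None:
        for _ in range(ops_per_client):
            delegate_add(requests, 1)

    clients = [threading.Thread(target=client) for _ in range(num_clients)]
    for t in clients:
        t.start()
    for t in clients:
        t.join()

    total = delegate_add(requests, 0)  # read the final value through the server
    requests.put(None)                 # stop the server thread
    server.join()
    return total


if __name__ == "__main__":
    print(run_demo())
```

Since every update is serialized through the server thread, the clients never race on the counter, and run_demo() returns num_clients * ops_per_client.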

ALLARM: Optimizing Sparse Directories for Thread-Local Data

Amitabha Roy

Large-scale cache-coherent systems often impose unnecessary overhead on data that is thread-private for the whole of its lifetime. These include resources devoted to tracking the coherence state of the data, as well as unnecessary coherence messages sent o ...

2014

Leveraging Hardware Message Passing for Efficient Thread Synchronization

André Schiper, Thomas Ropars, Darko Petrovic

As the level of parallelism in manycore processors keeps increasing, providing efficient mechanisms for thread synchronization in concurrent programs is becoming a major concern. On cache-coherent shared-memory processors, synchronization efficiency is ult ...

2014

Leveraging Hardware Message Passing for Efficient Thread Synchronization

André Schiper, Thomas Ropars, Darko Petrovic

Assoc Computing Machinery, 2014

Designing ASCY-compliant Concurrent Search Data Structures

Rachid Guerraoui, Vasileios Trigonakis, Tudor Alexandru David, Tong Che

This report details the design of two new concurrent data structures, a hash table, called CLHT, and a binary search tree (BST), called BST-TK. Both designs are based on asynchronized concurrency (ASCY), a paradigm consisting of four complementary programm ...

2014

Multi-Grain Coherence Directory

Babak Falsafi

Conventional directory coherence operates at the finest granularity possible, that of a cache block. While simple, this organization fails to exploit frequent application behavior: at any given point in time, large, continuous chunks of memory are often ac ...

2013
