Publication

High-Throughput Maps on Message-Passing Manycore Architectures: Partitioning versus Replication

Related publications (34)

Leveraging Hardware Message Passing for Efficient Thread Synchronization

André Schiper, Thomas Ropars, Darko Petrovic

As the level of parallelism in manycore processors keeps increasing, providing efficient mechanisms for thread synchronization in concurrent programs is becoming a major concern. On cache-coherent shared-memory processors, synchronization efficiency is ult ...

Assoc Computing Machinery, 2014

Recent trends have led hardware manufacturers to place multiple processing cores on a single chip, making parallel programming the intended way of taking advantage of the increased processing power. However, bringing concurrency to average programmers is c ...

EPFL, 2014

Parallelization of the distinct lattice spring model

Jiannong Fang, Liang Sun, Jian Zhao, Gaofeng Zhao

The distinct lattice spring model (DLSM) is a newly developed numerical tool for modeling rock dynamics problems, i.e. dynamic failure and wave propagation. In this paper, parallelization of DLSM is presented. With the development of parallel computing tec ...

Wiley-Blackwell, 2013

This paper presents the most exhaustive study of synchronization to date. We span multiple layers, from hardware cache-coherence protocols up to high-level concurrent software. We do so on different types of architectures, from single-socket - uniform and ...

2013

High-Performance RMA-Based Broadcast on the Intel SCC

André Schiper, Thomas Ropars, Omid Shahmirzadi, Darko Petrovic

Many-core chips with more than 1000 cores are expected by the end of the decade. To overcome scalability issues related to cache coherence at such a scale, one of the main research directions is to leverage the message-passing programming model. The Intel ...

2012

Asynchronous Broadcast on the Intel SCC using Interrupts

André Schiper, Thomas Ropars, Omid Shahmirzadi, Darko Petrovic

This paper focuses on the design of an asynchronous broadcast primitive on the Intel SCC. Our solution is based on OC-Bcast, a state-of-the-art k-ary tree synchronous broadcast algorithm that leverages the parallelism provided by on-chip Remote Memory Acce ...

2012

On the Performance of Software Transactional Memory

Aleksandar Dragojevic

The recent proliferation of multi-core processors has moved concurrent programming into mainstream by forcing increasingly more programmers to write parallel code. Using traditional concurrency techniques, such as locking, is notoriously difficult and has ...

EPFL, 2012

Approximation Algorithms for Modern Multi-Processor Scheduling Problems

Martin Niemeier

This thesis is devoted to the design and analysis of algorithms for scheduling problems. These problems are ubiquitous in the modern world. Examples include the optimization of local transportation, managing access to concurrent resources like runways at a ...

EPFL, 2012

Linearizability is a key design methodology for reasoning about implementations of concurrent abstract data types in both shared memory and message passing systems. It provides the illusion that operations execute sequentially and fault-free, despite the a ...

Assoc Computing Machinery, 2012

2011