Decentralized in-order execution of a sequential task-based code for shared-memory architectures
Related publications (59)
Graph Chatbot
Chat with Graph Search
Ask any question about EPFL courses, lectures, exercises, research, news, etc. or try the example questions below.
DISCLAIMER: The Graph Chatbot is not programmed to provide explicit or categorical answers to your questions. Rather, it transforms your questions into API requests that are distributed across the various IT services officially administered by EPFL. Its purpose is solely to collect and recommend relevant references to content that you can explore to help you answer your questions.
The demise of Moore's Law and Dennard scaling has resulted in diminishing performance gains for general-purpose processors, and so has prompted a surge in academic and commercial interest for hardware accelerators.Specialized hardware has already redefined ...
Even if Dennard scaling came to an end fifteen years ago, Moore's law kept fueling an exponential growth in compute performance through increased parallelization. However, the performance of memory and, in particular, Dynamic Random Access Memory (DRAM), ...
In order to get a better understanding of the interaction of plasma microinstabilities and associated turbulence with specific modes, an antenna is implemented in the global gyrokinetic Particle-In-Cell (PIC) code ORB5.
It consists in applying an external ...
This thesis demonstrates that it is feasible for systems code to expose a latency interface that describes its latency and related side effects for all inputs, just like the code's semantic interface describes its functionality and related side effects.Sem ...
Performance and power constraints come together with Complementary Metal Oxide Semiconductor technology scaling in future Exascale systems. Technology scaling makes each individual transistor more prone to faults and, due to the exponential increase in the ...
Modern hardware is increasingly complex, requiring increasing effort to understand in order to carefully engineer systems for optimal performance and effective utilization. Moreover, established design principles and assumptions are not portable to modern ...
As hardware evolves, so do the needs of applications. To increase the performance of an application,
there exist two well-known approaches. These are scaling up an application, using a larger multi-core
platform, or scaling out, by distributing work to mul ...
A plethora of optimized mutex lock algorithms have been designed over the past 25 years to mitigate performance bottlenecks related to critical sections and locks. Unfortunately, there is currently no broad study of the behavior of these optimized lock alg ...
Verification and testing of hardware heavily relies on cycle-accurate simulation of RTL.As single-processor performance is growing only slowly, conventional, single-threaded RTL simulation is becoming impractical for increasingly complex chip designs and s ...
Quantum ESPRESSO is an open-source distribution of computer codes for quantum-mechanical materials modeling, based on density-functional theory, pseudopotentials, and plane waves, and renowned for its performance on a wide range of hardware architectures, ...