Êtes-vous un étudiant de l'EPFL à la recherche d'un projet de semestre?
Travaillez avec nous sur des projets en science des données et en visualisation, et déployez votre projet sous forme d'application sur Graph Search.
Exposing parallelism in scientific applications has become a core requirement for efficiently running on modern distributed multicore SIMD compute architectures. The granularity of parallelism that can be attained is a key determinant for the achievable acceleration and time to solution. Motivated by a scientific use case that requires the simulation of long spans of time — the study of plasticity and learning in detailed models of brain tissue — we present a strategy that exposes and exploits multicore and SIMD micro-parallelism from unrolling flow dependencies and concurrent outputs in a large system of coupled ordinary differential equations (ODEs). An implementation of a parallel simulator is presented, running on the HPX runtime system for the ParalleX execution model, providing dynamic task-scheduling and asynchronous execution. The implementation was tested on different architectures using a previously published brain tissue model. Benchmark of single neurons on a single compute node present a speed-up of circa 4-7x when compared with the state of the art Single Instruction Multiple Data (SIMD) implementation and 13-40x over its Single Instruction Single Data (SISD) counterpart. Large scale benchmarks suggest almost ideal strong scaling and a speed-up of 2-8x on a distributed architecture of 128 Cray X6 compute nodes.
Aurélien François Gilbert Bloch