Francesco Cremonesi, Gerhard Wellein
Big science initiatives are trying to reconstruct and model the brain by attempting to simulate brain tissue at larger scales and with increasingly more biological detail than previously thought possible. The exponential growth of parallel computer performance has been supporting these developments, and at the same time maintainers of neuroscientific simulation code have strived to optimally and efficiently exploit new hardware features. Current state-of-the-art software for the simulation of biological networks has so far been developed using performance engineering practices, but a thorough analysis and modeling of the computational and performance characteristics, especially in the case of morphologically detailed neuron simulations, is lacking. Other computational sciences have successfully used analytic performance engineering, which is based on "white-box," that is, first-principles performance models, to gain insight on the computational properties of simulation kernels, aid developers in performance optimizations and eventually drive codesign efforts, but to our knowledge a model-based performance analysis of neuron simulations has not yet been conducted. We present a detailed study of the shared-memory performance of morphologically detailed neuron simulations based on the Execution-Cache-Memory performance model. We demonstrate that this model can deliver accurate predictions of the runtime of almost all the kernels that constitute the neuron models under investigation. The gained insight is used to identify the main governing mechanisms underlying performance bottlenecks in the simulation. The implications of this analysis on the optimization of neural simulation software and eventually codesign of future hardware architectures are discussed. In this sense, our work represents a valuable conceptual and quantitative contribution to understanding the performance properties of biological networks simulations.