Fully-Asynchronous Cache-Efficient Simulation of Detailed Neural Networks

Modern asynchronous runtime systems make it possible to rethink large-scale scientific applications. Using a simulator of morphologically detailed neural networks as an example, we show how moving away from the commonly used bulk-synchronous parallel (BSP) execution model increases prefetching capabilities, improves cache locality, and overlaps computation with communication, consequently leading to a lower time to solution. Our strategy removes the collective synchronization of the ODEs' coupling information and takes advantage of the pairwise time dependency between equations, leading to a fully asynchronous, exhaustive yet non-speculative stepping model. Combined with fully linear data structures, communication reduction at the compute-node level, and an earliest-equation-steps-first scheduler, we achieve an acceleration at the cache level that reduces communication and time to solution by maximizing the number of timesteps taken per neuron at each iteration.
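The stepping model described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `simulate`, `deps`, and `delay_steps` are hypothetical, time is counted in whole steps, the numerical integration itself is omitted, and every connection is assumed to have a minimum delay of at least one step. Each neuron advances as far as its own pairwise dependencies allow, never beyond (so nothing is speculative), and a priority queue always picks the neuron that is furthest behind.

```python
import heapq


def simulate(deps, delay_steps, end_step):
    """Earliest-neuron-first, non-speculative stepping (illustrative only).

    deps[i]            -- ids of the neurons that neuron i depends on (assumed layout)
    delay_steps[(j,i)] -- minimum delay from j to i, in steps, assumed >= 1
    end_step           -- step at which the simulation ends
    """
    n = len(deps)
    step = [0] * n                        # current step of each neuron
    heap = [(0, i) for i in range(n)]     # earliest neuron is popped first
    heapq.heapify(heap)
    trace = []                            # (neuron, from_step, to_step) per visit

    while heap:
        s, i = heapq.heappop(heap)
        # Furthest step neuron i can reach using only information that its
        # dependencies have already produced (pairwise time dependency).
        limit = end_step
        for j in deps[i]:
            limit = min(limit, step[j] + delay_steps[(j, i)])
        # The earliest neuron can always advance when all delays are >= 1.
        assert limit > s
        trace.append((i, s, limit))       # several timesteps taken in one visit
        step[i] = limit
        if step[i] < end_step:
            heapq.heappush(heap, (step[i], i))
    return step, trace


# Two mutually coupled neurons with a delay of 2 steps in each direction.
final, visits = simulate(
    deps=[[1], [0]],
    delay_steps={(1, 0): 2, (0, 1): 2},
    end_step=6,
)
```

In this toy case the second neuron is advanced four steps in a single visit because the first neuron has already moved ahead. A bulk-synchronous schedule with the same delay would visit both neurons in each of three rounds, with a collective synchronization between rounds.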

Intermediate Address Space: virtual memory optimization of heterogeneous architectures for cache-resident workloads

Exploring High-Performance and Energy-Efficient Architectures for Edge AI-Enabled Applications

Task-driven neural network models predict neural dynamics of proprioception: Neural network model weights