Comparison of neuronal spike exchange methods on a Blue Gene/P supercomputer

Michael Lee Hines, Sameer Kumar
2011
Article

Abstract

For neural network simulations on parallel machines, interprocessor spike communication can be a significant portion of the total simulation time. The performance of several spike exchange methods using a Blue Gene/P (BG/P) supercomputer has been tested with 8-128 K cores using randomly connected networks of up to 32 M cells with 1 k connections per cell and 4 M cells with 10 k connections per cell, i.e., on the order of 4 × 10^10 connections (K is 1024, M is 1024^2, and k is 1000). The spike exchange methods used are the standard Message Passing Interface (MPI) collective, MPI_Allgather, and several variants of the non-blocking Multisend method, either implemented via non-blocking MPI_Isend or exploiting the possibility of very low overhead direct memory access (DMA) communication available on the BG/P. In all cases, the worst performing method was that using MPI_Isend, due to the high overhead of initiating a spike communication. The two best performing methods had similar performance, with very low overhead for the initiation of spike communication: the persistent Multisend method using the Record-Replay feature of the Deep Computing Messaging Framework DCMF_Multicast, and a two-phase multisend in which a DCMF_Multicast is used to first send to a subset of phase one destination cores, which then pass it on to their subset of phase two destination cores. Departure from ideal scaling for the Multisend methods is almost completely due to load imbalance caused by the large variation in the number of cells that fire on each processor in the interval between synchronizations. Spike exchange time itself is negligible, since transmission overlaps with computation and is handled by a DMA controller. We conclude that ideal performance scaling will be ultimately limited by imbalance in incoming spikes across processors between synchronization intervals. Thus, counterintuitively, maximization of load balance requires that cells be distributed randomly across processors rather than in a way that reflects neural net architecture, so that sets of cells that burst fire together are on different processors, with their targets on as large a set of processors as possible.
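The load balance argument can be illustrated with a small sketch. This is not the authors' benchmark code; it is a toy model in which a contiguous group of cells fires together during one synchronization interval, and the names (`spikes_per_rank`, `imbalance`, `demo`) and all sizes are hypothetical. It compares a block placement, where neighboring cells share a processor, with a random placement, using the ratio of the busiest processor's spike count to the mean as a rough imbalance measure.

```python
import random
from collections import Counter


def spikes_per_rank(firing_cells, placement, n_ranks):
    """Count how many firing cells live on each rank (processor)."""
    counts = Counter(placement[c] for c in firing_cells)
    return [counts.get(r, 0) for r in range(n_ranks)]


def imbalance(counts):
    """Ratio of the busiest rank's count to the mean count (1.0 is ideal)."""
    mean = sum(counts) / len(counts)
    return max(counts) / mean if mean else 0.0


def demo(n_cells=4096, n_ranks=64, burst_size=256, seed=0):
    """Compare block and random cell placement for one burst (toy sizes)."""
    rng = random.Random(seed)
    cells_per_rank = n_cells // n_ranks

    # Block placement: consecutive cell ids share a rank, mimicking a
    # distribution that reflects network architecture (an assumption here).
    block = [c // cells_per_rank for c in range(n_cells)]

    # Random placement: same number of cells per rank, shuffled assignment.
    shuffled = block[:]
    rng.shuffle(shuffled)

    # Hypothetical burst: the first `burst_size` cells fire together.
    burst = range(burst_size)

    return (
        imbalance(spikes_per_rank(burst, block, n_ranks)),
        imbalance(spikes_per_rank(burst, shuffled, n_ranks)),
    )


if __name__ == "__main__":
    block_imb, random_imb = demo()
    print(f"block placement imbalance:  {block_imb:.2f}")
    print(f"random placement imbalance: {random_imb:.2f}")
```

With these toy sizes, the block placement concentrates the whole burst on four ranks, while the random placement spreads it over many ranks, which is the effect the abstract describes. The sketch ignores spike delivery to target cells, which the abstract also identifies as a source of imbalance.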

Official source

https://infoscience.epfl.ch/record/178512?ln=fr



