ColTraIn: Co-located DNN training and inference

Mario Paulo Drumond Lages De Oliveira
2020
EPFL thesis

Abstract

Deep neural network (DNN) inference accelerators are deployed at scale to serve online services. Because service demand varies, they run at a low average load, which leads to poor resource utilization. Unfortunately, reclaiming idle inference cycles is difficult, because no other workload can run on such custom accelerators. DNN training services offer an opportunity to reclaim these idle cycles. However, two factors limit the opportunities for piggybacking training on inference accelerators: the tight latency constraints of inference services, and the dependence of training algorithms on floating-point arithmetic.

In this thesis, we tackle the challenges that prevent DNN inference accelerators from exposing their idle cycles to training services. We first develop an efficient numeric representation that enables DNN training with accuracy similar to single-precision floating point and energy efficiency similar to 8-bit fixed point. We then explore the inference accelerator design space. We show that, unlike current latency-optimal platforms, batching-optimized ALU arrays that relax latency constraints achieve near-optimal throughput for a given area and power envelope. Such high-throughput inference accelerators maximize the opportunities to piggyback training. Finally, we present Equinox, a family of inference accelerators designed to piggyback training. Equinox employs a uniform encoding and a priority hardware scheduler that processes training requests during inference idle cycles without affecting inference tail latency. Overall, we show that exposing accelerator idle cycles to training services uncovers significant computing power for training at a small overhead for inference accelerators, improving overall datacenter efficiency.
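
The abstract does not name the numeric representation. One family of encodings with the stated property is block floating point (BFP). In BFP, a block of values shares a single exponent and each element stores only a small signed integer mantissa. The Python below is a minimal sketch that assumes such a BFP-style format with 8-bit mantissas; it is not the thesis's actual encoding. It shows why BFP can approach fixed-point energy efficiency: the inner loop of a dot product is pure integer multiply-accumulate, and the only floating-point work is one exponent addition per block pair. The helpers `to_bfp`, `from_bfp` and `bfp_dot`, and the `mantissa_bits` parameter, are hypothetical names chosen for illustration.

```python
import math


def to_bfp(block, mantissa_bits=8):
    """Hypothetical helper: encode floats with one shared exponent.

    Returns (mantissas, scale_exp) such that
    block[i] is approximately mantissas[i] * 2**scale_exp.
    """
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return [0] * len(block), 0
    # frexp gives max_abs = m * 2**exp with 0.5 <= m < 1.
    _, exp = math.frexp(max_abs)
    # Shared scale chosen so the largest value uses the full mantissa range.
    scale_exp = exp - (mantissa_bits - 1)
    qmax = 2 ** (mantissa_bits - 1) - 1
    # Round each scaled value to an integer and clamp to the signed range.
    mantissas = [
        max(-qmax, min(qmax, round(math.ldexp(x, -scale_exp)))) for x in block
    ]
    return mantissas, scale_exp


def from_bfp(mantissas, scale_exp):
    """Hypothetical helper: decode a BFP block back to floats."""
    return [math.ldexp(m, scale_exp) for m in mantissas]


def bfp_dot(a, b, mantissa_bits=8):
    """Dot product using integer multiply-accumulate plus one exponent add."""
    ma, ea = to_bfp(a, mantissa_bits)
    mb, eb = to_bfp(b, mantissa_bits)
    acc = sum(x * y for x, y in zip(ma, mb))  # integer-only inner loop
    return math.ldexp(acc, ea + eb)            # single rescale per block pair


a = [0.5, -1.25, 3.0, 0.1]
b = [2.0, 0.75, -0.5, 4.0]
print(bfp_dot(a, b))  # -1.0625 (float64 result is about -1.0375)
```

With tiny blocks and 8-bit mantissas, this toy input shows a visible rounding error of roughly 2%. The abstract does not say how the thesis's representation keeps training accuracy close to single-precision floating point, so this sketch should be read only as an illustration of the shared-exponent idea.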

Official source

https://infoscience.epfl.ch/record/280118?ln=fr