Publication

Training DNNs with Hybrid Block Floating Point

Babak Falsafi, Martin Jaggi, Tao Lin, Mario Paulo Drumond Lages De Oliveira
2018
Conference paper

Abstract

The wide adoption of DNNs has given birth to unrelenting computing requirements, forcing datacenter operators to adopt domain-specific accelerators to train them. These accelerators typically employ densely packed full-precision floating-point arithmetic to maximize performance per area. Ongoing research efforts seek to further increase that performance density by replacing floating-point with fixed-point arithmetic. However, a significant roadblock for these attempts has been fixed point's narrow dynamic range, which is insufficient for DNN training convergence. We identify block floating point (BFP) as a promising alternative representation since it exhibits wide dynamic range and enables the majority of DNN operations to be performed with fixed-point logic. Unfortunately, BFP alone introduces several limitations that preclude its direct applicability. In this work, we introduce HBFP, a hybrid BFP-FP approach, which performs all dot products in BFP and other operations in floating point. HBFP delivers the best of both worlds: the high accuracy of floating point at the superior hardware density of fixed point. For a wide variety of models, we show that HBFP matches floating point's accuracy while enabling hardware implementations that deliver up to 8.5x higher throughput.
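To make the idea of a block floating point dot product concrete, the following is a minimal sketch and not the paper's implementation. It assumes one shared exponent per vector and 8-bit signed mantissas; the function names bfp_quantize and bfp_dot, the mantissa width, and the rounding and clamping choices are all hypothetical and chosen only for illustration. The point it shows is that once each block shares an exponent, the multiply-accumulate runs entirely on integers, and floating point is needed only to rescale the final sum.

```python
import math

def bfp_quantize(block, mantissa_bits=8):
    """Map a list of floats to (integer mantissas, shared exponent).

    Assumption for illustration: one exponent per block, derived from the
    largest magnitude, with signed mantissas of `mantissa_bits` bits.
    """
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return [0] * len(block), 0
    # frexp returns (m, e) with max_abs == m * 2**e and 0.5 <= m < 1.
    _, e = math.frexp(max_abs)
    # Choose the exponent so the largest value uses the full mantissa range.
    shared_exp = e - (mantissa_bits - 1)
    limit = 2 ** (mantissa_bits - 1) - 1
    mantissas = []
    for x in block:
        q = round(math.ldexp(x, -shared_exp))      # x / 2**shared_exp, rounded
        mantissas.append(max(-limit, min(limit, q)))  # clamp to the signed range
    return mantissas, shared_exp

def bfp_dot(a, b, mantissa_bits=8):
    """Dot product where the multiply-accumulate uses only integers."""
    ma, ea = bfp_quantize(a, mantissa_bits)
    mb, eb = bfp_quantize(b, mantissa_bits)
    acc = sum(x * y for x, y in zip(ma, mb))  # fixed-point (integer) arithmetic
    return math.ldexp(float(acc), ea + eb)    # one rescale back to floating point

a = [0.5, -1.25, 3.0]
b = [2.0, 0.25, -1.0]
print(bfp_quantize(a))  # ([16, -40, 96], -5)
print(bfp_dot(a, b))    # -2.3125
```

These inputs happen to be exactly representable with 8-bit mantissas, so the result equals the floating-point dot product. In general, values much smaller than the block maximum lose precision because they share its exponent, which is one reason block size and mantissa width matter in such a scheme.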

Official source

https://infoscience.epfl.ch/record/266344?ln=en
