Machine learning algorithms such as Convolutional Neural Networks (CNNs) are characterized by high robustness to quantization, supporting small-bitwidth fixed-point arithmetic at inference time with little to no degradation in accuracy. In turn, small-bitwidth arithmetic can avoid area-and-energy-hungry combinational multipliers, employing instead iterative shift-add operations. Crucially, this approach paves the way for very efficient data-level-parallel computing architectures, which allow fine-grained control of the operand bitwidth at run-time to realize heterogeneous quantization schemes. For the first time, we herein analyze a novel scaling opportunity offered by shift-add architectures, which emerges from the relation between the bitwidth of operands and their effective critical path timing at run-time. Employing post-layout simulations, we show that significant operating frequency increases can be achieved at run-time (by as much as 4.13× in our target architecture) with respect to the nominal design-time frequency constraint. Critically, by exploiting the ensuing Dynamic Bitwidth-Frequency Scaling (DBFS), speedups of up to 73% are achieved in our experiments when executing quantized CNNs, with respect to an alternative solution based on a combinational multiplier-adder that occupies 2.35× more area and requires 51% more energy.
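As a rough software illustration of the idea (the function and parameter names below are ours, not the paper's, and the sketch assumes unsigned quantized weights), an iterative shift-add multiply performs one conditional add per weight bit: a narrower run-time bitwidth means fewer iterations and, in the corresponding hardware datapath, a shorter effective critical path, which is the property DBFS exploits to raise the clock frequency for narrow operands.

```c
#include <stdint.h>
#include <stdio.h>

/* Multiply an activation by an unsigned quantized weight using
 * iterative shift-add instead of a combinational multiplier.
 * The loop runs exactly `bitwidth` times, so narrower weights
 * finish in fewer iterations. */
static int32_t shift_add_mul(int32_t activation, uint32_t weight, unsigned bitwidth)
{
    int32_t acc = 0;
    for (unsigned b = 0; b < bitwidth; ++b) {
        if ((weight >> b) & 1u)          /* add the shifted multiplicand when bit b is set */
            acc += activation << b;
    }
    return acc;
}

int main(void)
{
    /* 8-bit activation times a 4-bit quantized weight: 4 iterations only. */
    printf("%d\n", shift_add_mul(100, 5, 4));   /* prints 500 */
    return 0;
}
```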
Joshua Alexander Harrison Klein
Aurélien François Gilbert Bloch