Deep neural networks have completely revolutionized the field of machine learning by achieving state-of-the-art results on various tasks ranging from computer vision to protein folding. However, their application is hindered by their large computational and memory requirements. In this thesis, we propose methods for improving the efficiency of deep neural networks.

Firstly, we tackle the sample inefficiency of neural network training with an importance sampling algorithm suitable for deep neural networks. This algorithm allows us to focus computation on datapoints that are going to provide useful gradients for training our models and ignore the ones that will have negligible gradients. We show that our algorithm can improve the performance of various neural networks when compared to uniform sampling under a fixed computational budget.

Secondly, we design a model that is suitable for processing large input images with a fraction of the computational and memory requirements of traditional approaches. We achieve this by sampling from a data-dependent attention distribution in order to only process a portion of the input in high resolution. We demonstrate that our model can learn both the attention and the features in an end-to-end fashion using only single image-wise labels for supervision.

Subsequently, we shift our attention to transformer architectures and introduce a kernelized formulation for self-attention that reduces its quadratic complexity to linear with respect to the input sequence's length. Furthermore, we uncover the relationship between autoregressive transformers and recurrent neural networks and show that our formulation enables up to 3 orders of magnitude faster autoregressive inference.

Finally, we develop clustered attention, a method that can approximate softmax transformers with reduced computation. This is achieved by grouping elements of the input using clustering.
We showcase that our formulation provides a better trade-off between performance and computation in comparison to the original transformer architecture. In addition, we demonstrate that clustered attention can approximate pretrained transformer models without any fine-tuning and with minimal loss in performance.
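To make the kernelized formulation concrete: replacing the softmax kernel with a feature map phi allows the attention computation to be reassociated so that cost grows linearly, not quadratically, with sequence length. The following is a minimal NumPy sketch under assumptions, not the thesis implementation: it uses the phi(x) = elu(x) + 1 feature map common in the linear-attention literature, and all function and variable names are illustrative.

```python
import numpy as np

def feature_map(x):
    # phi(x) = elu(x) + 1: an everywhere-positive feature map,
    # so attention weights stay positive and normalizable.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    """Attention in O(N * d * d_v) by reassociating the matrix products.

    Standard softmax attention forms the N x N matrix softmax(Q K^T) V.
    With a kernel feature map phi, attention becomes
        phi(Q) (phi(K)^T V) / (phi(Q) (phi(K)^T 1)),
    where the bracketed terms are small d x d_v and d x 1 summaries,
    so the N x N matrix is never materialized.
    """
    phi_Q = feature_map(Q)               # (N, d)
    phi_K = feature_map(K)               # (N, d)
    KV = phi_K.T @ V                     # (d, d_v) summary of keys/values
    Z = phi_Q @ phi_K.sum(axis=0)        # (N,) per-query normalizer
    return (phi_Q @ KV) / Z[:, None]     # (N, d_v)
```

Because phi is positive, each output row is still a convex combination of value rows; the quadratic-cost form phi(Q) phi(K)^T, row-normalized and multiplied by V, gives exactly the same result, which makes the reassociation easy to sanity-check.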