Publication

Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention

François Fleuret, Nikolaos Pappas, Angelos Katharopoulos, Apoorv Vyas
2020
Report or working paper

Abstract

Transformers achieve remarkable performance in several tasks, but due to their quadratic complexity with respect to the input’s length, they are prohibitively slow for very long sequences. To address this limitation, we express the self-attention as a linear dot-product of kernel feature maps and make use of the associativity property of matrix products to reduce the complexity from O(N^2) to O(N), where N is the sequence length. We show that this formulation permits an iterative implementation that dramatically accelerates autoregressive transformers and reveals their relationship to recurrent neural networks. Our linear transformers achieve similar performance to vanilla transformers and they are up to 4000x faster on autoregressive prediction of very long sequences.
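The reformulation described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. It assumes a feature map phi(x) = elu(x) + 1 (an illustrative choice of positive kernel feature map) and uses hypothetical function names. The first function computes causal attention the quadratic way, re-reading all earlier positions at every step. The second carries a fixed-size running state, in the manner of an RNN, and should produce the same outputs at O(N) cost.

```python
import math
import random


def phi(x):
    # Assumed feature map: elu(v) + 1, which is v + 1 for v > 0 and exp(v) otherwise.
    # It is strictly positive, so the normalizers below never reach zero.
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def causal_attention_quadratic(Q, K, V):
    """O(N^2): position i explicitly revisits every position j <= i."""
    dv = len(V[0])
    out = []
    for i, q in enumerate(Q):
        fq = phi(q)
        weights = [dot(fq, phi(K[j])) for j in range(i + 1)]
        z = sum(weights)
        out.append([sum(w * V[j][c] for j, w in enumerate(weights)) / z
                    for c in range(dv)])
    return out


def causal_attention_recurrent(Q, K, V):
    """O(N): keep running sums S = sum_j phi(k_j) v_j^T and z = sum_j phi(k_j)."""
    d, dv = len(Q[0]), len(V[0])
    S = [[0.0] * dv for _ in range(d)]  # d x dv state matrix
    z = [0.0] * d                       # normalizer state
    out = []
    for q, k, v in zip(Q, K, V):
        fk = phi(k)
        for a in range(d):
            z[a] += fk[a]
            for c in range(dv):
                S[a][c] += fk[a] * v[c]
        fq = phi(q)
        denom = dot(fq, z)
        out.append([sum(fq[a] * S[a][c] for a in range(d)) / denom
                    for c in range(dv)])
    return out


def max_abs_diff(A, B):
    return max(abs(x - y) for ra, rb in zip(A, B) for x, y in zip(ra, rb))


random.seed(0)
N, d, dv = 16, 4, 3  # sequence length, key/query size, value size (arbitrary)
Q = [[random.gauss(0, 1) for _ in range(d)] for _ in range(N)]
K = [[random.gauss(0, 1) for _ in range(d)] for _ in range(N)]
V = [[random.gauss(0, 1) for _ in range(dv)] for _ in range(N)]

out_quadratic = causal_attention_quadratic(Q, K, V)
out_recurrent = causal_attention_recurrent(Q, K, V)
diff = max_abs_diff(out_quadratic, out_recurrent)
```

Both functions compute the same quantity because matrix products are associative: phi(q_i) applied to the accumulated sum of phi(k_j) v_j^T equals the sum of the individual weighted values. The recurrent version only ever stores a d x dv matrix and a length-d vector, independent of N, which is what makes step-by-step autoregressive generation cheap.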

Official source

https://infoscience.epfl.ch/record/278853?ln=fr

Random matrix methods for high-dimensional machine learning models

Generalization of Scaled Deep ResNets in the Mean-Field Regime

Randomized low-rank approximation and its applications
