Publication

Fast Transformers with Clustered Attention

Related concepts (36)

Graph Chatbot

Chat with Graph Search

Ask any question about EPFL courses, lectures, exercises, research, news, etc. or try the example questions below.

DISCLAIMER: The Graph Chatbot is not programmed to provide explicit or categorical answers to your questions. Rather, it transforms your questions into API requests that are distributed across the various IT services officially administered by EPFL. Its purpose is solely to collect and recommend relevant references to content that you can explore to help you answer your questions.

Transformer (machine learning model)

A transformer is a deep learning architecture that relies on the parallel multi-head attention mechanism. The modern transformer was proposed in the 2017 paper titled 'Attention Is All You Need' by Ashish Vaswani et al., Google Brain team. It is notable for requiring less training time than previous recurrent neural architectures, such as long short-term memory (LSTM), and its later variation has been prevalently adopted for training large language models on large (language) datasets, such as the Wikipedia corpus and Common Crawl, by virtue of the parallelized processing of input sequence.

Complexity class

In computational complexity theory, a complexity class is a set of computational problems "of related resource-based complexity". The two most commonly analyzed resources are time and memory. In general, a complexity class is defined in terms of a type of computational problem, a model of computation, and a bounded resource like time or memory. In particular, most complexity classes consist of decision problems that are solvable with a Turing machine, and are differentiated by their time or space (memory) requirements.

Fixed-point theorem

In mathematics, a fixed-point theorem is a result saying that a function F will have at least one fixed point (a point x for which F(x) = x), under some conditions on F that can be stated in general terms. The Banach fixed-point theorem (1922) gives a general criterion guaranteeing that, if it is satisfied, the procedure of iterating a function yields a fixed point.

Fixed point (mathematics)

hatnote|1=Fixed points in mathematics are not to be confused with other uses of "fixed point", or stationary points where math|1=f(x) = 0. In mathematics, a fixed point (sometimes shortened to fixpoint), also known as an invariant point, is a value that does not change under a given transformation. Specifically for functions, a fixed point is an element that is mapped to itself by the function. Formally, c is a fixed point of a function f if c belongs to both the domain and the codomain of f, and f(c) = c.

Quantum complexity theory

Quantum complexity theory is the subfield of computational complexity theory that deals with complexity classes defined using quantum computers, a computational model based on quantum mechanics. It studies the hardness of computational problems in relation to these complexity classes, as well as the relationship between quantum complexity classes and classical (i.e., non-quantum) complexity classes. Two important quantum complexity classes are BQP and QMA.

Generative pre-trained transformer

Generative pre-trained transformers (GPT) are a type of large language model (LLM) and a prominent framework for generative artificial intelligence. The first GPT was introduced in 2018 by OpenAI. GPT models are artificial neural networks that are based on the transformer architecture, pre-trained on large data sets of unlabelled text, and able to generate novel human-like content. As of 2023, most LLMs have these characteristics and are sometimes referred to broadly as GPTs.

Average-case complexity

In computational complexity theory, the average-case complexity of an algorithm is the amount of some computational resource (typically time) used by the algorithm, averaged over all possible inputs. It is frequently contrasted with worst-case complexity which considers the maximal complexity of the algorithm over all possible inputs. There are three primary motivations for studying average-case complexity.

Parameterized complexity

In computer science, parameterized complexity is a branch of computational complexity theory that focuses on classifying computational problems according to their inherent difficulty with respect to multiple parameters of the input or output. The complexity of a problem is then measured as a function of those parameters. This allows the classification of NP-hard problems on a finer scale than in the classical setting, where the complexity of a problem is only measured as a function of the number of bits in the input.

Fixed-point iteration

In numerical analysis, fixed-point iteration is a method of computing fixed points of a function. More specifically, given a function defined on the real numbers with real values and given a point in the domain of , the fixed-point iteration is which gives rise to the sequence of iterated function applications which is hoped to converge to a point . If is continuous, then one can prove that the obtained is a fixed point of , i.e., More generally, the function can be defined on any metric space with values in that same space.

Lefschetz fixed-point theorem

In mathematics, the Lefschetz fixed-point theorem is a formula that counts the fixed points of a continuous mapping from a compact topological space to itself by means of traces of the induced mappings on the homology groups of . It is named after Solomon Lefschetz, who first stated it in 1926. The counting is subject to an imputed multiplicity at a fixed point called the fixed-point index. A weak version of the theorem is enough to show that a mapping without any fixed point must have rather special topological properties (like a rotation of a circle).