Transformer (machine learning model)

A transformer is a deep learning architecture built on the parallel multi-head attention mechanism. The modern transformer was proposed in the 2017 paper "Attention Is All You Need" by Ashish Vaswani et al. of the Google Brain team. It is notable for requiring less training time than earlier recurrent neural architectures, such as long short-term memory (LSTM), because it processes the input sequence in parallel. Its later variants have been widely adopted for training large language models on large language datasets, such as the Wikipedia corpus and Common Crawl.

Input text is split into n-grams encoded as tokens, and each token is converted into a vector by lookup in a word embedding table. At each layer, each token is then contextualized against the other (unmasked) tokens within the context window via a parallel multi-head attention mechanism, which amplifies the signal from key tokens and diminishes the signal from less important ones.

Although the transformer paper was published in 2017, the softmax-based attention mechanism was proposed earlier, in 2014, by Bahdanau, Cho, and Bengio for machine translation, and the Fast Weight Controller, which is similar to a transformer, was proposed in 1992 by Schmidhuber. The architecture is now used not only in natural language processing and computer vision, but also in audio and multimodal processing. It has also led to the development of pre-trained systems, such as generative pre-trained transformers (GPTs) and BERT (Bidirectional Encoder Representations from Transformers).

In 1990, the Elman network was proposed, in which a recurrent network was trained on simple sentences such as "dog chases man". The (pre-)trained model was used to convert each word into a vector, and the whole vocabulary into a vector database. The vectors were then clustered by closeness into a tree, which was found to have structure: the vectors representing verbs and those representing nouns each belonged to a different large cluster.
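
The embedding lookup and attention steps described above can be illustrated with a small sketch. It is not the implementation from the 2017 paper: it uses a single head, omits the learned query, key, and value projections, splits text on whitespace instead of using a real tokenizer, and relies on a hypothetical three-word embedding table (`EMBEDDINGS`) whose values are made up for illustration. The function names `embed`, `softmax`, and `attention` are likewise assumptions of this sketch.

```python
import math

# Hypothetical toy embedding table; a real model learns these vectors.
EMBEDDINGS = {
    "dog": [1.0, 0.0],
    "chases": [0.5, 0.5],
    "man": [0.0, 1.0],
}


def embed(text):
    """Split text on whitespace (a simplification) and look up each token's vector."""
    return [EMBEDDINGS[token] for token in text.split()]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values, causal=False):
    """Single-head scaled dot-product attention.

    With causal=True, position i may only attend to positions j <= i;
    later positions are masked and receive zero weight.
    Returns (outputs, weights), one row per query position.
    """
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for i, q in enumerate(queries):
        scores = []
        for j, k in enumerate(keys):
            if causal and j > i:
                scores.append(float("-inf"))  # masked token
            else:
                dot = sum(a * b for a, b in zip(q, k))
                scores.append(dot / math.sqrt(d_k))
        weights = softmax(scores)
        all_weights.append(weights)
        # Each output is the attention-weighted average of the value vectors.
        outputs.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return outputs, all_weights


# Self-attention: the token vectors serve as queries, keys, and values
# (an assumption of this sketch; real layers apply learned projections first).
vectors = embed("dog chases man")
contextualized, weights = attention(vectors, vectors, vectors, causal=True)
for token, row in zip("dog chases man".split(), weights):
    print(token, [round(w, 3) for w in row])
```

Each row of `weights` sums to 1, so every output vector is a weighted average of the vectors it is allowed to attend to; tokens with higher scores contribute more, which is the amplifying and diminishing effect described above. A multi-head layer would run several such attention computations in parallel on different learned projections and combine their outputs.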


The multimodality cell segmentation challenge: toward universal solutions

Toward Automatic Typography Analysis: Serif Classification and Font Similarities

Accelerator-driven Data Arrangement to Minimize Transformers Run-time on Multi-core Architectures
