Explores decoding from neural models in modern NLP, covering encoder-decoder models, decoding algorithms, issues with argmax decoding, and the impact of beam size.
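To make the contrast concrete, below is a minimal, self-contained sketch (not taken from the section itself) of argmax decoding versus beam search over a toy next-token distribution; the probabilities, tokens, and function names are illustrative only. It shows the core issue with argmax decoding: committing to the locally best token can miss a globally higher-scoring sequence that a wider beam recovers.

```python
# Greedy (argmax) decoding vs. beam search over a toy next-token table.
# The table stands in for a real encoder-decoder's output distribution.
import math
from typing import Dict, List, Tuple

NEXT: Dict[str, Dict[str, float]] = {
    "<s>": {"the": math.log(0.60), "a": math.log(0.40)},
    "the": {"cat": math.log(0.55), "dog": math.log(0.45)},
    "a":   {"cat": math.log(0.95), "dog": math.log(0.05)},
    "cat": {"</s>": math.log(1.0)},
    "dog": {"</s>": math.log(1.0)},
}

def next_token_logprobs(prefix: List[str]) -> Dict[str, float]:
    """Return log P(token | prefix) from the toy table (empty if unknown)."""
    return NEXT.get(prefix[-1], {})

def greedy_decode(max_len: int = 5) -> List[str]:
    """Argmax decoding: commit to the single best token at each step."""
    seq = ["<s>"]
    for _ in range(max_len):
        choices = next_token_logprobs(seq)
        if not choices:
            break
        seq.append(max(choices, key=choices.get))
        if seq[-1] == "</s>":
            break
    return seq

def beam_search(beam_size: int, max_len: int = 5) -> List[Tuple[float, List[str]]]:
    """Keep the `beam_size` highest-scoring partial hypotheses at each step."""
    beams: List[Tuple[float, List[str]]] = [(0.0, ["<s>"])]
    for _ in range(max_len):
        candidates: List[Tuple[float, List[str]]] = []
        for score, seq in beams:
            if seq[-1] == "</s>":
                candidates.append((score, seq))  # finished hypothesis
                continue
            for tok, lp in next_token_logprobs(seq).items():
                candidates.append((score + lp, seq + [tok]))
        beams = sorted(candidates, key=lambda x: x[0], reverse=True)[:beam_size]
    return beams

if __name__ == "__main__":
    # Greedy picks "the" first and ends with "the cat </s>" (prob 0.33),
    # while beam_size=2 recovers the higher-scoring "a cat </s>" (prob 0.38).
    print("greedy:", greedy_decode())
    for k in (1, 2):
        print(f"beam={k}:", beam_search(k)[0])
```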
Explores pretraining sequence-to-sequence models with BART and T5, covering transfer learning, fine-tuning, model architectures, pretraining tasks, performance comparisons, and summarization results, with pointers to key references.
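As a rough illustration of the fine-tune-then-decode workflow, here is a hedged sketch assuming the Hugging Face `transformers` library and PyTorch; the checkpoint name, toy data, and hyperparameters are placeholders rather than the exact setup discussed in the section.

```python
# Minimal fine-tuning step for a pretrained seq2seq model (T5-style) on a
# toy summarization pair, followed by beam-search decoding.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "t5-small"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

# Toy (document, summary) pair; T5 selects the task via a text prefix.
document = "summarize: The quick brown fox jumped over the lazy dog near the river bank."
summary = "A fox jumped over a dog."

inputs = tokenizer(document, return_tensors="pt", truncation=True)
labels = tokenizer(summary, return_tensors="pt", truncation=True).input_ids

model.train()
for step in range(3):  # a few illustrative optimization steps
    outputs = model(**inputs, labels=labels)  # cross-entropy over target tokens
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(f"step {step}: loss={outputs.loss.item():.3f}")

# Decode with beam search after fine-tuning.
model.eval()
with torch.no_grad():
    generated = model.generate(**inputs, num_beams=4, max_length=32)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```

The same loop applies to BART by swapping in a BART checkpoint and dropping the task prefix; in practice the trainer, dataset, and evaluation would be far more elaborate than this sketch.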
Explores chemical reaction prediction using generative models and molecular transformers, emphasizing the importance of molecular language processing and stereochemistry.
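To show what treating chemistry as a "molecular language" means in practice, here is a small sketch of tokenizing a reaction SMILES string (written as reactants>reagents>products) into source and target token sequences for a seq2seq model. The regex follows a tokenization pattern commonly associated with Molecular Transformer-style preprocessing; the example reaction is illustrative and no specific model is implied.

```python
# SMILES tokenization for sequence-to-sequence reaction prediction.
import re
from typing import List

# Atom-level tokenizer: bracket atoms (which carry stereochemistry such as
# [C@@H]), two-letter elements, bonds, ring closures, branches, etc.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p"
    r"|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)

def tokenize_smiles(smiles: str) -> List[str]:
    """Split a SMILES string into model tokens, checking the split is lossless."""
    tokens = SMILES_TOKEN.findall(smiles)
    assert "".join(tokens) == smiles, "tokenization must reconstruct the input"
    return tokens

# Esterification written as reaction SMILES: reactants > reagents > products.
reaction = "CC(=O)O.OCC>[H+]>CC(=O)OCC"
reactants_and_reagents, products = reaction.rsplit(">", 1)

source = tokenize_smiles(reactants_and_reagents)  # model input
target = tokenize_smiles(products)                # product sequence to predict
print("source:", " ".join(source))
print("target:", " ".join(target))
```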