Explains the full architecture of Transformers and the self-attention mechanism, highlighting the paradigm shift toward fully pretrained models.
Explores the mathematics of language models, covering architecture design, pre-training, and fine-tuning, and emphasizing how pre-training followed by task-specific fine-tuning supports a wide range of downstream tasks.
Traces the shift from recurrent models to attention-based NLP with the Transformer, highlighting its key components and notable results in machine translation and document generation.
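
The self-attention mechanism referenced in these summaries can be sketched in a few lines. The snippet below is a minimal, single-head, unmasked scaled dot-product attention in plain NumPy; the function and parameter names (`self_attention`, `w_q`, `w_k`, `w_v`) are hypothetical and chosen for illustration, not taken from any of the works summarized here.

```python
# Minimal sketch of single-head scaled dot-product self-attention (no masking),
# assuming NumPy only; all names here are illustrative.
import numpy as np

def self_attention(x: np.ndarray, w_q: np.ndarray, w_k: np.ndarray, w_v: np.ndarray) -> np.ndarray:
    """x: (seq_len, d_model); w_q/w_k/w_v: (d_model, d_k) projection matrices."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v              # project inputs to queries, keys, values
    scores = q @ k.T / np.sqrt(k.shape[-1])          # similarity of each position with every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over key positions
    return weights @ v                               # each output is a weighted sum of values

# Toy usage: 4 tokens, model width 8, head width 4.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 4)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)  # (4, 4)
```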