Explores the Transformer model, tracing the shift from recurrent models to attention-based NLP, and highlights its key components and strong results in machine translation and document generation.
Explores the mathematics of language models, covering architecture design, pre-training, and fine-tuning, and emphasizes how the pre-train/fine-tune paradigm adapts a model to a variety of downstream tasks.