Explores decoding from neural models in modern NLP, covering encoder-decoder models, decoding algorithms, issues with argmax decoding, and the impact of beam size.
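To make the contrast between argmax decoding and beam search concrete, here is a minimal, self-contained sketch. The toy model (`toy_next_token_logprobs`, the `VOCAB` list, and the scoring choices) is an illustrative stand-in for a real decoder's next-token distribution, not part of any specific library or the lecture's code.

```python
# Sketch: greedy (argmax) decoding vs. beam search over a toy autoregressive model.
import numpy as np

VOCAB = ["<eos>", "the", "cat", "sat", "mat"]

def toy_next_token_logprobs(prefix):
    """Return log-probabilities over VOCAB given a prefix of token ids (toy stand-in)."""
    rng = np.random.default_rng(abs(hash(tuple(prefix))) % (2**32))
    logits = rng.normal(size=len(VOCAB))
    return logits - np.log(np.exp(logits).sum())  # log-softmax

def greedy_decode(max_len=5):
    """Argmax decoding: always commit to the single most probable next token."""
    tokens = []
    for _ in range(max_len):
        next_id = int(np.argmax(toy_next_token_logprobs(tokens)))
        tokens.append(next_id)
        if VOCAB[next_id] == "<eos>":
            break
    return [VOCAB[t] for t in tokens]

def beam_search(beam_size=3, max_len=5):
    """Keep the `beam_size` highest-scoring partial hypotheses at every step."""
    beams = [([], 0.0)]  # (token ids, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            if tokens and VOCAB[tokens[-1]] == "<eos>":
                candidates.append((tokens, score))  # finished hypothesis carries over
                continue
            logprobs = toy_next_token_logprobs(tokens)
            for tok_id, lp in enumerate(logprobs):
                candidates.append((tokens + [tok_id], score + lp))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    best_tokens, best_score = beams[0]
    return [VOCAB[t] for t in best_tokens], best_score

print("greedy:", greedy_decode())
print("beam  :", beam_search(beam_size=3))
```

Varying `beam_size` in this sketch shows the trade-off the lecture discusses: a beam of 1 reduces to greedy argmax decoding, while larger beams explore more hypotheses at higher cost.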
Explores the Transformer model, tracing the shift from recurrent models to attention-based NLP and highlighting its key components and strong results in machine translation and document generation.
Explains the full Transformer architecture and the self-attention mechanism, highlighting the paradigm shift towards fully pretrained models.
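As a companion to the self-attention discussion, the sketch below shows single-head scaled dot-product self-attention. The shapes and parameter names (`d_model`, `d_k`, `W_q`, `W_k`, `W_v`) are illustrative assumptions, not tied to any particular implementation.

```python
# Sketch: single-head scaled dot-product self-attention, the core Transformer operation.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """X: (seq_len, d_model). Every position attends to every position in the sequence."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v       # project inputs to queries, keys, values
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)           # (seq_len, seq_len) scaled similarity scores
    weights = softmax(scores, axis=-1)        # each row is a distribution over positions
    return weights @ V                        # weighted average of value vectors

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
X = rng.normal(size=(seq_len, d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)  # -> (4, 8)
```

A full Transformer layer would add multiple heads, residual connections, layer normalization, and a position-wise feed-forward network around this operation; the sketch isolates only the attention step itself.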