This lecture covers the mathematics behind language models, including architecture design, pre-training, and fine-tuning. It introduces the basics of language modeling, self-attention, and transformer architectures, emphasizing the role of pre-training and fine-tuning for downstream tasks. It then turns to word representations, word embeddings, and how neural language models are trained. The lecture also traces the emergence of GPT-1, GPT-2, GPT-3, and GPT-4, highlighting their capabilities in unsupervised learning, few-shot learning, and in-context learning, and it concludes with insights into the predictability of model scaling and the alignment of language models with human instructions.
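As a rough illustration of the self-attention operation mentioned above (a minimal sketch for orientation, not material from the lecture; all names and dimensions here are illustrative assumptions), the following Python snippet computes single-head scaled dot-product attention over a toy sequence of token vectors:

    import numpy as np

    def scaled_dot_product_attention(X, W_q, W_k, W_v):
        """Single-head self-attention over a sequence of token vectors X."""
        Q = X @ W_q                                # queries  (seq_len, d_k)
        K = X @ W_k                                # keys     (seq_len, d_k)
        V = X @ W_v                                # values   (seq_len, d_v)
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)            # pairwise token similarities
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
        return weights @ V                         # weighted mix of value vectors

    # Toy usage: 4 tokens with 8-dimensional embeddings (hypothetical sizes).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(4, 8))
    W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
    out = scaled_dot_product_attention(X, W_q, W_k, W_v)
    print(out.shape)   # (4, 8)

Each output vector is a softmax-weighted combination of the value vectors of all positions, which is the mechanism that lets transformer layers relate every token to every other token in the sequence.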