Lecture

Language Models: From Theory to Computation

Description

This lecture covers the mathematics behind language models, including architecture design, pre-training, and fine-tuning. It explores the basics of language modeling, self-attention, and the transformer architecture, emphasizing how pre-training and fine-tuning adapt models to a variety of downstream tasks. The lecture examines word representations and word embeddings, as well as the training of neural network language models. It also discusses the emergence of advanced models such as GPT-1, GPT-2, GPT-3, and GPT-4, highlighting their capabilities in unsupervised learning, few-shot learning, and in-context learning. The lecture concludes with insights into the predictability of model scaling and the alignment of language models with human instructions.
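As a concrete illustration of the self-attention mechanism listed among the topics above, the following is a minimal sketch of single-head scaled dot-product self-attention. The function name, variable names, and shapes are illustrative assumptions, not drawn from the lecture materials.

```python
# Minimal sketch of single-head scaled dot-product self-attention.
# Names and shapes are illustrative, not taken from the lecture.
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Apply self-attention to a sequence of token embeddings X."""
    Q = X @ W_q  # queries, shape (seq_len, d_k)
    K = X @ W_k  # keys,    shape (seq_len, d_k)
    V = X @ W_v  # values,  shape (seq_len, d_v)
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # pairwise token similarities
    # Row-wise softmax (numerically stabilized).
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # each output is a weighted average of the values

# Toy usage: 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)
print(out.shape)  # (4, 8)
```

In a full transformer, this operation is repeated across multiple heads and stacked layers; the sketch above shows only the core computation.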
