This lecture covers the fundamental concepts of words, tokens, and language models in Natural Language Processing (NLP). It begins with the difficulty of defining words and tokens, emphasizing that the appropriate definition depends on context. It then examines the distinction between words and tokens, the role of lexica in NLP systems, and the use of n-grams for language modeling. It covers how lexica are implemented and accessed, and the significance of surface forms. It also explains how probabilities are estimated in language models, including additive smoothing. The lecture concludes with key points on lexica usage, tokenization challenges, the effectiveness of n-grams, and smoothing methods.
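As an illustration of probability estimation with additive smoothing, here is a minimal sketch of a bigram model. It is not taken from the lecture: the whitespace tokenizer, the toy corpus, and the function names (`tokenize`, `bigram_counts`, `smoothed_prob`) are assumptions chosen for brevity. The estimate used is P(w | prev) = (count(prev, w) + alpha) / (count(prev) + alpha * V), where V is the vocabulary size and alpha = 1 gives add-one (Laplace) smoothing.

```python
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (an assumption for this sketch);
    # real tokenization has to handle punctuation, clitics, compounds, etc.
    return text.lower().split()


def bigram_counts(tokens):
    # Count each token as a context (every position except the last)
    # and each adjacent pair of tokens as a bigram.
    contexts = Counter(tokens[:-1])
    bigrams = Counter(zip(tokens, tokens[1:]))
    return contexts, bigrams


def smoothed_prob(prev, word, contexts, bigrams, vocab_size, alpha=1.0):
    # Additive smoothing: every possible bigram receives alpha pseudo-counts,
    # so unseen bigrams get a small non-zero probability.
    return (bigrams[(prev, word)] + alpha) / (contexts[prev] + alpha * vocab_size)


# Toy corpus, purely illustrative.
tokens = tokenize("the cat sat on the mat")
vocab = sorted(set(tokens))
contexts, bigrams = bigram_counts(tokens)

p_seen = smoothed_prob("the", "cat", contexts, bigrams, len(vocab))
p_unseen = smoothed_prob("the", "sat", contexts, bigrams, len(vocab))
print(f"P(cat | the) = {p_seen:.3f}")
print(f"P(sat | the) = {p_unseen:.3f}")
```

Without smoothing, the unseen bigram "the sat" would receive probability zero; with alpha = 1 it receives a small share of the probability mass, and the estimates for a given context still sum to one over the vocabulary.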