This lecture covers the handling of text data, with a focus on document retrieval and classification. It surveys typical tasks such as sentiment analysis and topic detection, introduces the bag-of-words representation and TF-IDF matrices, and discusses the challenges of sparsity in text data along with the application of matrix factorization techniques. The lecture also explains the NLP pipeline, from tokenization to coreference resolution, and covers contextualized word vectors, such as those produced by BERT, and their importance in modern NLP models.
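To make the bag-of-words and TF-IDF ideas concrete, here is a minimal sketch in plain Python. It is not taken from the lecture: the three toy documents, the whitespace tokenizer, and the helper names `tokenize`, `tfidf_matrix`, and `sparsity` are all assumptions made for illustration, and the weighting used is the basic form (raw term count times log(N / document frequency)), which is only one of several common variants.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace split (an assumption); real pipelines
    # use a proper tokenizer.
    return text.lower().split()


def tfidf_matrix(docs):
    """Return (vocab, rows), where rows[i][j] is the TF-IDF weight
    of term vocab[j] in document i."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for doc in tokenized for tok in set(doc))
    rows = []
    for doc in tokenized:
        counts = Counter(doc)  # the bag-of-words for this document
        rows.append([counts[t] * math.log(n_docs / df[t]) for t in vocab])
    return vocab, rows


def sparsity(rows):
    # Fraction of matrix entries that are exactly zero.
    total = sum(len(r) for r in rows)
    zeros = sum(1 for r in rows for v in r if v == 0)
    return zeros / total


# Hypothetical toy corpus, for illustration only.
docs = [
    "the movie was great",
    "the movie was terrible",
    "stock prices fell sharply",
]
vocab, rows = tfidf_matrix(docs)
print(vocab)
print(f"sparsity: {sparsity(rows):.2f}")
```

Even on this tiny corpus, more than half of the matrix entries are zero, because each document uses only a few of the vocabulary terms. On realistic vocabularies the effect is far stronger, which is the sparsity issue the lecture refers to.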