This lecture covers two fundamental tasks in text analysis: document retrieval and document classification. It begins with the challenges of handling unstructured textual data from sources such as the web and social media. The instructor then introduces document retrieval, in which documents are ranked by their similarity to a query, and document classification, in which documents are assigned to predefined classes. The lecture also covers sentiment analysis (determining the sentiment expressed in a text) and topic detection (identifying the prevalent topics in a collection of documents). Supervised learning, feature vectors, and bag-of-words models are discussed in detail, along with preprocessing steps such as tokenization, stopword removal, and word normalization.
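
As a rough illustration of how these pieces fit together, the sketch below preprocesses text, builds bag-of-words vectors, and ranks documents by cosine similarity to a query. It is not taken from the lecture. The stopword list, the function names (`preprocess`, `bag_of_words`, `cosine_similarity`, `rank_documents`), the sample documents, and the choice of cosine similarity as the ranking measure are all assumptions made for the example. Word normalization is limited to lowercasing here.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the lecture).
STOPWORDS = {"a", "an", "and", "are", "in", "is", "of", "on", "the", "to"}


def preprocess(text):
    """Tokenize, normalize to lowercase, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def bag_of_words(text):
    """Map a text to a term-frequency vector (term -> count)."""
    return Counter(preprocess(text))


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is similar to nothing
    return dot / (norm_u * norm_v)


def rank_documents(query, documents):
    """Return (document index, score) pairs, best match first."""
    q = bag_of_words(query)
    scored = [
        (i, cosine_similarity(q, bag_of_words(doc)))
        for i, doc in enumerate(documents)
    ]
    # Sort by descending score; break ties by original position.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


# Hypothetical sample collection and query.
documents = [
    "The cat sat on the mat.",
    "Dogs and cats are popular pets.",
    "Stock markets fell sharply on Monday.",
]
ranking = rank_documents("cat on a mat", documents)
for index, score in ranking:
    print(f"{score:.3f}  {documents[index]}")
```

In this sketch the second document scores zero even though it mentions "cats", because lowercasing alone does not map "cats" to "cat". A stronger word normalization step, such as stemming or lemmatization, would be needed to close that gap. The same bag-of-words vectors could also serve as feature vectors for a supervised classifier.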