This lecture introduces embedding models for document retrieval, starting from the challenges of vector space retrieval and the concept of latent semantic indexing (LSI). The key idea of LSI is to map documents and queries into a lower-dimensional space whose dimensions are higher-level concepts rather than individual terms; the lecture illustrates this mapping with examples. It then shows how singular value decomposition (SVD) identifies the top concepts and how to implement this in Python. Two alternative techniques, Probabilistic Latent Semantic Analysis (PLSA) and Latent Dirichlet Allocation (LDA), are presented along with their advantages for concept extraction. The lecture concludes with the use of topic models for unsupervised learning, document retrieval, and document classification.
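The SVD step of LSI can be illustrated with a minimal NumPy sketch. This is not the lecture's own Python code. The toy term-document matrix and the helper names `lsi`, `fold_in_query`, and `rank_documents` are hypothetical. The query is mapped into concept space with q_k = S_k^{-1} U_k^T q, one common LSI formulation; other formulations scale the vectors differently.

```python
import numpy as np

# Hypothetical toy term-document matrix: rows are terms, columns are documents.
terms = ["car", "automobile", "engine", "flower", "petal"]
M = np.array([
    [1, 0, 1, 0],  # car
    [0, 1, 1, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 0, 1],  # flower
    [0, 0, 0, 1],  # petal
], dtype=float)

def lsi(M, k):
    """Truncated SVD M ~ U_k S_k V_k^T, keeping the k strongest concepts."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)  # s is sorted descending
    return U[:, :k], s[:k], Vt[:k, :]

def fold_in_query(q, U_k, s_k):
    """Map a term-space query into concept space: q_k = S_k^{-1} U_k^T q."""
    return (U_k.T @ q) / s_k

def rank_documents(q, M, k):
    """Rank documents by cosine similarity to the query in the k-dim concept space."""
    U_k, s_k, Vt_k = lsi(M, k)
    doc_vecs = Vt_k.T  # row j is document j expressed in concept coordinates
    q_k = fold_in_query(q, U_k, s_k)
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_k)
    sims = doc_vecs @ q_k / (norms + 1e-12)  # small epsilon avoids division by zero
    return np.argsort(-sims), sims

# Query containing only the term "automobile".
q = np.array([0, 1, 0, 0, 0], dtype=float)
order, sims = rank_documents(q, M, k=2)
print(order, np.round(sims, 3))
```

In this toy example, document 0 ("car", "engine") does not contain "automobile", so its plain vector-space cosine with the query is 0. In the two-concept space, however, it lands on the same "vehicle" concept as the query and scores close to 1, while document 3 ("flower", "petal") scores close to 0.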