Lecture

Embedding Models: Concepts and Retrieval

In course

This course introduces the fundamentals of information retrieval, data mining, and knowledge bases, which underpin today's Web-based distributed information systems.

Description

This lecture introduces embedding models for document retrieval, focusing on the challenges of vector space retrieval and the concept of latent semantic indexing. It covers the key idea of mapping documents and queries into a lower-dimensional space composed of higher-level concepts, and illustrates the process with examples. The lecture also discusses how singular value decomposition (SVD) is applied to identify the top concepts and how these concepts are implemented in Python. Alternative techniques such as Probabilistic Latent Semantic Analysis and Latent Dirichlet Allocation are presented, with their advantages for concept extraction highlighted. The lecture concludes with a discussion of the use of topic models for unsupervised learning, document retrieval, and document classification.
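To make the mapping into a concept space concrete, here is a minimal sketch of latent semantic indexing with a truncated SVD in NumPy. It is not the lecture's own code: the toy term-document matrix, the choice of k = 2 concepts, and all variable names are assumptions made for illustration. The query is folded into the concept space with the common formulation q_k = S_k^{-1} U_k^T q, and documents are compared by cosine similarity.

```python
import numpy as np

# Hypothetical toy corpus: rows are terms, columns are documents.
# d0 = {car, engine}, d1 = {automobile, engine}, d2 = {car, automobile},
# d3 = {flower, petal}, d4 = {flower}
terms = ["car", "automobile", "engine", "flower", "petal"]
M = np.array([
    [1, 0, 1, 0, 0],  # car
    [0, 1, 1, 0, 0],  # automobile
    [1, 1, 0, 0, 0],  # engine
    [0, 0, 0, 1, 1],  # flower
    [0, 0, 0, 1, 0],  # petal
], dtype=float)


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


# Full SVD: M = U * diag(s) * Vt, singular values sorted in descending order.
U, s, Vt = np.linalg.svd(M, full_matrices=False)

# Keep only the top-k concepts (k is an assumption chosen for this toy data).
k = 2
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

# Each document is a row of V_k in the k-dimensional concept space.
docs_k = Vt_k.T

# Query containing the single term "automobile".
q = np.zeros(len(terms))
q[terms.index("automobile")] = 1.0

# Fold the query into the concept space: q_k = S_k^{-1} U_k^T q.
q_k = (U_k.T @ q) / s_k

# Similarity of the query to every document, before and after the mapping.
raw_sims = [cosine(q, M[:, j]) for j in range(M.shape[1])]
lsi_sims = [cosine(q_k, docs_k[j]) for j in range(M.shape[1])]

for j, (raw, lsi) in enumerate(zip(raw_sims, lsi_sims)):
    print(f"d{j}: term space = {raw:.2f}, concept space = {lsi:.2f}")
```

In this toy data the query "automobile" shares no term with document d0 (car, engine), so their similarity in the original term space is zero. After the mapping, both fall on the same vehicle concept and become highly similar, while the flower documents stay unrelated to the query.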

Instructor

Karl Aberer

Karl Aberer received his PhD in mathematics in 1991 from ETH Zürich. From 1991 to 1992 he was a postdoctoral fellow at the International Computer Science Institute (ICSI) at the University of California, Berkeley. In 1992, he joined the Integrated Publication and Information Systems institute (IPSI) of GMD in Germany, where he led the research division Open Adaptive Information Management Systems. In 2000 he joined EPFL as a full professor. Since 2005 he has been the director of the Swiss National Research Center for Mobile Information and Communication Systems (NCCR-MICS, www.mics.ch). He is a member of the editorial boards of VLDB Journal, ACM Transactions on Autonomous and Adaptive Systems, and World Wide Web Journal. He consulted for the Swiss government on research and science policy as a member of the Swiss Research and Technology Council (SWTR) from 2003 to 2011.

Ontological neighbourhood

Information engineering

Natural language processing: Topics in natural language processing

Mathematics

Algebra: Linear algebra

Related lectures (29)

Latent Semantic Indexing: Concepts and Applications

Explores Latent Semantic Indexing, a technique for mapping documents into a concept space for retrieval and classification.

Latent Semantic Indexing

Covers Latent Semantic Indexing, word embeddings, and the skipgram model with negative sampling.

Latent Semantic Indexing

Covers Latent Semantic Indexing, a method to improve information retrieval by mapping documents and queries into a lower-dimensional concept space.

Handling Text: Document Retrieval, Classification, Sentiment Analysis

Explores document retrieval, classification, sentiment analysis, TF-IDF matrices, nearest-neighbor methods, matrix factorization, regularization, LDA, contextualized word vectors, and BERT.

Document Retrieval and Classification

Covers document retrieval, classification, sentiment analysis, and topic detection using TF-IDF matrices and contextualized word vectors such as those produced by BERT.