**Are you an EPFL student looking for a semester project?**

Lecture# Information Retrieval Indexing: Part 2

Description

This lecture covers the construction of an inverted file for information retrieval indexing, addressing granularity levels, and the use of tries in index construction. It explains the process of searching the inverted file, vocabulary search, and manipulation of occurrences. The lecture also discusses index compression, index merging, and the map-reduce programming model for constructing the inverted file. Additionally, it explores the applications of map-reduce frameworks in various tasks, such as graph processing and learning probabilistic models.

Login to watch the video

Official source

Instructor

In course

Related concepts (97)

CS-423: Distributed information systems

This course introduces the key concepts and algorithms from the areas of information retrieval, data mining and knowledge bases, which constitute the foundations of today's Web-based distributed infor

In computer science, graph transformation, or graph rewriting, concerns the technique of creating a new graph out of an original graph algorithmically. It has numerous applications, ranging from software engineering (software construction and also software verification) to layout algorithms and picture generation. Graph transformations can be used as a computation abstraction. The basic idea is that if the state of a computation can be represented as a graph, further steps in that computation can then be represented as transformation rules on that graph.

In computer science, a graph is an abstract data type that is meant to implement the undirected graph and directed graph concepts from the field of graph theory within mathematics. A graph data structure consists of a finite (and possibly mutable) set of vertices (also called nodes or points), together with a set of unordered pairs of these vertices for an undirected graph or a set of ordered pairs for a directed graph. These pairs are known as edges (also called links or lines), and for a directed graph are also known as edges but also sometimes arrows or arcs.

A graph database (GDB) is a database that uses graph structures for semantic queries with nodes, edges, and properties to represent and store data. A key concept of the system is the graph (or edge or relationship). The graph relates the data items in the store to a collection of nodes and edges, the edges representing the relationships between the nodes. The relationships allow data in the store to be linked together directly and, in many cases, retrieved with one operation.

In mathematics, graph theory is the study of graphs, which are mathematical structures used to model pairwise relations between objects. A graph in this context is made up of vertices (also called nodes or points) which are connected by edges (also called links or lines). A distinction is made between undirected graphs, where edges link two vertices symmetrically, and directed graphs, where edges link two vertices asymmetrically. Graphs are one of the principal objects of study in discrete mathematics.

Information retrieval (IR) in computing and information science is the process of obtaining information system resources that are relevant to an information need from a collection of those resources. Searches can be based on full-text or other content-based indexing. Information retrieval is the science of searching for information in a document, searching for documents themselves, and also searching for the metadata that describes data, and for databases of texts, images or sounds.