Web Text Retrieval with a P2P Query-Driven Index

Karl Aberer, Martin Rajman, Vinh Toan Luu, Ivana Podnar, Gleb Skobeltsyn
2007
Conference paper

Abstract

In this paper, we present a query-driven indexing/retrieval strategy for efficient full text retrieval from large document collections distributed within a structured P2P network. Our indexing strategy is based on two important properties: (1) the generated distributed index stores posting lists for carefully chosen indexing term combinations, and (2) posting lists containing too many document references are truncated to a bounded number of their top-ranked elements. These two properties guarantee acceptable storage and bandwidth requirements, essentially because the number of indexing term combinations remains scalable and the transmitted posting lists never exceed a constant size. However, as the number of generated term combinations can still become quite large, we also use term statistics extracted from available query logs to index only those combinations that occur frequently in user queries. Thus, by avoiding the generation of superfluous indexing term combinations, we achieve an additional substantial reduction in bandwidth and storage consumption. As a result, the generated distributed index corresponds to a constantly evolving query-driven indexing structure that efficiently follows the current information needs of the users. More precisely, our theoretical analysis and experimental results indicate that, at the price of a marginal loss in retrieval quality for rare queries, the generated index size and network traffic remain manageable even for web-size document collections. Furthermore, our experiments show that the achieved retrieval quality is at the same time fully comparable to that obtained with a state-of-the-art centralized query engine.
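The two properties described in the abstract, together with the query-log filter, can be illustrated with a minimal single-process sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names `frequent_keys` and `build_index`, the parameters `min_freq`, `max_size` and `max_postings`, the whitespace tokenization, and the summed term-frequency score used as a stand-in for a real ranking function. Distribution of the keys over a structured P2P network is not modeled.

```python
from collections import Counter
from itertools import combinations


def frequent_keys(query_log, min_freq, max_size=2):
    """Return term combinations that occur in at least min_freq logged queries.

    Hypothetical helper: the selection rule is an assumption, not the paper's.
    """
    counts = Counter()
    for query in query_log:
        # Deduplicate and sort terms so each combination has one canonical form.
        terms = sorted(set(query.lower().split()))
        for size in range(1, max_size + 1):
            for combo in combinations(terms, size):
                counts[combo] += 1
    return {key for key, n in counts.items() if n >= min_freq}


def build_index(docs, keys, max_postings):
    """Map each key to its top-ranked documents, truncated to max_postings.

    The score is a placeholder (summed term frequency), assumed for illustration.
    """
    index = {}
    for key in keys:
        scored = []
        for doc_id, text in docs.items():
            tf = Counter(text.lower().split())
            # A document matches a key only if it contains every term of the key.
            if all(term in tf for term in key):
                scored.append((sum(tf[term] for term in key), doc_id))
        # Highest score first; ties are broken by document id for determinism.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        index[key] = [doc_id for _, doc_id in scored[:max_postings]]
    return index


docs = {1: "peer to peer index", 2: "peer index index", 3: "web search"}
query_log = ["peer index", "peer index", "web"]

keys = frequent_keys(query_log, min_freq=2)
index = build_index(docs, keys, max_postings=1)
```

In this toy run the combination `("index", "peer")` is indexed because it appears in two logged queries, while `("web",)` is skipped because it appears only once. Every posting list holds at most `max_postings` entries, which is what keeps the transmitted lists bounded in size.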

Official source

https://infoscience.epfl.ch/record/104360?ln=en

Knowledge-Aware Cross-Modal Text-Image Retrieval for Remote Sensing Images

QAIR: Practical Query-efficient Black-Box Attacks for Image Retrieval

Multimodal Reranking of Content-based Recommendations for Hyperlinking Video Snippets