This work addresses the problem of reducing the time between query submission and result output in a retrieval system. The goal is achieved by restricting the retrieval process to as small a fraction of the database as possible. Our approach is based on a new clustering technique, which we compare with other clustering methods from the literature. Our algorithm outperforms these techniques: retrieval performance close to that obtained with the whole corpus is achieved by selecting only 5% of the data.
Vinitra Swamy, Paola Mejia Domenzain, Julian Thomas Blackwell, Isadora Alves de Salles