In the conventional vector space model for information retrieval, query vector generation is imperfect for retrieving the precise documents desired by the user. In this paper, we present a stochastic approach for optimizing the query vector without user involvement. We explore the document search space using particle swarm optimization and exploit the search space of potentially relevant and non-relevant documents to adapt the query vector. The proposed method improves retrieval accuracy by optimizing the query vector generated in the conventional vector space model under various term-weighting strategies, including TF-IDF and document length normalization. Our experimental results on two collections, Medline and Cranfield, show that the query vector adapted using pseudo-relevant documents performs better than the classical vector space model. We achieved an improvement of 3-4% in Mean Average Precision (MAP) and 5-10% in precision at lower recall levels. Further expanding the search space to pseudo-non-relevant documents does not lead to a significant improvement, but a proper representation of pseudo-non-relevant documents leaves scope for future work to better guide the optimization of the query vector.
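The abstract does not specify the PSO fitness function, swarm parameters, or the number of pseudo-relevant documents, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It builds TF-IDF vectors with L2 (cosine) normalization as one common form of document length normalization, ranks documents by cosine similarity, treats the top-k results as pseudo-relevant, and runs a standard global-best PSO whose hypothetical fitness is the mean cosine similarity between a candidate query and those pseudo-relevant documents. The toy corpus, `pso_query`, `fitness`, and all parameter values (`k = 2`, `w = 0.7`, `c1 = c2 = 1.5`) are illustrative assumptions. An `average_precision` helper shows how MAP is computed (the mean of per-query average precision).

```python
import math
import random
from collections import Counter

# Hypothetical toy corpus; the paper itself evaluates on Medline and Cranfield.
DOCS = {
    "d1": "particle swarm optimization of the query vector",
    "d2": "vector space model for information retrieval",
    "d3": "query expansion with pseudo relevance feedback",
    "d4": "boundary layer flow over a flat plate",
}

def tokenize(text):
    return text.lower().split()

def build_index(docs):
    """Fixed vocabulary plus idf = log(N / df) for each term."""
    vocab = sorted({t for text in docs.values() for t in tokenize(text)})
    df = Counter(t for text in docs.values() for t in set(tokenize(text)))
    return vocab, [math.log(len(docs) / df[t]) for t in vocab]

def normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec

def tfidf(text, vocab, idf):
    """Raw tf times idf, then L2 normalization (assumed length normalization)."""
    counts = Counter(tokenize(text))
    return normalize([counts[t] * w for t, w in zip(vocab, idf)])

def cosine(a, b):
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0

def rank(query, doc_vecs):
    return sorted(doc_vecs, key=lambda d: cosine(query, doc_vecs[d]), reverse=True)

def fitness(q, pseudo_rel):
    # Hypothetical objective: mean similarity to the pseudo-relevant documents.
    return sum(cosine(q, d) for d in pseudo_rel) / len(pseudo_rel)

def pso_query(query, pseudo_rel, n_particles=10, iters=30,
              w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO over query term weights (parameters are assumptions)."""
    rng = random.Random(seed)
    dim = len(query)
    # Particle 0 is the original query; the others are perturbed copies.
    pos = [list(query)] + [[max(0.0, q + rng.gauss(0, 0.1)) for q in query]
                           for _ in range(n_particles - 1)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [list(p) for p in pos]
    pbest_fit = [fitness(p, pseudo_rel) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = list(pbest[g]), pbest_fit[g]
    for _ in range(iters):
        for i in range(n_particles):
            for k in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][k] = (w * vel[i][k]
                             + c1 * r1 * (pbest[i][k] - pos[i][k])
                             + c2 * r2 * (gbest[k] - pos[i][k]))
                # Keep term weights non-negative, like TF-IDF weights.
                pos[i][k] = max(0.0, pos[i][k] + vel[i][k])
            f = fitness(pos[i], pseudo_rel)
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = list(pos[i]), f
                if f > gbest_fit:
                    gbest, gbest_fit = list(pos[i]), f
    return gbest, gbest_fit

def average_precision(ranking, relevant):
    """AP for one query; MAP is the mean of AP over all queries."""
    hits, total = 0, 0.0
    for r, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / r
    return total / len(relevant) if relevant else 0.0

vocab, idf = build_index(DOCS)
doc_vecs = {d: tfidf(t, vocab, idf) for d, t in DOCS.items()}
query = tfidf("query vector optimization", vocab, idf)
initial = rank(query, doc_vecs)
pseudo_rel = [doc_vecs[d] for d in initial[:2]]  # top-k = 2 assumed relevant
adapted, score = pso_query(query, pseudo_rel)
print(initial, rank(adapted, doc_vecs))
```

Because the original query is one of the starting particles, the returned query never scores worse than the original under the chosen fitness; whether that translates into better MAP on real collections depends on the objective, which the abstract leaves open.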