Evaluation measures for an information retrieval (IR) system assess how well an index, search engine or database returns results from a collection of resources that satisfy a user's query. They are therefore fundamental to the success of information systems and digital platforms. The success of an IR system may be judged by a range of criteria, including relevance, speed, user satisfaction, usability, efficiency and reliability. However, the most important factor in determining a system's effectiveness for users is the overall relevance of the results retrieved in response to a query.

Evaluation measures may be categorised in various ways, such as offline or online and user-based or system-based, and include methods such as observed user behaviour, test collections, precision and recall, and scores from prepared benchmark test sets. Evaluation of an information retrieval system should also include a validation of the measures used, i.e. an assessment of how well they measure what they are intended to measure and how well the system fits its intended use case. Measures are generally used in two settings: online experimentation, which assesses users' interactions with the search system, and offline evaluation, which measures the effectiveness of an information retrieval system on a static offline collection.

Indexing and classification methods to assist with information retrieval have a long history dating back to the earliest libraries and collections. However, systematic evaluation of their effectiveness began in earnest in the 1950s, with the rapid expansion in research production across military, government and education and the introduction of computerised catalogues. At this time there were a number of different indexing, classification and cataloguing systems in operation, which were expensive to produce, and it was unclear which was the most effective. Cyril Cleverdon, Librarian of the College of Aeronautics, Cranfield, England, began a series of experiments on print indexing and retrieval methods in what is known as the Cranfield paradigm, or Cranfield tests, which set the standard for IR evaluation measures for many years.
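
Precision and recall, named above among the offline measures, can be illustrated with a minimal sketch. It assumes set-based (unranked) retrieval for a single query with binary relevance judgements; the function name `precision_recall` and the document identifiers are hypothetical and do not come from any particular test collection or library.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: iterable of document ids returned by the system
    relevant:  iterable of document ids judged relevant for the query
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    # Relevant documents that the system actually returned.
    hits = len(retrieved_set & relevant_set)
    # Precision: fraction of the retrieved documents that are relevant.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    # Recall: fraction of the relevant documents that were retrieved.
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical judgements for a single query (illustration only).
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d7"]

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case two of the four retrieved documents are relevant, and two of the three relevant documents are retrieved. An offline evaluation would typically average such scores over many queries in a test collection.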
