The Web constitutes a valuable source of information. In recent years, it has fostered the construction of large-scale knowledge bases, such as Freebase, YAGO, and DBpedia, each storing millions of facts about society in general and about specific domains, such as politics or medicine. The open nature of the Web, where content can be generated by anyone, however, leads to inaccuracies and misinformation, such as fake news and exaggerated claims. Construction and maintenance of a knowledge base thus rely on fact checking, that is, assessing the credibility of facts. Due to the inherent lack of ground truth information, fact checking cannot be done in a purely automated manner, but requires human involvement. In this paper, we propose a framework to guide users in the validation of facts, striving for a minimisation of the invested effort. Specifically, we present a probabilistic model to identify the facts for which manual validation is most beneficial. As a consequence, our approach yields a high-quality knowledge base, even if only a sample of a collection of facts is validated. Our experiments with three large-scale datasets demonstrate the efficiency and effectiveness of our approach, reaching above 90% precision of the knowledge base with only a third of the validation effort required by baseline techniques.
Dolaana Khovalyg, Matteo Favero, Giorgia Chinazzo, Mandana Sarey Khanie, Verena Marie Barthelmes