The Web has become the central medium hosting valuable sources for information extraction applications. However, such user-generated resources are often plagued by inaccuracies and misinformation due to the inherent openness and uncertainty of the Web. In this work, we study the problem of extracting structured information from Web data with a credibility guarantee. The ultimate goal is not only to extract as much structured information as possible, but also to ensure that its credibility is high. To achieve this goal, we propose a learning process that optimizes the parameters of a probabilistic model capturing the relationships between users, their unstructured content, and the underlying structured information. Our evaluations on real-world datasets show that our approach outperforms the baseline by up to a factor of 6.
Pierre Dillenbourg, Daniel Carnieto Tozadore, Chenyang Wang, Barbara Bruno, David Cohen