In the information age, the Web and growing global connectivity have drastically simplified our access to information. Learning and fact-checking from online resources are now part of our daily routine. Studying the dynamics associated with online content consumption is critical to understanding human behavior and informing the design of future platforms. In this thesis, we provide a comprehensive overview of online knowledge-seeking, a specific instance of information-seeking, by describing the behavioral patterns of Wikipedia readers. Despite the importance and pervasiveness of Wikipedia as one of the largest platforms for open knowledge, surprisingly little is known about how people navigate and interact with its content. This thesis is organized around two major contributions. We start with a large-scale characterization of the navigation patterns on the English edition of Wikipedia, and then we introduce the tools we developed to conduct our analyses.
In the first part, we shed light on the navigation patterns with three large-scale studies based on passively collected digital traces. Using billions of requests collected in Wikipedia's logs, we measure how readers reach articles, transition between pages, and leave the platform. We provide a complete overview of the readers' behavior by characterizing the frequent navigation dynamics and the level of engagement with different types of external links on the page. Then, given the observed role of Wikipedia as a gateway to the Web, we quantify the hypothetical economic value of the traffic received by external websites.
In the second part, we present the tools that we developed to make our analysis possible and support future work in this field. First, we introduce WikiPDA, a cross-lingual topic modeling method able to generate a shared topic space for all editions of Wikipedia. Then, we present WikiHist.html, an effort to make the full Wikipedia history publicly available in HTML format.
We conclude by discussing the implications of our findings and presenting future research opportunities enabled by our contributions.
Sarah Irene Brutton Kenderdine, Yumeng Hou, Fadel Mamar Seydou