Poor decisions and selfish behaviors give rise to seemingly intractable global problems, such as the lack of transparency in democratic processes, the spread of conspiracy theories, and the rise in greenhouse gas emissions. However, people are more predictable than we think, and with machine-learning algorithms and sufficiently large datasets, we can design accurate models of human behavior in a variety of settings. In this thesis, to gain insight into social processes, we develop highly interpretable probabilistic choice-models. We draw from the econometrics literature on discrete-choice models and combine them with matrix factorization methods, Bayesian statistics, and generalized linear models. These predictive models enable interpretability through their learned parameters and latent factors.
First, we study the social dynamics behind group collaborations for the collective creation of content, such as in Wikipedia, the Linux kernel, and the European Union law-making process. By combining the Bradley-Terry and Rasch models with matrix factorization and natural language processing, we develop a model of edit acceptance in peer-production systems. We discover controversial components (e.g., Wikipedia articles and European laws) and influential users (e.g., Wikipedia editors and parliamentarians), as well as features that correlate with a high probability of edit acceptance. The latent representations capture non-linear interactions between components and users, and they cluster well into different topics (e.g., historical figures and TV characters in Wikipedia, business and environment in European laws).
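As a rough illustration of how a Rasch-style term and a matrix-factorization term can be combined into one acceptance probability, consider the minimal sketch below. It is not necessarily the exact model of the thesis: the additive form, the logistic link, and every name in it (`user_skill`, `item_difficulty`, `user_vec`, `item_vec`, `bias`) are assumptions made for illustration.

```python
import math


def acceptance_probability(user_skill, item_difficulty, user_vec, item_vec, bias=0.0):
    """Probability that an edit by a user on a component is accepted.

    Hypothetical form: a Rasch-style difference (skill minus difficulty)
    plus a dot product of latent vectors, passed through a logistic link.
    """
    # Latent interaction between the user and the component.
    interaction = sum(u * v for u, v in zip(user_vec, item_vec))
    logit = user_skill - item_difficulty + interaction + bias
    return 1.0 / (1.0 + math.exp(-logit))


# A skilled user editing an easy component versus a controversial one.
p_easy = acceptance_probability(1.0, -0.5, [0.2, 0.1], [0.3, -0.1])
p_hard = acceptance_probability(1.0, 2.0, [0.2, 0.1], [0.3, -0.1])
```

Under this form, a large `item_difficulty` marks a controversial component, a large `user_skill` marks an influential user, and the latent vectors account for interactions that the two scalars cannot express.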
Second, we develop an algorithm for predicting the outcome of elections and referenda by combining matrix factorization and generalized linear models. Our algorithm learns representations of votes and regions, which capture ideological and cultural voting patterns (e.g., liberal/conservative, rural/urban), and it predicts the vote results in unobserved regions from partial observations. We test our model on voting data in Germany, Switzerland, and the US, and we deploy it on a Web platform to predict Swiss referendum votes in real time. On average, our predictions reach a mean absolute error of 1% after observing only 5% of the regions.
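The idea of predicting unobserved regions from partial observations can be sketched as follows. This is a simplified stand-in, not the algorithm of the thesis: it uses a truncated SVD of past results as the matrix factorization and a ridge-regularized linear regression in place of a general GLM. The function name `predict_unobserved` and its parameters are hypothetical.

```python
import numpy as np


def predict_unobserved(past_votes, observed_idx, observed_results, rank=2, ridge=1e-3):
    """Predict the result of a new vote in every region.

    past_votes: array of shape (n_regions, n_past_votes) with past results.
    observed_idx: indices of regions whose new result is already known.
    observed_results: the new results in those regions.
    """
    # Region representations from a truncated SVD of the centered past results.
    centered = past_votes - past_votes.mean(axis=0)
    U, s, _ = np.linalg.svd(centered, full_matrices=False)
    features = U[:, :rank] * s[:rank]

    # Design matrix with an intercept column.
    X = np.hstack([np.ones((past_votes.shape[0], 1)), features])
    X_obs = X[observed_idx]

    # Ridge regression on the observed regions (the penalty also covers the intercept).
    gram = X_obs.T @ X_obs + ridge * np.eye(X.shape[1])
    weights = np.linalg.solve(gram, X_obs.T @ observed_results)

    # Predictions for all regions, observed or not.
    return X @ weights
```

In this sketch, the rows of `features` play the role of the region representations, and the fitted `weights` act as the representation of the new vote.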
Third, we study how people perceive the carbon footprint of their day-to-day actions. We cast this problem as a comparison problem between pairs of actions (e.g., the difference between flying across continents and using household appliances), and we develop a statistical model of relative comparisons reminiscent of the Thurstone model in psychometrics. The model learns the users' perception as the parameters of a Bayesian linear regression, which enables us to derive an active-learning algorithm to collect data efficiently. Our experiments show that users overestimate the emissions of low-footprint actions and underestimate those of high-footprint actions.
Finally, we design a probabilistic model of pairwise-comparison outcomes that captures a wide range of time dynamics. We achieve this by replacing the static parameters of a class of popular pairwise-comparison models with continuous-time Gaussian processes. We also develop an efficient inference algorithm that computes, with only a few linear-time iterations over the data, an approximate Bayesian posterior distribution.
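To make the idea of time-varying parameters concrete, the sketch below draws each competitor's score from a Gaussian process over time and turns the score difference into a win probability. It only illustrates the generative side and does not show the inference algorithm. The squared-exponential kernel, the logistic (Bradley-Terry-style) link, and all names and parameter values are assumptions for illustration.

```python
import numpy as np


def rbf_kernel(times, lengthscale=2.0, variance=1.0):
    """Squared-exponential covariance between all pairs of time points."""
    diff = times[:, None] - times[None, :]
    return variance * np.exp(-0.5 * (diff / lengthscale) ** 2)


def sample_score(times, rng, jitter=1e-6):
    """Draw one score trajectory from a zero-mean Gaussian process."""
    K = rbf_kernel(times) + jitter * np.eye(len(times))
    L = np.linalg.cholesky(K)
    return L @ rng.standard_normal(len(times))


def win_probability(score_i, score_j):
    """Probability that i beats j, given their scores at the same times."""
    return 1.0 / (1.0 + np.exp(-(score_i - score_j)))


rng = np.random.default_rng(0)
times = np.linspace(0.0, 10.0, 20)
score_a = sample_score(times, rng)
score_b = sample_score(times, rng)
p_a_beats_b = win_probability(score_a, score_b)
```

With a static model, `score_a` and `score_b` would be constants. Here they vary smoothly with time, so the probability that one competitor beats the other can drift over the observation period.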