In many fields, and especially in the medical and social sciences and in various recommender systems, data are often gathered through clinical studies or targeted surveys. Participants are generally reluctant to respond to all questions in a survey or they may lack information to respond adequately to the questions. The data collected from these studies tend to lead to linear regression models where the regression vectors are only known partially: some of their entries are either missing completely or replaced randomly by noisy values. There are also situations where it is not known beforehand which entries are missing or censored. There have been many useful studies in the literature on techniques to perform estimation and inference with missing data. In this work, we examine how a connected network of agents, with each one of them subjected to a stream of data with incomplete regression information, can cooperate with each other through local interactions to estimate the underlying model parameters in the presence of missing data. We explain how to modify traditional distributed strategies through regularization in order to eliminate the bias introduced by the incomplete model. We also examine the stability and performance of the resulting diffusion strategy and provide simulations in support of the findings. We consider two applications: one dealing with a mental health survey and the other dealing with a household consumption survey.
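To make the idea of bias removal concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a specific missing-data model that the abstract does not spell out: each regressor entry is independently replaced by zero-mean noise with a known probability `p` and known variance `sigma_xi2`. Under that assumption, the plain adapt-then-combine diffusion LMS update is biased, and a corrected stochastic gradient whose mean matches the gradient of the complete-data cost removes the bias. All names (`simulate`, `p`, `sigma_xi2`, `mu`, the ring topology, the all-ones model) are hypothetical choices made for illustration.

```python
import numpy as np


def simulate(compensate, n_iter=5000, seed=0):
    """Adapt-then-combine diffusion LMS on a ring of agents with corrupted regressors.

    Returns the network mean-square deviation from the true model at the last step.
    Assumed (hypothetical) model: each regressor entry is replaced by noise with
    probability p; p and the noise variance sigma_xi2 are known to every agent.
    """
    rng = np.random.default_rng(seed)
    N, M = 10, 5                # number of agents, model dimension
    p, sigma_xi2 = 0.3, 1.0     # replacement probability and noise variance (assumed known)
    mu = 0.002                  # step size
    w_true = np.ones(M)         # underlying model to estimate

    # Ring topology with self-loops; A[l, k] is the weight agent k gives to agent l.
    A = np.zeros((N, N))
    for k in range(N):
        for l in (k - 1, k, k + 1):
            A[l % N, k] = 1.0 / 3.0

    W = np.zeros((N, M))        # row k holds agent k's current estimate
    for _ in range(n_iter):
        U = rng.standard_normal((N, M))                 # complete regressors (never observed)
        d = U @ w_true + 0.1 * rng.standard_normal(N)   # measurements
        missing = rng.random((N, M)) < p
        xi = np.sqrt(sigma_xi2) * rng.standard_normal((N, M))
        U_obs = np.where(missing, xi, U)                # what the agents actually see

        psi = np.empty_like(W)
        for k in range(N):
            u = U_obs[k]
            if compensate:
                # Unbiased estimate of the complete-data covariance:
                # off-diagonal terms are scaled by (1-p)^2, diagonal terms by (1-p)
                # after subtracting the replacement-noise contribution.
                Mk = np.outer(u, u) - p * sigma_xi2 * np.eye(M)
                R_hat = Mk / (1 - p) ** 2
                np.fill_diagonal(R_hat, np.diag(Mk) / (1 - p))
                grad = u * d[k] / (1 - p) - R_hat @ W[k]
            else:
                grad = u * (d[k] - u @ W[k])            # standard LMS gradient (biased here)
            psi[k] = W[k] + mu * grad                   # adaptation step
        W = A.T @ psi                                   # combination step over neighbors

    return float(np.mean(np.sum((W - w_true) ** 2, axis=1)))


if __name__ == "__main__":
    print("uncompensated MSD:", simulate(compensate=False))
    print("compensated MSD:  ", simulate(compensate=True))
```

In this sketch the correction trades bias for extra gradient noise, which is why a small step size and cooperation among neighbors matter; the work described above reaches its bias removal through regularization of the distributed cost, and its exact form may differ from this illustration.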