Publication

Diffusion estimation over cooperative networks with missing data

Ali H. Sayed
2013
Conference paper
Abstract

In many fields, and especially in the medical and social sciences and in various recommender systems, data are often gathered through clinical studies or targeted surveys. Participants are generally reluctant to respond to all questions in a survey or they may lack information to respond adequately to the questions. The data collected from these studies tend to lead to linear regression models where the regression vectors are only known partially: some of their entries are either missing completely or replaced randomly by noisy values. There are also situations where it is not known beforehand which entries are missing or censored. There have been many useful studies in the literature on techniques to perform estimation and inference with missing data. In this work, we examine how a connected network of agents, with each one of them subjected to a stream of data with incomplete regression information, can cooperate with each other through local interactions to estimate the underlying model parameters in the presence of missing data. We explain how to modify traditional distributed strategies through regularization in order to eliminate the bias introduced by the incomplete model. We also examine the stability and performance of the resulting diffusion strategy and provide simulations in support of the findings. We consider two applications: one dealing with a mental health survey and the other dealing with a household consumption survey.
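The bias-correction idea described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's algorithm. It is a minimal adapt-then-combine diffusion LMS under assumptions chosen for illustration: white regressors with unit variance, each regressor entry independently replaced by zero-mean noise with a known probability p and a known variance sigma_xi2, and a ring network with uniform combination weights. Under those assumptions the corrupted regressors add p * sigma_xi2 * I to the regressor covariance, so adding the term p * sigma_xi2 * w to the gradient step (a negative regularizer on the cost) cancels the resulting bias in the mean. All names (atc_diffusion_lms, correction, err_plain, err_corr) are hypothetical.

```python
import numpy as np

def atc_diffusion_lms(U_obs, d, A, mu, correction=0.0):
    """Adapt-then-combine diffusion LMS (illustrative sketch).

    U_obs: (T, N, M) observed regressors, d: (T, N) measurements,
    A: (N, N) combination matrix, A[l, k] = weight agent k gives to agent l,
    correction: scalar c; the adapt step adds mu * c * w to offset bias.
    """
    T, N, M = U_obs.shape
    W = np.zeros((N, M))                      # row k is agent k's estimate
    for i in range(T):
        err = d[i] - np.einsum("nm,nm->n", U_obs[i], W)
        # Adapt: local LMS step plus the assumed bias-correction term.
        psi = W + mu * (U_obs[i] * err[:, None] + correction * W)
        # Combine: each agent averages its neighbors' intermediate estimates.
        W = A.T @ psi
    return W

rng = np.random.default_rng(0)
T, N, M = 8000, 10, 5
p, sigma_xi2, mu = 0.3, 1.0, 0.005            # assumed known to the agents

w_true = rng.standard_normal(M)
w_true /= np.linalg.norm(w_true)

# Clean data: d_k(i) = u_k(i) w_true + v_k(i), white unit-variance regressors.
U = rng.standard_normal((T, N, M))
d = np.einsum("tnm,m->tn", U, w_true) + 0.1 * rng.standard_normal((T, N))

# Each entry is replaced by noise with probability p (the "missing" entries).
missing = rng.random((T, N, M)) < p
xi = np.sqrt(sigma_xi2) * rng.standard_normal((T, N, M))
U_obs = np.where(missing, xi, U)

# Ring network: every agent averages itself and its two neighbors.
A = np.zeros((N, N))
for k in range(N):
    for l in (k - 1, k, k + 1):
        A[l % N, k] = 1.0 / 3.0

W_plain = atc_diffusion_lms(U_obs, d, A, mu)
W_corr = atc_diffusion_lms(U_obs, d, A, mu, correction=p * sigma_xi2)

# Average estimation error across agents, without and with the correction.
err_plain = np.linalg.norm(W_plain - w_true, axis=1).mean()
err_corr = np.linalg.norm(W_corr - w_true, axis=1).mean()
print(f"uncorrected error: {err_plain:.3f}, corrected error: {err_corr:.3f}")
```

In this setting the uncorrected recursion converges in the mean to a shrunken version of the true vector, with shrink factor (1 - p) / ((1 - p) + p * sigma_xi2), while the corrected recursion removes that shrinkage. For correlated regressors, or when p is unknown, the scalar correction used here would no longer be sufficient.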

Related concepts (33)
Linear regression
In statistics, linear regression is a linear approach for modelling the relationship between a scalar response and one or more explanatory variables (also known as dependent and independent variables). The case of one explanatory variable is called simple linear regression; for more than one, the process is called multiple linear regression. This term is distinct from multivariate linear regression, where multiple correlated dependent variables are predicted, rather than a single scalar variable.
Missing data
In statistics, missing data, or missing values, occur when no data value is stored for the variable in an observation. Missing data are a common occurrence and can have a significant effect on the conclusions that can be drawn from the data. Missing data can occur because of nonresponse: no information is provided for one or more items or for a whole unit ("subject"). Some items are more likely to generate a nonresponse than others: for example items about private subjects such as income.
Regression analysis
In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (often called the 'outcome' or 'response' variable, or a 'label' in machine learning parlance) and one or more independent variables (often called 'predictors', 'covariates', 'explanatory variables' or 'features'). The most common form of regression analysis is linear regression, in which one finds the line (or a more complex linear combination) that most closely fits the data according to a specific mathematical criterion.
Related publications (65)

Quantifying the Unknown: Data-Driven Approaches and Applications in Energy Systems

Paul Scharnhorst

In light of the challenges posed by climate change and the goals of the Paris Agreement, electricity generation is shifting to a more renewable and decentralized pattern, while the operation of systems like buildings is increasingly electrified. This calls ...
EPFL2024

Reliable uncertainties: Error correlation, rotated error bars, and linear regressions in three-isotope plots and beyond

Reto Georg Trappitsch

Correlated errors of experimental data are a common but often neglected problem in physical sciences. Various tools are provided here for thorough propagation of uncertainties in cases of correlated errors. Discussed are techniques especially applicable to ...
ELSEVIER2023

Spatial Distributions of Diarrheal Cases in Relation to Housing Conditions in Informal Settlements: A Cross-Sectional Study in Abidjan, Côte d’Ivoire

Jérôme Chenal, Vitor Pessoa Colombo, Jürg Utzinger

In addition to individual practices and access to water, sanitation, and hygiene (WASH) facilities, housing conditions may also be associated with the risk of diarrhea. Our study embraced a broad approach to health determinants by looking at housing depriv ...
2023
Related MOOCs (24)
Selected Topics on Discrete Choice
Discrete choice models are used extensively in many disciplines where it is important to predict human behavior at a disaggregate level. This course is a follow up of the online course “Introduction t
Neuronal Dynamics - Computational Neuroscience of Single Neurons
The activity of neurons in the brain and the code used by these neurons is described by mathematical neuron models at different levels of detail.