Concept

Imputation (statistics)

In statistics, imputation is the process of replacing missing data with substituted values. When substituting for a data point, it is known as "unit imputation"; when substituting for a component of a data point, it is known as "item imputation". There are three main problems that missing data causes: missing data can introduce a substantial amount of bias, make the handling and analysis of the data more arduous, and create reductions in efficiency.

Because missing data can create problems for analyzing data, imputation is seen as a way to avoid pitfalls involved with listwise deletion of cases that have missing values. That is to say, when one or more values are missing for a case, most statistical packages default to discarding any case that has a missing value, which may introduce bias or affect the representativeness of the results. Imputation preserves all cases by replacing missing data with an estimated value based on other available information. Once all missing values have been imputed, the data set can then be analysed using standard techniques for complete data.

There have been many theories embraced by scientists to account for missing data but the majority of them introduce bias. A few of the well known attempts to deal with missing data include: hot deck and cold deck imputation; listwise and pairwise deletion; mean imputation; non-negative matrix factorization; regression imputation; last observation carried forward; stochastic imputation; and multiple imputation.

Listwise deletion

By far, the most common means of dealing with missing data is listwise deletion (also known as complete case), which is when all cases with a missing value are deleted. If the data are missing completely at random, then listwise deletion does not add any bias, but it does decrease the power of the analysis by decreasing the effective sample size. For example, if 1000 cases are collected but 80 have missing values, the effective sample size after listwise deletion is 920.
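The contrast between listwise deletion and imputation can be illustrated with a minimal Python sketch. It is not taken from the source: the data set, the column names `age` and `income`, and the helper functions `listwise_delete` and `mean_impute` are hypothetical, and missing values are assumed to be encoded as `None`. Mean imputation is used only because it is the simplest of the methods listed above.

```python
from statistics import mean


def listwise_delete(rows):
    """Drop every case (row) that has at least one missing value (None)."""
    return [row for row in rows if all(v is not None for v in row.values())]


def mean_impute(rows, column):
    """Replace missing values in `column` with the mean of its observed values."""
    observed = [row[column] for row in rows if row[column] is not None]
    fill = mean(observed)
    return [
        {**row, column: fill if row[column] is None else row[column]}
        for row in rows
    ]


# Hypothetical data: 1000 cases, the first 80 of which are missing `income`.
rows = [
    {"age": 30 + i % 40, "income": None if i < 80 else 1000.0 + i}
    for i in range(1000)
]

complete = listwise_delete(rows)       # discards the 80 incomplete cases
imputed = mean_impute(rows, "income")  # keeps all cases, fills the gaps

print(len(complete))  # 920
print(len(imputed))   # 1000
```

Listwise deletion reduces the effective sample size from 1000 to 920, while imputation keeps all 1000 cases. Filling every gap with the same mean value understates the variability of the imputed column, which is one way a simple imputation method can itself introduce bias.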

Official source

https://en.wikipedia.org/wiki/Imputation_(statistics)

Related courses (5)

CS-401: Applied data analysis

This course teaches the basic techniques, methodologies, and practical skills required to draw meaningful insights from a variety of data, with the help of the most acclaimed software tools in the dat

CS-421: Machine learning for behavioral data

Computer environments such as educational games, interactive simulations, and web services provide large amounts of data, which can be analyzed and serve as a basis for adaptation. This course will co

CIVIL-226: Introduction to machine learning for engineers

Machine learning is a sub-field of Artificial Intelligence that allows computers to learn from data, identify patterns and make predictions. As a fundamental building block of the Computational Thinki

Related lectures (24)

Transformations of Input or Output

Covers handling missing data, feature engineering, and output transformations in machine learning.

Logistic Regression: Interpretation & Feature Engineering

Covers logistic regression, probabilistic interpretation, and feature engineering techniques.

Imputation: Best Approach?

Explores the best approach for imputation in life cycle inventory calculation, covering methods and examples in wood and mineral production.

Related concepts (3)

Missing data

In statistics, missing data, or missing values, occur when no data value is stored for the variable in an observation. Missing data are a common occurrence and can have a significant effect on the conclusions that can be drawn from the data. Missing data can occur because of nonresponse: no information is provided for one or more items or for a whole unit ("subject"). Some items are more likely to generate a nonresponse than others: for example items about private subjects such as income.

Data analysis

Data analysis is the process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, and is used in different business, science, and social science domains. In today's business world, data analysis plays a role in making decisions more scientific and helping businesses operate more effectively.

Robust statistics

Robust statistics are statistics with good performance for data drawn from a wide range of probability distributions, especially for distributions that are not normal. Robust statistical methods have been developed for many common problems, such as estimating location, scale, and regression parameters. One motivation is to produce statistical methods that are not unduly affected by outliers. Another motivation is to provide methods with good performance when there are small departures from a parametric distribution.

Related publications (26)

Comparison of Three Imputation Methods for Groundwater Level Timeseries

Andrea Rinaldo

This study compares three imputation methods applied to the field observations of hydraulic head in subsurface hydrology. Hydrogeological studies that analyze the timeseries of groundwater elevations often face issues with missing data that may mislead bot ...

MDPI, 2023

Dataset for the evaluation of student-level outcomes of a primary school Computer Science curricular reform

Francesco Mondada, Jessica Elke Dehler Zufferey, Barbara Bruno, Laila Abdelsalam El-Hamamsy

The associated peer reviewed article that will appear in the International Journal of STEM education : El-Hamamsy, L., Bruno, B., Audrin, C., Chevalier, M., Avry S., Dehler Zufferey, J., and Mondada, F. (2023). How are Primary School Computer Science Curri ...

Zenodo, 2023

Localizing Unsynchronized Sensors With Unknown Sources

Ivan Dokmanic, Dalia Salem Hassan Fahmy El Badawy

We propose a method for sensor array self-localization using a set of sources at unknown locations. The sources produce signals whose times of arrival are registered at the sensors. We look at the general case where neither the emission times of the source ...

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 2023
