In statistics, the jackknife (jackknife cross-validation) is a cross-validation technique and, therefore, a form of resampling. It is especially useful for bias and variance estimation. The jackknife pre-dates other common resampling methods such as the bootstrap. Given a sample of size , a jackknife estimator can be built by aggregating the parameter estimates from each subsample of size obtained by omitting one observation. The jackknife technique was developed by Maurice Quenouille (1924–1973) from 1949 and refined in 1956. John Tukey expanded on the technique in 1958 and proposed the name "jackknife" because, like a physical jack-knife (a compact folding knife), it is a rough-and-ready tool that can improvise a solution for a variety of problems even though specific problems may be more efficiently solved with a purpose-designed tool. The jackknife is a linear approximation of the bootstrap. The jackknife estimator of a parameter is found by systematically leaving out each observation from a dataset and calculating the parameter estimate over the remaining observations and then aggregating these calculations. For example, if the parameter to be estimated is the population mean of random variable , then for a given set of i.i.d. observations the natural estimator is the sample mean: where the last sum used another way to indicate that the index runs over the set . Then we proceed as follows: For each we compute the mean of the jackknife subsample consisting of all but the -th data point, and this is called the -th jackknife replicate: It could help to think that these jackknife replicates give us an approximation of the distribution of the sample mean and the larger the the better this approximation will be. Then finally to get the jackknife estimator we take the average of these jackknife replicates: One may ask about the bias and the variance of . From the definition of as the average of the jackknife replicates one could try to calculate explicitly, and the bias is a trivial calculation but the variance of is more involved since the jackknife replicates are not independent.

About this result
This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.
Related courses (2)
MATH-517: Statistical computation and visualisation
The course will provide the opportunity to tackle real world problems requiring advanced computational skills and visualisation techniques to complement statistical thinking. Students will practice pr
BIO-322: Introduction to machine learning for bioengineers
Students understand basic concepts and methods of machine learning. They can describe them in mathematical terms and can apply them to data using a high-level programming language (julia/python/R).
Related people (1)
Related concepts (5)
Bootstrapping (statistics)
Bootstrapping is any test or metric that uses random sampling with replacement (e.g. mimicking the sampling process), and falls under the broader class of resampling methods. Bootstrapping assigns measures of accuracy (bias, variance, confidence intervals, prediction error, etc.) to sample estimates. This technique allows estimation of the sampling distribution of almost any statistic using random sampling methods. Bootstrapping estimates the properties of an estimand (such as its variance) by measuring those properties when sampling from an approximating distribution.
Resampling (statistics)
In statistics, resampling is the creation of new samples based on one observed sample. Resampling methods are: Permutation tests (also re-randomization tests) Bootstrapping Cross validation Permutation test Permutation tests rely on resampling the original data assuming the null hypothesis. Based on the resampled data it can be concluded how likely the original data is to occur under the null hypothesis.
Cross-validation (statistics)
Cross-validation, sometimes called rotation estimation or out-of-sample testing, is any of various similar model validation techniques for assessing how the results of a statistical analysis will generalize to an independent data set. Cross-validation is a resampling method that uses different portions of the data to test and train a model on different iterations. It is mainly used in settings where the goal is prediction, and one wants to estimate how accurately a predictive model will perform in practice.
Show more

Graph Chatbot

Chat with Graph Search

Ask any question about EPFL courses, lectures, exercises, research, news, etc. or try the example questions below.

DISCLAIMER: The Graph Chatbot is not programmed to provide explicit or categorical answers to your questions. Rather, it transforms your questions into API requests that are distributed across the various IT services officially administered by EPFL. Its purpose is solely to collect and recommend relevant references to content that you can explore to help you answer your questions.