In statistics, the jackknife (jackknife cross-validation) is a cross-validation technique and, therefore, a form of resampling. It is especially useful for bias and variance estimation. The jackknife pre-dates other common resampling methods such as the bootstrap. Given a sample of size $n$, a jackknife estimator can be built by aggregating the parameter estimates from each of the $n$ subsamples of size $n-1$ obtained by omitting one observation.

The jackknife technique was developed by Maurice Quenouille (1924–1973) from 1949 and refined in 1956. John Tukey expanded on the technique in 1958 and proposed the name "jackknife" because, like a physical jack-knife (a compact folding knife), it is a rough-and-ready tool that can improvise a solution to a variety of problems, even though specific problems may be solved more efficiently with a purpose-designed tool. The jackknife is a linear approximation of the bootstrap.

The jackknife estimator of a parameter is found by systematically leaving out each observation from a dataset, calculating the parameter estimate over the remaining observations, and then aggregating these calculations. For example, if the parameter to be estimated is the population mean of a random variable $x$, then for a given set of i.i.d. observations $x_1, \ldots, x_n$ the natural estimator is the sample mean:

$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{1}{n} \sum_{i \in [n]} x_i,$

where the last sum uses another way to indicate that the index $i$ runs over the set $[n] = \{1, \ldots, n\}$. Then we proceed as follows: for each $i \in [n]$ we compute the mean $\bar{x}_{(i)}$ of the jackknife subsample consisting of all but the $i$-th data point, called the $i$-th jackknife replicate:

$\bar{x}_{(i)} = \frac{1}{n-1} \sum_{j \in [n],\, j \neq i} x_j, \qquad i = 1, \ldots, n.$

It may help to think of these jackknife replicates as giving an approximation of the distribution of the sample mean, an approximation that improves as $n$ grows. Finally, to get the jackknife estimator we take the average of these $n$ jackknife replicates:

$\bar{x}_{\mathrm{jack}} = \frac{1}{n} \sum_{i=1}^{n} \bar{x}_{(i)}.$

One may ask about the bias and the variance of $\bar{x}_{\mathrm{jack}}$. From the definition of $\bar{x}_{\mathrm{jack}}$ as the average of the jackknife replicates one could try to calculate these explicitly. The bias is a trivial calculation: each observation appears in exactly $n-1$ of the $n$ replicates, so $\bar{x}_{\mathrm{jack}} = \bar{x}$ and the jackknife estimator inherits the unbiasedness of the sample mean. The variance of $\bar{x}_{\mathrm{jack}}$ is more involved, since the jackknife replicates are not independent.
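To make the construction above concrete, here is a minimal Python/NumPy sketch that computes the jackknife replicates, the jackknife estimate, the usual jackknife bias estimate $(n-1)(\bar{x}_{\mathrm{jack}} - \bar{x})$, and the jackknife standard error. The function name jackknife_mean and all variable names are illustrative, not part of any library.

import numpy as np

def jackknife_mean(x):
    """Jackknife replicates, estimate, bias estimate, and standard error
    for the sample mean of a 1-D array x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # i-th replicate: mean of the subsample that omits observation i.
    replicates = np.array([np.delete(x, i).mean() for i in range(n)])
    estimate = replicates.mean()              # average of the replicates
    bias = (n - 1) * (estimate - x.mean())    # zero for the mean, up to rounding
    # Standard jackknife variance formula; the (n-1)/n factor compensates
    # for the strong dependence between the replicates.
    variance = (n - 1) / n * ((replicates - estimate) ** 2).sum()
    return replicates, estimate, bias, np.sqrt(variance)

rng = np.random.default_rng(0)
sample = rng.normal(loc=5.0, scale=2.0, size=50)
_, est, bias, se = jackknife_mean(sample)
print(f"estimate {est:.4f}  bias {bias:.2e}  std. error {se:.4f}")

For the sample mean the jackknife estimate coincides with the plain mean, so the payoff comes when the same leave-one-out loop is applied to nonlinear statistics (e.g. a ratio or a correlation), where the bias and variance estimates are no longer trivial.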

Related concepts
Bootstrapping (statistics)
Bootstrapping is any test or metric that uses random sampling with replacement (e.g. mimicking the sampling process), and falls under the broader class of resampling methods. Bootstrapping assigns measures of accuracy (bias, variance, confidence intervals, prediction error, etc.) to sample estimates. This technique allows estimation of the sampling distribution of almost any statistic using random sampling methods. Bootstrapping estimates the properties of an estimand (such as its variance) by measuring those properties when sampling from an approximating distribution.
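As a small illustration of the resampling-with-replacement idea described above, the following Python/NumPy sketch estimates the standard error of a statistic; bootstrap_se and its defaults are illustrative choices, not a library API.

import numpy as np

def bootstrap_se(x, stat=np.mean, n_boot=2000, seed=0):
    """Bootstrap standard error of `stat` on a 1-D sample x."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    # Each row is one resample of the same size, drawn with replacement.
    resamples = rng.choice(x, size=(n_boot, len(x)), replace=True)
    stats = np.apply_along_axis(stat, 1, resamples)
    return stats.std(ddof=1)   # spread of the statistic across resamples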
Resampling (statistics)
In statistics, resampling is the creation of new samples based on one observed sample. Resampling methods include permutation tests (also called re-randomization tests), bootstrapping, and cross-validation. Permutation tests rely on resampling the original data assuming the null hypothesis; from the resampled data one can conclude how likely the original data would be to occur under the null hypothesis.
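The permutation-test idea can be sketched in a few lines; this hedged Python example tests for a difference in means between two samples, with permutation_test as an illustrative name rather than a library function.

import numpy as np

def permutation_test(a, b, n_perm=10000, seed=0):
    """Two-sided p-value for a difference in means between samples a and b."""
    rng = np.random.default_rng(seed)
    a, b = np.asarray(a, float), np.asarray(b, float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)   # reshuffle group labels under the null
        diff = perm[:len(a)].mean() - perm[len(a):].mean()
        count += abs(diff) >= abs(observed)
    return (count + 1) / (n_perm + 1)    # add-one correction for a valid p-value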
Cross-validation (statistics)
Cross-validation, sometimes called rotation estimation or out-of-sample testing, is any of various similar model validation techniques for assessing how the results of a statistical analysis will generalize to an independent data set. Cross-validation is a resampling method that uses different portions of the data to test and train a model on different iterations. It is mainly used in settings where the goal is prediction, and one wants to estimate how accurately a predictive model will perform in practice.
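To make the train/test rotation concrete, here is an illustrative Python sketch of k-fold index generation, with a trivial "model" (the training-set mean) standing in for a real predictor; k_fold_indices is a hypothetical helper, not a library function.

import numpy as np

def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index pairs for k roughly equal folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    for test in np.array_split(idx, k):
        yield np.setdiff1d(idx, test), test

x = np.random.default_rng(1).normal(size=100)
scores = [((x[test] - x[train].mean()) ** 2).mean()   # test MSE per fold
          for train, test in k_fold_indices(len(x), k=5)]
print(f"5-fold mean squared error: {np.mean(scores):.3f}")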
