Concept: Confidence interval

Summary

In frequentist statistics, a confidence interval (CI) is a range of estimates for an unknown parameter. A confidence interval is computed at a designated confidence level; the 95% confidence level is most common, but other levels, such as 90% or 99%, are sometimes used. The confidence level, degree of confidence or confidence coefficient represents the long-run proportion of CIs (at the given confidence level) that theoretically contain the true value of the parameter; this is tantamount to the nominal coverage probability. For example, out of all intervals computed at the 95% level, 95% of them should contain the parameter's true value.
Factors affecting the width of the CI include the sample size, the variability in the sample, and the confidence level. All else being the same, a larger sample produces a narrower confidence interval, greater variability in the sample produces a wider confidence interval, and a higher confidence level produces a wider confidence interval.
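The long-run coverage interpretation and the three width factors can be checked with a small simulation. The sketch below is illustrative only: it assumes normally distributed data and uses the normal-approximation interval for a mean (sample mean plus or minus z times s over the square root of n), which is one of many possible interval constructions. The function names `mean_ci` and `coverage` and all parameter values are hypothetical choices made for this example.

```python
import random
from statistics import NormalDist, mean, stdev


def mean_ci(sample, level=0.95):
    """Normal-approximation CI for a mean: xbar +/- z * s / sqrt(n).

    Assumption for this sketch: n is large enough that the z quantile
    is an acceptable stand-in for the Student t quantile.
    """
    n = len(sample)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # e.g. about 1.96 for 95%
    half_width = z * stdev(sample) / n ** 0.5
    centre = mean(sample)
    return centre - half_width, centre + half_width


def coverage(true_mean=10.0, sd=2.0, n=100, level=0.95, trials=2000, seed=0):
    """Fraction of simulated intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        lo, hi = mean_ci(sample, level)
        hits += lo <= true_mean <= hi
    return hits / trials


if __name__ == "__main__":
    # Should be close to the nominal 0.95 level.
    print("empirical coverage:", coverage())
    rng = random.Random(1)
    data = [rng.gauss(0.0, 1.0) for _ in range(400)]
    for label, ci in [
        ("n=100, 95%", mean_ci(data[:100], 0.95)),
        ("n=400, 95%", mean_ci(data, 0.95)),
        ("n=100, 99%", mean_ci(data[:100], 0.99)),
    ]:
        print(label, "width:", ci[1] - ci[0])
```

Running the script prints an empirical coverage near the nominal level, and the printed widths shrink as the sample grows and widen as the confidence level increases. Greater variability in the data enters through `stdev(sample)` and widens the interval in the same way.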

Related people (140)

Related concepts (56)

In statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function

Statistics (from German: Statistik, "description of a state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and present

In probability theory and statistics, variance is the squared deviation from the mean of a random variable. The variance is also often defined as the square of the standard deviation. Variance is a

Related courses (90)

MGT-581: Introduction to econometrics

The course provides an introduction to econometrics. The objective is to learn how to make valid (i.e., causal) inference from economic data. It explains the main estimators and presents methods to deal with endogeneity issues.

MATH-234(d): Probability and statistics

This course teaches the elementary notions of probability theory and statistics, such as inference, tests and regression.

FIN-403: Econometrics

The course covers basic econometric models and methods that are routinely applied to obtain inference results in economic and financial applications.

Related publications (100)

Related units (61)

Related lectures (266)

The high energy physics unfolding problem is an important statistical inverse problem in data analysis at the Large Hadron Collider (LHC) at CERN. The goal of unfolding is to make nonparametric inferences about a particle spectrum from measurements smeared by the finite resolution of the particle detectors. Previous unfolding methods use ad hoc discretization and regularization, resulting in confidence intervals that can have significantly lower coverage than their nominal level. Instead of regularizing using a roughness penalty or stopping iterative methods early, we impose physically motivated shape constraints: positivity, monotonicity, and convexity. We quantify the uncertainty by constructing a nonparametric confidence set for the true spectrum, consisting of all those spectra that satisfy the shape constraints and that predict the observations within an appropriately calibrated level of fit. Projecting that set produces simultaneous confidence intervals for all functionals of the spectrum, including averages within bins. The confidence intervals have guaranteed conservative frequentist finite-sample coverage in the important and challenging class of unfolding problems for steeply falling particle spectra. We demonstrate the method using simulations that mimic unfolding the inclusive jet transverse momentum spectrum at the LHC. The shape-constrained intervals provide usefully tight conservative inferences, while the conventional methods suffer from severe undercoverage.

This thesis studies statistical inference in the high energy physics unfolding problem, which is an ill-posed inverse problem arising in data analysis at the Large Hadron Collider (LHC) at CERN. Any measurement made at the LHC is smeared by the finite resolution of the particle detectors and the goal in unfolding is to use these smeared measurements to make nonparametric inferences about the underlying particle spectrum. Mathematically the problem consists in inferring the intensity function of an indirectly observed Poisson point process. Rigorous uncertainty quantification of the unfolded spectrum is of central importance to particle physicists. The problem is typically solved by first forming a regularized point estimator in the unfolded space and then using the variability of this estimator to form frequentist confidence intervals. Such confidence intervals, however, underestimate the uncertainty, since they neglect the bias that is used to regularize the problem. We demonstrate that, as a result, conventional statistical techniques as well as the methods that are presently used at the LHC yield confidence intervals which may suffer from severe undercoverage in realistic unfolding scenarios. We propose two complementary ways of addressing this issue. The first approach applies to situations where the unfolded spectrum is expected to be a smooth function and consists in using an iterative bias-correction technique for debiasing the unfolded point estimator obtained using a roughness penalty. We demonstrate that basing the uncertainties on the variability of the bias-corrected point estimator provides significantly improved coverage with only a modest increase in the length of the confidence intervals, even when the amount of bias-correction is chosen in a data-driven way. 
We compare the iterative bias-correction to an alternative debiasing technique based on undersmoothing and find that, in several situations, bias-correction provides shorter confidence intervals than undersmoothing. The new methodology is applied to unfolding the Z boson invariant mass spectrum measured in the CMS experiment at the LHC. The second approach exploits the fact that a significant portion of LHC particle spectra are known to have a steeply falling shape. A physically justified way of regularizing such spectra is to impose shape constraints in the form of positivity, monotonicity and convexity. Moreover, when the shape constraints are applied to an unfolded confidence set, one can regularize the length of the confidence intervals without sacrificing coverage. More specifically, we form shape-constrained confidence intervals by considering all those spectra that satisfy the shape constraints and fit the smeared data within a given confidence level. This enables us to derive regularized unfolded uncertainties which have by construction guaranteed simultaneous finite-sample coverage, provided that the true spectrum satisfies the shape constraints. The uncertainties are conservative, but still usefully tight. The method is demonstrated using simulations designed to mimic unfolding the inclusive jet transverse momentum spectrum at the LHC.
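The undercoverage mechanism described in this abstract (intervals built from the variability of a biased, regularized estimator while ignoring its bias) can be illustrated with a toy simulation. This is not the unfolding method of the thesis. As a loudly stated assumption, regularization is replaced here by simple shrinkage of a sample mean towards zero, the standard deviation is treated as known, and the name `coverage_of_shrunk_interval` and all parameter values are hypothetical.

```python
import random
from statistics import NormalDist


def coverage_of_shrunk_interval(shrink, true_mean=3.0, sd=1.0, n=25,
                                level=0.95, trials=4000, seed=0):
    """Coverage of a variance-only interval around a shrunken mean.

    Toy stand-in for regularization: the estimator is shrink * xbar,
    which has bias (shrink - 1) * true_mean. The interval uses only
    the estimator's standard error and ignores that bias.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se_xbar = sd / n ** 0.5  # standard error of the unshrunk mean (sd known)
    hits = 0
    for _ in range(trials):
        xbar = sum(rng.gauss(true_mean, sd) for _ in range(n)) / n
        estimate = shrink * xbar
        half_width = z * shrink * se_xbar  # variability only, no bias term
        hits += abs(estimate - true_mean) <= half_width
    return hits / trials


if __name__ == "__main__":
    for shrink in (1.0, 0.9, 0.8):
        print("shrink =", shrink,
              "coverage =", coverage_of_shrunk_interval(shrink))
```

With `shrink=1.0` the estimator is unbiased and the empirical coverage sits near the nominal 95%. As `shrink` decreases, the intervals become shorter while their centre drifts away from the true mean, so coverage falls well below the nominal level. This is the same qualitative effect that the abstract attributes to neglecting regularization bias.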

Extreme value analysis is concerned with the modelling of extreme events such as floods and heatwaves, which can have large impacts. Statistical modelling can be useful to better assess risks even if, due to scarcity of measurements, there is inherently very large residual uncertainty in any analysis. Driven by the increase in environmental databases, spatial modelling of extremes has expanded rapidly in the last decade. This thesis presents contributions to such analysis.
The first chapter is about likelihood-based inference in the univariate setting and investigates the use of bias-correction and higher-order asymptotic methods for extremes, highlighting through examples and illustrations the unique challenge posed by data scarcity. We focus on parametric modelling of extreme values, which relies on limiting distributional results and for which, as a result, uncertainty quantification is complicated. We find that, in certain cases, small-sample asymptotic methods can give improved inference by reducing the error rate of confidence intervals. Two data illustrations, linked to assessment of the frequency of extreme rainfall episodes in Venezuela and the analysis of survival of supercentenarians, illustrate the methods developed.
In the second chapter, we review the major methods for the analysis of spatial extremes models. We highlight the similarities and provide a thorough literature review along with novel simulation algorithms. The methods described therein are made available through a statistical software package.
The last chapter focuses on estimation for a Bayesian hierarchical model derived from a multivariate generalized Pareto process. We review approaches for the estimation of censored components in models derived from (log)-elliptical distributions, paying particular attention to the estimation of a high-dimensional Gaussian distribution function via Monte Carlo methods. The impacts of model misspecification and of censoring are explored through extensive simulations and we conclude with a case study of rainfall extremes in Eastern Switzerland.