Publication: Uncertainty quantification in unfolding elementary particle spectra at the Large Hadron Collider

Abstract

This thesis studies statistical inference in the high energy physics unfolding problem, which is an ill-posed inverse problem arising in data analysis at the Large Hadron Collider (LHC) at CERN. Any measurement made at the LHC is smeared by the finite resolution of the particle detectors and the goal in unfolding is to use these smeared measurements to make nonparametric inferences about the underlying particle spectrum. Mathematically the problem consists in inferring the intensity function of an indirectly observed Poisson point process. Rigorous uncertainty quantification of the unfolded spectrum is of central importance to particle physicists. The problem is typically solved by first forming a regularized point estimator in the unfolded space and then using the variability of this estimator to form frequentist confidence intervals. Such confidence intervals, however, underestimate the uncertainty, since they neglect the bias that is used to regularize the problem. We demonstrate that, as a result, conventional statistical techniques as well as the methods that are presently used at the LHC yield confidence intervals which may suffer from severe undercoverage in realistic unfolding scenarios. We propose two complementary ways of addressing this issue. The first approach applies to situations where the unfolded spectrum is expected to be a smooth function and consists in using an iterative bias-correction technique for debiasing the unfolded point estimator obtained using a roughness penalty. We demonstrate that basing the uncertainties on the variability of the bias-corrected point estimator provides significantly improved coverage with only a modest increase in the length of the confidence intervals, even when the amount of bias-correction is chosen in a data-driven way. 
We compare the iterative bias-correction to an alternative debiasing technique based on undersmoothing and find that, in several situations, bias-correction provides shorter confidence intervals than undersmoothing. The new methodology is applied to unfolding the Z boson invariant mass spectrum measured in the CMS experiment at the LHC. The second approach exploits the fact that a significant portion of LHC particle spectra are known to have a steeply falling shape. A physically justified way of regularizing such spectra is to impose shape constraints in the form of positivity, monotonicity and convexity. Moreover, when the shape constraints are applied to an unfolded confidence set, one can regularize the length of the confidence intervals without sacrificing coverage. More specifically, we form shape-constrained confidence intervals by considering all those spectra that satisfy the shape constraints and fit the smeared data within a given confidence level. This enables us to derive regularized unfolded uncertainties which have by construction guaranteed simultaneous finite-sample coverage, provided that the true spectrum satisfies the shape constraints. The uncertainties are conservative, but still usefully tight. The method is demonstrated using simulations designed to mimic unfolding the inclusive jet transverse momentum spectrum at the LHC.
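
The undercoverage described in the abstract can be illustrated with a small simulation. The sketch below is not the thesis's method or setup: the two-peak spectrum, the Gaussian smearing matrix, the second-difference roughness penalty, the least-squares fit (in place of a Poisson likelihood) and the hand-picked regularization strength `delta` are all assumptions chosen for illustration. It forms a regularized point estimate, builds 95% intervals from the estimator's variability alone, and records how often each bin's interval contains the true value.

```python
import numpy as np


def smearing_matrix(centers, resolution):
    """Gaussian smearing matrix whose columns sum to one (assumed detector response)."""
    diff = centers[:, None] - centers[None, :]
    kernel = np.exp(-0.5 * (diff / resolution) ** 2)
    return kernel / kernel.sum(axis=0, keepdims=True)


def second_difference(n):
    """(n - 2) x n matrix of discrete second differences (roughness penalty)."""
    d = np.zeros((n - 2, n))
    for i in range(n - 2):
        d[i, i:i + 3] = [1.0, -2.0, 1.0]
    return d


def binwise_coverage(delta=10.0, n_rep=500, seed=0):
    """Empirical per-bin coverage of variance-only 95% intervals."""
    rng = np.random.default_rng(seed)
    x = np.linspace(-7.0, 7.0, 40)
    # Hypothetical two-peak true spectrum, scaled to 10,000 expected events.
    density = 0.4 * np.exp(-0.5 * (x + 2.0) ** 2) + 0.6 * np.exp(-0.5 * (x - 2.0) ** 2)
    lam = 10_000 * density / density.sum()

    K = smearing_matrix(x, resolution=1.0)
    D = second_difference(len(x))
    # Linear regularized estimator: lam_hat = A @ y.
    A = np.linalg.solve(K.T @ K + delta * D.T @ D, K.T)

    # Smeared Poisson counts, one row per simulated experiment.
    y = rng.poisson(K @ lam, size=(n_rep, len(x)))
    lam_hat = y @ A.T
    # Standard errors from diag(A diag(y) A^T); the regularization bias is ignored.
    se = np.sqrt(y @ (A ** 2).T)
    covered = np.abs(lam_hat - lam) <= 1.96 * se
    return covered.mean(axis=0)


if __name__ == "__main__":
    coverage = binwise_coverage()
    print("minimum bin coverage:", coverage.min())
    print("median bin coverage:", np.median(coverage))
```

In this toy setting the intervals are narrow because the penalty suppresses variance, while the bias at the peaks is left out of the interval width, which is the mechanism the abstract points to. How severe the effect is depends on the assumed spectrum and on `delta`.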

Related concepts (25)

Related publications (24)

Related MOOCs (36)

Confidence interval

In frequentist statistics, a confidence interval (CI) is a range of estimates for an unknown parameter. A confidence interval is computed at a designated confidence level; the 95% confidence level is most common, but other levels, such as 90% or 99%, are sometimes used. The confidence level, degree of confidence or confidence coefficient represents the long-run proportion of CIs (at the given confidence level) that theoretically contain the true value of the parameter; this is tantamount to the nominal coverage probability.
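
The long-run interpretation of the confidence level can be checked by simulation. The following is a minimal sketch under assumed settings (normally distributed data with known standard deviation, and hypothetical values for the mean, sample size and number of repetitions): it builds a 95% interval for the mean in each repetition and reports the fraction of intervals that contain the true mean.

```python
import numpy as np


def empirical_coverage(true_mean=3.0, sigma=2.0, n=25, n_rep=4000, seed=1):
    """Fraction of 95% z-intervals for the mean that contain the true mean."""
    rng = np.random.default_rng(seed)
    # One row per repeated experiment, each with n observations.
    samples = rng.normal(true_mean, sigma, size=(n_rep, n))
    centre = samples.mean(axis=1)
    # Known-sigma interval: sample mean +/- 1.96 * sigma / sqrt(n).
    half_width = 1.96 * sigma / np.sqrt(n)
    covered = np.abs(centre - true_mean) <= half_width
    return covered.mean()


if __name__ == "__main__":
    print("empirical coverage:", empirical_coverage())
```

With an unbiased estimator and a correctly specified model, the reported fraction should be close to the nominal 0.95, up to simulation noise.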

Uncertainty

Uncertainty refers to epistemic situations involving imperfect or unknown information. It applies to predictions of future events, to physical measurements that are already made, or to the unknown. Uncertainty arises in partially observable or stochastic environments, as well as due to ignorance, indolence, or both. It arises in any number of fields, including insurance, philosophy, physics, statistics, economics, finance, medicine, psychology, sociology, engineering, metrology, meteorology, ecology and information science.

Estimator

In statistics, an estimator is a rule for calculating an estimate of a given quantity based on observed data: thus the rule (the estimator), the quantity of interest (the estimand) and its result (the estimate) are distinguished. For example, the sample mean is a commonly used estimator of the population mean. There are point and interval estimators. The point estimators yield single-valued results. This is in contrast to an interval estimator, where the result would be a range of plausible values.

Related MOOCs

Plasma Physics: Introduction

Learn the basics of plasma, one of the fundamental states of matter, and the different types of models used to describe it, including fluid and kinetic.

Plasma Physics: Applications

Learn about plasma applications from nuclear fusion powering the sun, to making integrated circuits, to generating electricity.

Related publications

This thesis focuses on non-parametric covariance estimation for random surfaces, i.e. functional data on a two-dimensional domain. Non-parametric covariance estimation lies at the heart of functional

Functional time series is a temporally ordered sequence of not necessarily independent random curves. While the statistical analysis of such data has been traditionally carried out under the assumptio

Victor Panaretos, Tomas Masák, Tomas Rubin

Nonparametric inference for functional data over two-dimensional domains entails additional computational and statistical challenges, compared to the one-dimensional case. Separability of the covarian