
Concept: Test statistic

Summary

A test statistic is a statistic (a quantity derived from the sample) used in statistical hypothesis testing. A hypothesis test is typically specified in terms of a test statistic: a numerical summary of a data set that reduces the data to a single value which can be used to perform the hypothesis test. In general, a test statistic is selected or defined so as to quantify, within the observed data, behaviours that would distinguish the null hypothesis from the alternative hypothesis where such an alternative is prescribed, or that would characterize the null hypothesis when there is no explicitly stated alternative hypothesis.
An important property of a test statistic is that its sampling distribution under the null hypothesis must be calculable, either exactly or approximately, so that p-values can be computed. A test statistic shares some of the same qualities as a descriptive statistic, and many statistics can be used both as test statistics and as descriptive statistics.
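
For concreteness, here is a minimal sketch of the classic one-sample t-statistic, whose distribution under the null hypothesis is Student's t with $n-1$ degrees of freedom, which is exactly what makes the p-value computable (the helper name `one_sample_t` is illustrative only):

```python
import numpy as np
from scipy import stats

def one_sample_t(x, mu0=0.0):
    """One-sample t-statistic for H0: population mean equals mu0.

    Under H0 (and approximate normality of the sample mean), the statistic
    follows a Student t distribution with n - 1 degrees of freedom, which
    makes the two-sided p-value below computable.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    t = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(n))
    p = 2 * stats.t.sf(abs(t), df=n - 1)
    return t, p

rng = np.random.default_rng(0)
sample = rng.normal(loc=0.3, scale=1.0, size=50)   # data with a small true shift
t_stat, p_value = one_sample_t(sample, mu0=0.0)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```

The same value also serves as a descriptive summary, namely the distance of the sample mean from the hypothesised mean in standard-error units, illustrating the overlap between test statistics and descriptive statistics noted above.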



Related concepts (23)

Normal distribution

In statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$, where $\mu$ is the mean and $\sigma$ is the standard deviation.

Statistics

Statistics (from German: Statistik, "description of a state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data.

Statistical hypothesis testing

A statistical hypothesis test is a method of statistical inference used to decide whether the data at hand sufficiently support a particular hypothesis.
Hypothesis testing allows us to make probabilistic statements about population parameters.

Related courses (24)

MATH-234(b): Probability and statistics

The course presents the basic notions of probability theory and statistical inference. The emphasis is on the main concepts as well as on the most widely used methods.

MGT-581: Introduction to econometrics

The course provides an introduction to econometrics. The objective is to learn how to make valid (i.e., causal) inferences from economic data. It explains the main estimators and presents methods to deal with endogeneity issues.

FIN-403: Econometrics

The course covers basic econometric models and methods that are routinely applied to obtain inference results in economic and financial applications.

Related lectures (95)

Covariance operators play a fundamental role in functional data analysis, providing the canonical means to analyse functional variation via the celebrated Karhunen-Loève expansion. These operators may themselves be subject to variation, for instance in contexts where multiple functional populations are to be compared. Statistical techniques to analyse such variation are intimately linked with the choice of metric on the space of such operators, as well as with their intrinsic infinite-dimensionality.
We will show that the space of infinite-dimensional covariance operators equipped with the Procrustes size-and-shape metric from shape theory can be identified with the space of centred Gaussian processes equipped with the Wasserstein metric of optimal transportation. We then describe key geometrical and topological aspects of the space of covariance operators endowed with the Procrustes metric. Through the notion of multicoupling of Gaussian measures, we establish existence, uniqueness and stability of the Fréchet mean of covariance operators with respect to the Procrustes metric. Furthermore, we provide generative models that are canonical for this metric.
We then turn to the problem of comparing several samples of stochastic processes with respect to their second-order structure, and subsequently describe the main modes of variation in this second-order structure. These two tasks are carried out via an Analysis of Variance (ANOVA) and a Principal Component Analysis (PCA) of covariance operators, respectively. To perform the ANOVA, we introduce a novel approach based on optimal (multi)transport and identify each covariance with an optimal transport map. These maps are then contrasted with the identity map with respect to a norm-induced distance. The resulting test statistic, calibrated by permutation, outperforms the state of the art in the functional case. If the null hypothesis postulating equality of the operators is rejected, a geometric interpretation of the transport maps allows us to construct a PCA on the tangent space with the aim of understanding the sample variability. Finally, we provide a further example of the use of the optimal transport framework by applying it to the problem of clustering operators. Two different clustering algorithms are presented, one of which is novel. The transportation ANOVA, PCA and clustering are validated on both simulated scenarios and real data sets.
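
As a rough, finite-dimensional illustration of such a permutation-calibrated statistic (a simplification for intuition only, not the transport-map construction of the abstract), one can compare two sample covariance matrices with the Bures-Wasserstein distance, i.e. the Wasserstein distance between the corresponding centred Gaussians, and calibrate the statistic by permuting group labels:

```python
import numpy as np
from scipy.linalg import sqrtm

def bures_wasserstein(A, B):
    """Wasserstein-2 distance between centred Gaussians N(0, A) and N(0, B)."""
    root_a = sqrtm(A)
    cross = sqrtm(root_a @ B @ root_a)
    d2 = np.trace(A) + np.trace(B) - 2.0 * np.trace(cross).real
    return float(np.sqrt(max(d2, 0.0)))

def covariance_permutation_test(X, Y, n_perm=999, seed=0):
    """Two-sample permutation test for equality of covariance structure.

    The test statistic is the Bures-Wasserstein distance between the two
    sample covariance matrices; its null distribution is approximated by
    randomly reassigning the pooled observations to the two groups.
    """
    rng = np.random.default_rng(seed)
    pooled = np.vstack([X, Y])
    n_x = X.shape[0]
    observed = bures_wasserstein(np.cov(X, rowvar=False), np.cov(Y, rowvar=False))
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(pooled.shape[0])
        Xp, Yp = pooled[idx[:n_x]], pooled[idx[n_x:]]
        exceed += bures_wasserstein(np.cov(Xp, rowvar=False),
                                    np.cov(Yp, rowvar=False)) >= observed
    return observed, (exceed + 1) / (n_perm + 1)   # permutation p-value
```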

Omar Garcia Crespillo, Michael Meurer, Jan Skaloud

We compare innovation-based and residual-based integrity monitoring in Extended Kalman Filters for GNSS/INS hybridization. In this paper, we restrict the study to the detection of abrupt snapshot faults in order to gain intuitive insight. We study the differences in the distribution of the test statistics and in the computation of the thresholds. We derive and compare the minimum detectable bias, and we provide expressions for the protection levels of both approaches in the single-fault situation. We compare them with the classical residual-based GNSS RAIM algorithm. Additionally, we perform a sensitivity analysis of the integrity-relevant parameters with respect to inertial sensor quality.
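
A minimal, single-epoch sketch of the innovation-based side (a generic Kalman-filter snapshot test, not the paper's full GNSS/INS formulation with minimum detectable biases and protection levels) compares the normalised innovation squared against a chi-square threshold:

```python
import numpy as np
from scipy.stats import chi2

def innovation_fault_test(z, x_pred, H, P_pred, R, alpha=1e-3):
    """Snapshot innovation-based fault detection for one Kalman filter update.

    Under the no-fault hypothesis (and correct noise models), the normalised
    innovation squared is chi-square distributed with as many degrees of
    freedom as there are measurements, which fixes the detection threshold.
    """
    innovation = z - H @ x_pred            # measurement minus predicted measurement
    S = H @ P_pred @ H.T + R               # innovation covariance
    statistic = float(innovation @ np.linalg.solve(S, innovation))
    threshold = chi2.ppf(1.0 - alpha, df=z.size)
    return statistic, threshold, statistic > threshold
```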

2017

In multiple testing problems where the components come from a mixture model of noise and true effects, we seek first to test for the existence of non-zero components, and then to identify the true alternatives at a fixed significance level $\alpha$. Two parameters, namely the fraction of non-null components $\varepsilon$ and the size of the effects $\mu$, characterise the two-point mixture model under the global alternative. When the number of hypotheses $m$ goes to infinity, we are interested in an asymptotic framework in which the fraction of non-null components vanishes and the true effects must be sizable to be detected. Donoho and Jin give an explicit form of the asymptotic detection boundary for the Gaussian mixture model under the classical calibration of the mixture parameters. We prove analogous results for the Cauchy mixture distribution as an example of a heavy-tailed case; this requires a different formulation of the parameters, which reflects the added difficulty.
We also propose a multiple testing procedure based on a filtering approach that can discover the true alternatives.
The Benjamini-Hochberg (BH) procedure compares the observed $p$-values to a linear threshold curve and rejects the null hypotheses from the smallest $p$-value up to the last up-crossing; Benjamini and Hochberg prove that the false discovery rate (FDR) is then controlled.
However, there is an intrinsic difference in heavy-tailed settings. Were we to use the BH procedure, we would obtain a highly variable positive false discovery rate (pFDR). In our study we analyse the distribution of the $p$-values and devise a new multiple testing procedure, based on the empirical properties of the $p$-values, that handles both the usual case and the heavy-tailed case. The filtering approach is designed to eliminate most $p$-values that are likely to be uniform, while preserving most of the true alternatives. Based on the filtered $p$-values, we estimate the mode $\vartheta$ and define the rejection region $\mathscr{R}(\vartheta, \delta)=\left[\vartheta-\delta/2,\ \vartheta+\delta/2\right]$ so that the most informative $p$-values are included. The length $\delta$ is chosen by controlling a data-dependent estimate of the FDR at the desired level.
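
For reference, a minimal sketch of the standard BH step-up rule described above (the filtering-based procedure proposed in this work is not reproduced here):

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure at nominal FDR level alpha.

    The sorted p-values p_(1) <= ... <= p_(m) are compared with the linear
    threshold curve i * alpha / m; all hypotheses up to the last index at
    which p_(i) <= i * alpha / m (the last up-crossing) are rejected.
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = int(np.nonzero(below)[0].max())    # last up-crossing
        rejected[order[: k + 1]] = True
    return rejected
```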