Multiple testing with test statistics following heavy-tailed distributions

Zhiwen Jiang
2021
EPFL thesis

Abstract

In multiple testing problems where the components come from a mixture model of noise and true effect, we seek first to test for the existence of non-zero components, and then to identify the true alternatives at a fixed significance level $\alpha$. Two parameters, the fraction of non-null components $\varepsilon$ and the size of the effects $\mu$, characterise the two-point mixture model under the global alternative. As the number of hypotheses $m$ goes to infinity, we are interested in an asymptotic framework where the fraction of non-null components vanishes and the true effects need to be sizable to be detected. Donoho and Jin give an explicit form of the asymptotic detection boundary for the Gaussian mixture model under the classic calibration of the mixture parameters. We prove analogous results for the Cauchy mixture distribution as an example of a heavy-tailed case. This requires a different formulation of the parameters, which reflects the added difficulty.
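For orientation, the Gaussian setting referred to above can be written out as follows. This is a sketch of the standard Donoho and Jin calibration as it is usually presented; the symbols $\beta$, $r$ and $\rho^{*}$ are assumptions of this note and do not appear in the abstract. The Cauchy formulation is not reproduced here, because the abstract does not state it.

```latex
% Global null versus two-point mixture alternative, for i = 1, ..., m
H_0 : \; X_i \overset{\text{iid}}{\sim} N(0, 1),
\qquad
H_1^{(m)} : \; X_i \overset{\text{iid}}{\sim} (1 - \varepsilon_m)\, N(0, 1) + \varepsilon_m\, N(\mu_m, 1).

% Classic calibration: sparse fraction and moderately large effect
\varepsilon_m = m^{-\beta}, \quad \tfrac{1}{2} < \beta < 1,
\qquad
\mu_m = \sqrt{2 r \log m}, \quad 0 < r < 1.

% Detection boundary: H_0 and H_1^{(m)} can be separated asymptotically
% when r > \rho^{*}(\beta), and cannot when r < \rho^{*}(\beta)
\rho^{*}(\beta) =
\begin{cases}
  \beta - \tfrac{1}{2}, & \tfrac{1}{2} < \beta \le \tfrac{3}{4}, \\[2pt]
  \bigl(1 - \sqrt{1 - \beta}\bigr)^{2}, & \tfrac{3}{4} < \beta < 1.
\end{cases}
```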

We also propose a multiple testing procedure, based on a filtering approach, that can discover the true alternatives. Benjamini and Hochberg (BH) compare the ordered $p$-values to a linear threshold curve, reject the null hypotheses from the smallest $p$-value up to the last up-crossing, and prove that the false discovery rate (FDR) is controlled. However, heavy-tailed settings are intrinsically different: were we to use the BH procedure, we would get a highly variable positive false discovery rate (pFDR). In our study we analyse the distribution of the $p$-values and devise a new multiple testing procedure that handles both the usual case and the heavy-tailed case, based on the empirical properties of the $p$-values. The filtering approach is designed to eliminate most $p$-values that are more likely to be uniform, while preserving most of the true alternatives. Based on the filtered $p$-values, we estimate the mode $\vartheta$ and define the rejection region $\mathscr{R}(\vartheta, \delta)=\left[ \vartheta -\delta/2, \vartheta +\delta/2 \right]$ so that the most informative $p$-values are included. The length $\delta$ is chosen by controlling a data-dependent estimate of the FDR at a desired level.
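The BH step-up rule mentioned above, which serves as the baseline here, can be sketched in a few lines. This is a minimal illustration of the classical BH procedure only, not of the thesis's filtering method; the function name `benjamini_hochberg` and its arguments are hypothetical names chosen for this example.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Classical BH step-up rule (illustrative sketch).

    Returns a list of booleans, True where the null hypothesis is rejected.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k with p_(k) <= alpha * k / m,
    # i.e. the last point at or below the linear threshold curve.
    last_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            last_rank = rank

    # Reject every hypothesis whose p-value ranks at or below last_rank.
    rejected = [False] * m
    for idx in order[:last_rank]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(pvals, alpha=0.05))
```

Note that a $p$-value lying above its own threshold is still rejected when a larger $p$-value falls below the curve later on, which is what "up to the last up-crossing" refers to.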

Official source

https://infoscience.epfl.ch/record/283692?ln=fr
