Daria Rukina
In this thesis, we address one facet of the statistical detection problem: testing against a particular type of alternative, the mixture model. The null hypothesis corresponds to the absence of a signal and is represented by a known distribution, e.g., Gaussian white noise. Under the alternative, the observations may contain a cluster of points carrying a signal, characterized by some distribution G. The main research objective is to determine the detectable set of alternatives in the parameter space that combines the parameters of G and the mixture proportion p. We focus on finite sample sizes and study whether alternatives can be detected for fixed n, given pre-specified error levels.
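One possible way to write the testing problem described above is sketched below. The symbols F_0 (the known null distribution) and X_1, ..., X_n (the observations) are notation assumed here for illustration; only G, p, and n come from the text, and the thesis itself may use a different formulation.

```latex
\begin{align*}
H_0 &: X_1, \dots, X_n \overset{\text{iid}}{\sim} F_0, \\
H_1 &: X_1, \dots, X_n \overset{\text{iid}}{\sim} (1 - p)\, F_0 + p\, G, \qquad p \in (0, 1].
\end{align*}
```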
The first part of the thesis covers theoretical results. Specifically, we introduce a parametrization that relates the parameter space of an alternative to the sample size, and present the regions of detectability and non-detectability. The regions of detectability are the subsets of the new parameter space (induced by the parametrization) where, for a pre-specified type I error rate, the type II error rate of the likelihood ratio test (LRT) is bounded from above by some constant. To move towards real-data applications, we also examine the performance of several non-parametric testing procedures proposed for this problem, for some widely used distributions.
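To make the notion of detectability at fixed n concrete, the following is a minimal Monte Carlo sketch, not the method of the thesis. It assumes a standard Gaussian null and a unit-variance Gaussian signal with mean mu, so that the alternative is (1 - p) N(0, 1) + p N(mu, 1) with p and mu treated as known. The function names, the default values, and the simulation-based calibration of the threshold are all illustrative assumptions.

```python
import numpy as np


def log_lr(x, p, mu):
    """Log likelihood ratio of (1-p)N(0,1) + p*N(mu,1) against N(0,1).

    x has shape (reps, n); the statistic is summed over each sample (row).
    Uses log(1 - p + p*exp(z)) = log1p(p*expm1(z)) with z = mu*x - mu^2/2.
    """
    z = mu * x - 0.5 * mu**2
    return np.log1p(p * np.expm1(z)).sum(axis=-1)


def lrt_type2_error(n, p, mu, alpha=0.05, reps=20000, seed=0):
    """Estimate the type II error of the level-alpha LRT by simulation."""
    rng = np.random.default_rng(seed)

    # Calibrate the rejection threshold under the null (pure noise).
    null_samples = rng.standard_normal((reps, n))
    threshold = np.quantile(log_lr(null_samples, p, mu), 1 - alpha)

    # Draw from the mixture: each point carries the signal with probability p.
    carries_signal = rng.random((reps, n)) < p
    alt_samples = rng.standard_normal((reps, n)) + mu * carries_signal

    # Type II error: fraction of alternative samples that are not rejected.
    return float(np.mean(log_lr(alt_samples, p, mu) <= threshold))
```

Calling `lrt_type2_error(n, p, mu)` over a grid of (p, mu) for a fixed n and marking where the estimate falls below a chosen constant gives an empirical picture of a detectability region under these assumptions.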
In the second part of the thesis, we build on these results to develop a framework for clinical trial designs aimed at detecting a sensitive-to-therapy subpopulation. The idea of modeling treatment response as a mixture of subpopulations originates from treatment effect heterogeneity. Methods that study the effects of heterogeneity in clinical data are referred to as subgroup analyses. However, designs accounting for possible response heterogeneity are rarely discussed, though in some cases they might help to avoid trial failure due to lack of efficacy. In our work, we consider two possible subgroups of patients: drug responders and drug non-responders. Given no preliminary information about patients' subgroup membership, we propose a framework for designing randomized clinical trials that are able to detect a responders' subgroup with desired characteristics. We also propose strategies to minimize the number of enrolled patients while keeping the testing errors below given levels, and suggest how the design, along with all testing metrics, can be generalized to the case of multiple centers.
The last part of the thesis is not directly related to the preceding parts. We present two supervised classification algorithms for real-data applications.
EPFL, 2018