Model specification is an integral part of any statistical inference problem. Several model selection techniques have been developed to determine which model is best among a list of candidates. An alternative is model averaging, and in particular its frequentist form: an estimate of the parameters of interest is obtained by constructing a weighted average of the estimates of these quantities under each candidate model. We develop compromise frequentist strategies for the estimation of regression parameters, as well as for the probabilistic clustering problem. In the regression context, we construct compromise strategies based on the Pitman estimators associated with various underlying error distributions. The weight given to each model is equal to its profile likelihood, which measures its goodness of fit. Asymptotic properties of both the Pitman estimators and the profile likelihood allow us to define a minimax strategy for choosing the distributions of the compromise, involving a notion of distance between distributions. The performance of these estimators is then compared with that of other usual and robust procedures. In the second part of the thesis, we develop compromise strategies in the probabilistic clustering context. Although this clustering method is based on mixtures of distributions, our compromise strategies are applied not to the parameter estimates directly, but to the posterior probabilities of membership. Two types of compromise are presented, and the performance of the resulting classification rules is investigated.
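The weighting idea in the regression part can be illustrated with a minimal sketch. It is not the procedure of the thesis: it assumes a pure location model with the scale fixed at 1, three candidate error distributions chosen only for illustration (normal, Laplace, Cauchy), and a grid approximation of the Pitman location estimator. With no nuisance parameter left, the "profile likelihood" used for the weights reduces here to the likelihood maximised over the location on the grid. All function names are hypothetical.

```python
import math

# Candidate error log-densities (unit scale). This choice of candidates
# is an assumption made for the illustration only.
def log_density_normal(z):
    return -0.5 * z * z - 0.5 * math.log(2 * math.pi)

def log_density_laplace(z):
    return -abs(z) - math.log(2)

def log_density_cauchy(z):
    return -math.log(math.pi * (1 + z * z))

CANDIDATES = {
    "normal": log_density_normal,
    "laplace": log_density_laplace,
    "cauchy": log_density_cauchy,
}

def pitman_and_peak(data, log_density, grid):
    """Grid approximation of the Pitman location estimate
    (likelihood-weighted mean of theta) and the maximal log-likelihood."""
    loglik = [sum(log_density(x - t) for x in data) for t in grid]
    peak = max(loglik)
    # Subtract the peak before exponentiating to avoid underflow.
    lik = [math.exp(l - peak) for l in loglik]
    pitman = sum(t * w for t, w in zip(grid, lik)) / sum(lik)
    return pitman, peak

def compromise_estimate(data, candidates=CANDIDATES, grid_size=2001):
    """Weighted average of per-model Pitman estimates, with weights
    proportional to each model's maximised likelihood."""
    lo, hi = min(data), max(data)
    pad = (hi - lo) or 1.0
    step = (hi - lo + 2 * pad) / (grid_size - 1)
    grid = [lo - pad + i * step for i in range(grid_size)]

    results = {name: pitman_and_peak(data, ld, grid)
               for name, ld in candidates.items()}
    best = max(peak for _, peak in results.values())
    raw = {name: math.exp(peak - best) for name, (_, peak) in results.items()}
    total = sum(raw.values())
    weights = {name: r / total for name, r in raw.items()}
    per_model = {name: est for name, (est, _) in results.items()}
    estimate = sum(weights[name] * per_model[name] for name in per_model)
    return estimate, weights, per_model

if __name__ == "__main__":
    sample = [-1.2, 0.3, 0.1, 0.8, -0.4, 5.0]  # made-up data with one outlier
    estimate, weights, per_model = compromise_estimate(sample)
    print("per-model estimates:", per_model)
    print("weights:", weights)
    print("compromise estimate:", estimate)
```

Because the weights are normalised to sum to one, the compromise always lies between the smallest and largest per-model estimates. Under the normal model the Pitman estimate coincides with the sample mean, which gives a simple check on the grid approximation.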
Florent Gérard Krzakala, Lenka Zdeborová, Hugo Chao Cui