**Êtes-vous un étudiant de l'EPFL à la recherche d'un projet de semestre?**

Travaillez avec nous sur des projets en science des données et en visualisation, et déployez votre projet sous forme d'application sur GraphSearch.

Publication# Stratégies de compromis pour l'estimation des paramètres de régression et pour la classification floue

Résumé

Model specification is an integral part of any statistical inference problem. Several model selection techniques have been developed in order to determine which model is the best one among a list of possible candidates. Another way to deal with this question is the so-called model averaging, and in particular the frequentist approach. An estimation of the parameters of interest is obtained by constructing a weighted average of the estimates of these quantities under each candidate model. We develop compromise frequentist strategies for the estimation of regression parameters, as well as for the probabilistic clustering problem. In the regression context, we construct compromise strategies based on the Pitman estimators associated with various underlying errors distributions. The weight given to each model is equal to its profile likelihood, which gives a measure of the goodness-of-fit. Asymptotic properties of both Pitman estimators and profile likelihood allow us to define a minimax strategy for choosing the distributions of the compromise, involving a notion of distance between distributions. Performances of such estimators are then compared to other usual and robust procedures. In the second part of the thesis, we develop compromise strategies in the probabilistic clustering context. Although this clustering method is based on mixtures of distributions, our compromise strategies are not applied directly to the estimates of the parameters, but on the posterior probabilities of membership. Two types of compromise are presented, and the performances of resulting classification rules are investigated.

Official source

Cette page est générée automatiquement et peut contenir des informations qui ne sont pas correctes, complètes, à jour ou pertinentes par rapport à votre recherche. Il en va de même pour toutes les autres pages de ce site. Veillez à vérifier les informations auprès des sources officielles de l'EPFL.

Concepts associés

Chargement

Publications associées

Chargement

Concepts associés (27)

Estimateur (statistique)

En statistique, un estimateur est une fonction permettant d'estimer un moment d'une loi de probabilité (comme son espérance ou sa variance). Il peut par exemple servir à estimer certaines caractérist

Loi normale

En théorie des probabilités et en statistique, les lois normales sont parmi les lois de probabilité les plus utilisées pour modéliser des phénomènes naturels issus de plusieurs événements aléatoires.

Fonction de vraisemblance

vignette|Exemple d'une fonction de vraisemblance pour le paramètre d'une Loi de Poisson
En théorie des probabilités et en statistique, la fonction de vraisemblance (ou plus simplement vraisemblance) e

Publications associées (49)

Chargement

Chargement

Chargement

We introduce a fast approach to classification and clustering applicable to high-dimensional continuous data, based on Bayesian mixture models for which explicit computations are available. This permits us to treat classification and clustering in a single framework, and allows calculation of unobserved class probability. The new classifier is robust to adding noise variables as a drawback of the built-in spike-and-slab structure of the proposed Bayesian model. The usefulness of classification using our method is shown on metabololomic example, and on the Iris data with and without noise variables. Agglomerative hierarchical clustering is used to construct a dendrogram based on the posterior probabilities of particular partitions, to provide a dendrogram with a probabilistic interpretation. An extension to variable selection is proposed which summarises the importance of variables for classification or clustering and has probabilistic interpretation. Having a simple model provides estimation of the model parameters using maximum likelihood and therefore yields a fully automatic algorithm. The new clustering method is applied to metabolomic, microarray, and image data and is studied using simulated data motivated by real datasets. The computational difficulties of the new approach are discussed, solutions for algorithm acceleration are proposed, and the written computer code is briefly analysed. Simulations shows that the quality of the estimated model parameters depends on the parametric distribution assumed for effects, but after fixing the model parameters to reasonable values, the distribution of the effects influences clustering very little. Simulations confirms that the clustering algorithm and the proposed variable selection method is reliable when the model assumptions are wrong. The new approach is compared with the popular Bayesian clustering alternative, MCLUST, fitted on the principal components using two loss functions in which our proposed approach is found to be more efficient in almost every situation.

In this paper, we investigate the construction of compromise estimators of location and scale, by averaging over several models selected among a specified large set of possible models. The weight given to each distribution is based on the profile likelihood, which leads to a notion of distance between distributions as we study the asymptotic behaviour of such estimators. The selection of the models is made in a minimax way, in order to choose distributions that are close to any possible distribution. We also present simulation results of such compromise estimators based on contaminated Gaussian and Student's t distributions. (C) 2011 Elsevier B.V. All rights reserved.

Powerful mathematical tools have been developed for trading in stocks and bonds, but other markets that are equally important for the globalized world have to some extent been neglected. We decided to study the shipping market as an new area of development in mathematical finance. The market in shipping derivatives (FFA and FOSVA) has only been developed after 2000 and now exhibits impressive growth. Financial actors have entered the field, but it is still largely undiscovered by institutional investors. The first part of the work was to identify the characteristics of the market in shipping, i.e. the segmentation and the volatility. Because the shipping business is old-fashioned, even the leading actors on the world stage (ship owners and banks) are using macro-economic models to forecast the rates. If the macro-economic models are logical and make sense, they fail to predict. For example, the factor port congestion has been much cited during the last few years, but it is clearly very difficult to control and is simply an indicator of traffic. From our own experience it appears that most ship owners are in fact market driven and rather bad at anticipating trends. Due to their ability to capture large moves, we chose to consider Lévy processes for the underlying price process. Compared with the macro-economic approach, the main advantage is the uniform and systematic structure this imposed on the models. We get in each case a favorable result for our technology and a gain in forecasting accuracy of around 10% depending on the maturity. The global distribution is more effectively modelled and the tails of the distribution are particularly well represented. This model can be used to forecast the market but also to evaluate the risk, for example, by computing the VaR. An important limitation is the non-robustness in the estimation of the Lévy processes. The use of robust estimators reinforces the information obtained from the observed data. Because maximum likelihood estimation is not easy to compute with complex processes, we only consider some very general robust score functions to manage the technical problems. Two new class of robust estimators are suggested. These are based on the work of F. Hampel ([29]) and P. Huber ([30]) using influence functions. The main idea is to bound the maximum likelihood score function. By doing this a bias is created in the parameters estimation, which can be corrected by using a modification of the following type and as proposed by F. Hampel. The procedure for finding a robust estimating equation is thus decomposed into two consecutive steps : Subtract the bias correction and then Bound the score function. In the case of complex Lévy processes, the bias correction is difficult to compute and generally unknown. We have developed a pragmatic solution by inverting the Hampel's procedure. Bound the score function and then Correct for the bias. The price is a loss of the theoretical properties of our estimators, besides the procedure converges to maximum likelihood estimate. A second solution to for achieving robust estimation is presented. It considers the limiting case when the upper and lower bounds tend to zero and leads to B-robust estimators. Because of the complexity of the Lévy distributions, this leads to identification problems.