Publication: Mixture models for multivariate extremes

Abstract

This thesis is a contribution to multivariate extreme value statistics. The tail of a multivariate distribution function is characterized by its spectral distribution, for which we propose a new semi-parametric model based on mixtures of Dirichlet distributions. To estimate the components of this model, reversible jump Markov chain Monte Carlo and EM algorithms are developed. Their performance is illustrated on real and simulated data, the latter obtained using new representations of the extremal logistic and Dirichlet models. In parallel with the estimation of the spectral distribution, extreme value statistical machinery requires the selection of a threshold in order to classify data as extreme or not. This selection is achieved by a new method based on heuristic arguments, which allows a selection independent of the dimension of the data. Its performance is illustrated on real and simulated data. The primary scientific interests behind a multivariate extreme value analysis lie in the estimation of quantiles of rare events and in the exploration of the dependence structure, for which the estimation of the spectral measure is a means rather than an end. Both issues are addressed. For the first, a Monte Carlo method based on simulation of extremes is developed and compared with classical and new methods from the literature. For the second, an original conditional dependence analysis is proposed, which sheds light on various aspects of the dependence structure of the data. Examples using real data sets are given. In the last part, the semi-parametric model and the presented methods are extended to spatial extremes. This is made possible by considering the spectral distribution as the distribution of a random probability, an original viewpoint adopted throughout this thesis. Classical multivariate extremes are extended to extremes of random measures. The application is illustrated on rainfall data in China.
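The role of the threshold can be illustrated with the standard pseudo-polar decomposition, in which each observation is split into a radius (its size) and an angle on the unit simplex (its direction), and only the angles of large-radius points inform the spectral distribution. The sketch below is a minimal illustration, not the selection method of the thesis: it assumes unit Fréchet margins and the sum norm, and the function names and the quantile level `prob` are hypothetical choices made for this example.

```python
import random


def to_pseudo_polar(point):
    """Split a positive vector into radius r = sum(x) and angle w = x / r."""
    r = sum(point)
    return r, [x / r for x in point]


def extreme_angles(data, prob=0.95):
    """Return an empirical radial threshold and the angles of points above it.

    `prob` is a hypothetical tuning parameter (an empirical quantile level),
    not the heuristic selection rule described in the abstract.
    """
    polar = [to_pseudo_polar(p) for p in data]
    radii = sorted(r for r, _ in polar)
    threshold = radii[int(prob * (len(radii) - 1))]
    angles = [w for r, w in polar if r > threshold]
    return threshold, angles


def simulate_independent_frechet(n, dim, rng):
    """Simulate n points with independent unit Fréchet margins.

    If E is standard exponential, 1 / E is unit Fréchet.
    """
    return [[1.0 / rng.expovariate(1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
data = simulate_independent_frechet(1000, 3, rng)
threshold, angles = extreme_angles(data, prob=0.95)
```

Each retained angle lies on the unit simplex, so a model for the spectral distribution, such as a mixture of Dirichlet distributions, can be fitted to `angles` alone.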

Related concepts (25)

Multivariate normal distribution

In probability theory and statistics, the multivariate normal distribution, multivariate Gaussian distribution, or joint normal distribution is a generalization of the one-dimensional (univariate) no

Normal distribution

In statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function

Mixture model

In statistics, a mixture model is a probabilistic model for representing the presence of subpopulations within an overall population, without requiring that an observed data set should identify the s

Related publications (43)

We introduce a fast approach to classification and clustering applicable to high-dimensional continuous data, based on Bayesian mixture models for which explicit computations are available. This permits us to treat classification and clustering in a single framework, and allows calculation of unobserved class probabilities. The new classifier is robust to the addition of noise variables, as a by-product of the built-in spike-and-slab structure of the proposed Bayesian model. The usefulness of classification using our method is shown on a metabolomic example, and on the Iris data with and without noise variables. Agglomerative hierarchical clustering based on the posterior probabilities of particular partitions is used to construct a dendrogram with a probabilistic interpretation. An extension to variable selection is proposed, which summarises the importance of variables for classification or clustering and has a probabilistic interpretation. The simplicity of the model allows its parameters to be estimated by maximum likelihood and therefore yields a fully automatic algorithm. The new clustering method is applied to metabolomic, microarray, and image data and is studied using simulated data motivated by real datasets. The computational difficulties of the new approach are discussed, solutions for algorithm acceleration are proposed, and the written computer code is briefly analysed. Simulations show that the quality of the estimated model parameters depends on the parametric distribution assumed for effects, but after fixing the model parameters to reasonable values, the distribution of the effects influences clustering very little. Simulations confirm that the clustering algorithm and the proposed variable selection method are reliable when the model assumptions are wrong. The new approach is compared with the popular Bayesian clustering alternative, MCLUST, fitted on the principal components, using two loss functions; our proposed approach is found to be more efficient in almost every situation.

The increasing interest in using statistical extreme value theory to analyse environmental data is mainly driven by the large impact extreme events can have. A difficulty with spatial data is that most existing inference methods for asymptotically justified models for extremes are computationally intractable for data at several hundred sites, a number easily attained or surpassed by the output of physical climate models or satellite-based data sets. This thesis does not directly tackle this problem, but it provides some elements that might be useful in doing so. The first part of the thesis contains a pointwise marginal analysis of satellite-based measurements of total column ozone in the northern and southern mid-latitudes. At each grid cell, the r-largest order statistics method is used to analyse extremely low and high values of total ozone, and an autoregressive moving average time series model is used for an analogous analysis of mean values. Both models include the same set of global covariates describing the dynamical and chemical state of the atmosphere. The results show that the influence of the covariates is captured in both the "bulk" and the tails of the statistical distribution of ozone. For some covariates, our results are in good agreement with findings of earlier studies, whereas previously unreported influences are found for two dynamical covariates. The second part concerns the frameworks of multivariate and spatial modelling of extremes. We review one class of multivariate extreme value distributions, the so-called Hüsler–Reiss model, as well as its spatial extension, the Brown–Resnick process. For the former, we provide a detailed discussion of its parameter matrix, including the case of degeneracy, which arises if the correlation matrices of underlying multivariate Gaussian distributions are singular. We establish a simplification for computing the partial derivatives of the exponent function of these two models. As a consequence of the considerably reduced number of terms in each partial derivative, computation time for the multivariate joint density of these models can be reduced, which could be helpful for (composite) likelihood inference. Finally, we propose a new variant of the Brown–Resnick process based on the Karhunen–Loève expansion of its underlying Gaussian process. As an illustration, we use composite likelihood to fit a simplified version of our model to a hindcast data set of wave heights that shows highly dependent extremes.
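To make the notion of an exponent function concrete, the sketch below evaluates it for the bivariate Hüsler–Reiss distribution. It assumes unit Fréchet margins and one common parameterization with a single dependence parameter `lam` > 0, which may differ from the parameter matrix discussed in the thesis; the function names are hypothetical. Under this parameterization the joint distribution function is G(x, y) = exp{-V(x, y)}, with large `lam` approaching independence and small `lam` approaching complete dependence.

```python
import math


def std_normal_cdf(z):
    """Standard normal distribution function, computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def husler_reiss_exponent(x, y, lam):
    """Exponent function V(x, y) of the bivariate Hüsler-Reiss model.

    Assumes unit Fréchet margins and a scalar dependence parameter lam > 0
    (an assumed parameterization, chosen for illustration).
    """
    if x <= 0 or y <= 0 or lam <= 0:
        raise ValueError("x, y and lam must be strictly positive")
    t = math.log(y / x) / (2.0 * lam)
    return std_normal_cdf(lam + t) / x + std_normal_cdf(lam - t) / y


def husler_reiss_cdf(x, y, lam):
    """Joint distribution function G(x, y) = exp(-V(x, y))."""
    return math.exp(-husler_reiss_exponent(x, y, lam))
```

The joint density requires the mixed partial derivatives of V, and in higher dimensions the number of such terms grows quickly, which is why a reduction in the number of terms per derivative can shorten likelihood computations.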

Marc-Olivier Boldi, Anthony Christopher Davison

The spectral density function plays a key role in fitting the tail of multivariate extremal data and so in estimating probabilities of rare events. This function satisfies moment constraints but unlike the univariate extreme value distributions has no simple parametric form. Parameterized subfamilies of spectral densities have been suggested for use in applications, and nonparametric estimation procedures have been proposed, but semiparametric models for multivariate extremes have hitherto received little attention. We show that mixtures of Dirichlet distributions satisfying the moment constraints are weakly dense in the class of all nonparametric spectral densities, and discuss frequentist and Bayesian inference in this class based on the EM algorithm and reversible jump Markov chain Monte Carlo simulation. We illustrate the ideas using simulated and real data.

2007
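The moment constraints mentioned in this abstract can be made concrete with a small sketch. It assumes the usual normalization in which the spectral distribution lives on the unit simplex in d dimensions and has mean (1/d, ..., 1/d); for a mixture of Dirichlet distributions, this means the weighted average of the component means must equal that vector. The weights, Dirichlet parameters, and function names below are hypothetical values chosen for illustration, and the code only checks the constraint and draws samples; it does not implement the EM or reversible jump algorithms.

```python
import random


def mixture_mean(weights, alphas):
    """Mean of a Dirichlet mixture: sum_m p_m * alpha_m / sum(alpha_m)."""
    dim = len(alphas[0])
    return [
        sum(p * a[j] / sum(a) for p, a in zip(weights, alphas))
        for j in range(dim)
    ]


def satisfies_moment_constraint(weights, alphas, tol=1e-9):
    """Check that the mixture mean equals (1/d, ..., 1/d) up to tol."""
    dim = len(alphas[0])
    return all(abs(m - 1.0 / dim) < tol for m in mixture_mean(weights, alphas))


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution by normalizing gamma variates."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]


def sample_mixture(weights, alphas, n, rng):
    """Draw n points on the simplex from the Dirichlet mixture."""
    labels = rng.choices(range(len(weights)), weights=weights, k=n)
    return [sample_dirichlet(alphas[k], rng) for k in labels]


# Hypothetical three-component example in dimension d = 3.
WEIGHTS = [1 / 3, 1 / 3, 1 / 3]
ALPHAS = [(6.0, 1.0, 1.0), (1.0, 6.0, 1.0), (1.0, 1.0, 6.0)]
```

Each component here concentrates mass near one vertex of the simplex, yet the symmetric weights balance the component means so that the mixture as a whole satisfies the constraint.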