
# Contributions to Modelling Extremes of Spatial Data

Abstract

The increasing interest in using statistical extreme value theory to analyse environmental data is mainly driven by the large impact that extreme events can have. A difficulty with spatial data is that most existing inference methods for asymptotically justified models for extremes are computationally intractable for data at several hundred sites, a number easily attained or surpassed by the output of physical climate models or satellite-based data sets. This thesis does not tackle this problem directly, but it provides some elements that might be useful in doing so. The first part of the thesis contains a pointwise marginal analysis of satellite-based measurements of total column ozone in the northern and southern mid-latitudes. At each grid cell, the r-largest order statistics method is used to analyse extremely low and high values of total ozone, and an autoregressive moving average time series model is used for an analogous analysis of mean values. Both models include the same set of global covariates describing the dynamical and chemical state of the atmosphere. The results show that the influence of the covariates is captured in both the "bulk" and the tails of the statistical distribution of ozone. For some covariates, our results are in good agreement with the findings of earlier studies, whereas previously unreported influences are found for two dynamical covariates. The second part concerns the frameworks of multivariate and spatial modelling of extremes. We review one class of multivariate extreme value distributions, the so-called Hüsler–Reiss model, as well as its spatial extension, the Brown–Resnick process. For the former, we provide a detailed discussion of its parameter matrix, including the degenerate case, which arises if the correlation matrices of the underlying multivariate Gaussian distributions are singular. We establish a simplification for computing the partial derivatives of the exponent function of these two models.
As a consequence of the considerably reduced number of terms in each partial derivative, the computation time for the multivariate joint density of these models can be reduced, which could be helpful for (composite) likelihood inference. Finally, we propose a new variant of the Brown–Resnick process based on the Karhunen–Loève expansion of its underlying Gaussian process. As an illustration, we use composite likelihood to fit a simplified version of our model to a hindcast data set of wave heights that shows highly dependent extremes.
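A minimal sketch of the r-largest order statistics log-likelihood, assuming the standard GEV-based form: each block (for example one year at one grid cell, an assumption for illustration) contributes its r largest values, and only the smallest retained value enters the exponential term. Letting `mu` vary by block is one hypothetical way of adding covariates; the function name and arguments are illustrative, and the maximisation step (a numerical optimiser) is left out. Low extremes can be treated by applying the same function to negated values.

```python
import numpy as np


def r_largest_loglik(blocks, mu, sigma, xi):
    """Log-likelihood of the r-largest order statistics (GEV_r) model.

    blocks : array of shape (n_blocks, r); each row holds the r largest
             values of one block, sorted in decreasing order.
    mu     : scalar, or array of shape (n_blocks,) so that the location can
             depend on covariates (e.g. mu = b0 + b1 * x; hypothetical).
    With t_k = 1 + xi * (z_k - mu) / sigma, each block contributes
        -t_r**(-1/xi) - (1 + 1/xi) * sum_k log(t_k) - r * log(sigma).
    """
    z = np.atleast_2d(np.asarray(blocks, dtype=float))
    mu = np.broadcast_to(np.asarray(mu, dtype=float), z.shape[:1])[:, None]
    if sigma <= 0:
        return -np.inf
    y = (z - mu) / sigma
    if abs(xi) < 1e-8:
        # Gumbel limit xi -> 0: -exp(-y_r) - sum_k y_k - r * log(sigma)
        ll = -np.exp(-y[:, -1]) - y.sum(axis=1)
        return float(ll.sum() - z.size * np.log(sigma))
    t = 1.0 + xi * y
    if np.any(t <= 0):
        return -np.inf  # parameters put some observation outside the support
    # only the r-th (smallest retained) value enters the exponential term
    ll = -t[:, -1] ** (-1.0 / xi) - (1.0 / xi + 1.0) * np.log(t).sum(axis=1)
    return float(ll.sum() - z.size * np.log(sigma))
```

For example, `r_largest_loglik([[3.1, 2.4, 2.0]], mu=1.0, sigma=0.8, xi=0.1)` evaluates a single block with r = 3.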
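On unit Fréchet margins, the bivariate Hüsler–Reiss distribution is $G(z_1, z_2) = \exp\{-V(z_1, z_2)\}$ with exponent function $V(z_1, z_2) = z_1^{-1}\Phi\{\lambda/2 + \log(z_2/z_1)/\lambda\} + z_2^{-1}\Phi\{\lambda/2 + \log(z_1/z_2)/\lambda\}$, where $\lambda > 0$ is the dependence parameter and $\Phi$ the standard normal distribution function. For a Brown–Resnick process, the pair of sites at lag h is Hüsler–Reiss with λ determined by the variogram of the underlying Gaussian process (the exact scaling depends on the convention). The sketch below only covers this bivariate case, not the multivariate parameter matrix discussed above; the function names are illustrative. It also computes the extremal coefficient $V(1, 1) = 2\Phi(\lambda/2)$, which ranges from 1 (complete dependence, λ → 0) to 2 (independence, λ → ∞).

```python
import math


def std_normal_cdf(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def husler_reiss_V(z1, z2, lam):
    """Bivariate Hüsler-Reiss exponent function on unit Frechet margins."""
    if lam <= 0 or z1 <= 0 or z2 <= 0:
        raise ValueError("need lam > 0 and positive arguments")
    a = math.log(z2 / z1) / lam
    return std_normal_cdf(lam / 2 + a) / z1 + std_normal_cdf(lam / 2 - a) / z2


def husler_reiss_cdf(z1, z2, lam):
    """Joint distribution function G(z1, z2) = exp(-V(z1, z2))."""
    return math.exp(-husler_reiss_V(z1, z2, lam))


def extremal_coefficient(lam):
    """V(1, 1) = 2 * Phi(lam / 2), between 1 and 2."""
    return husler_reiss_V(1.0, 1.0, lam)
```

Evaluating `husler_reiss_V(1.0, 2.0, lam)` for very small and very large `lam` recovers the two limits max(1/z1, 1/z2) = 1 and 1/z1 + 1/z2 = 1.5.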
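Why the cost of each partial derivative matters: differentiating a d-dimensional max-stable distribution $G = \exp(-V)$ with respect to all d arguments gives, by the chain rule, $\exp(-V)\sum_{\pi}(-1)^{|\pi|}\prod_{B \in \pi}\partial_B V$, a sum over all set partitions $\pi$ of $\{1, \dots, d\}$. The number of terms is the Bell number $B_d$, which grows faster than exponentially, so every saving inside the partial derivatives $\partial_B V$ is multiplied by this count. The short helper below (a hypothetical name) computes Bell numbers with the Bell triangle.

```python
def bell_numbers(n_max):
    """Return [B_0, B_1, ..., B_n_max] using the Bell triangle."""
    bells = [1]
    row = [1]
    for _ in range(n_max):
        # each new row starts with the last entry of the previous row ...
        new_row = [row[-1]]
        # ... and each next entry adds the entry above-left to its neighbour
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])
    return bells
```

`bell_numbers(10)[-1]` is 115975, the number of partition terms in the full joint density at only ten sites.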
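The abstract does not spell out how the Karhunen–Loève expansion enters the new Brown–Resnick variant, so the following is only a hypothetical illustration of the two ingredients, not the author's model. It assumes standard Brownian motion on [0, 1] as the underlying Gaussian process (variogram |t − s|), truncates its Karhunen–Loève series after `n_terms` terms, and uses the representation $Z(t) = \max_i \xi_i \exp\{W_i(t) - v(t)/2\}$, where the $\xi_i$ are points of a Poisson process with intensity $\xi^{-2}\,d\xi$ and $v(t)$ is the variance of the truncated series. Subtracting the exact truncated variance keeps the margins unit Fréchet; because of the truncation the increments are only approximately stationary. The names `kl_brownian_basis`, `simulate_br_kl` and `n_points` are illustrative.

```python
import numpy as np


def kl_brownian_basis(t, n_terms):
    """Karhunen-Loeve basis of standard Brownian motion on [0, 1].

    W(t) = sum_k Z_k * sqrt(2) * sin((k - 1/2) pi t) / ((k - 1/2) pi),
    with Z_k i.i.d. N(0, 1). Returns an array of shape (n_terms, len(t)).
    """
    freq = (np.arange(1, n_terms + 1) - 0.5) * np.pi
    return np.sqrt(2.0) * np.sin(np.outer(freq, t)) / freq[:, None]


def simulate_br_kl(t, n_terms=50, n_points=100, rng=None):
    """One approximate Brown-Resnick path on unit Frechet margins (sketch).

    Only the n_points largest Poisson points xi_i = 1 / Gamma_i are kept,
    where Gamma_i are the arrival times of a unit-rate Poisson process.
    """
    rng = np.random.default_rng() if rng is None else rng
    basis = kl_brownian_basis(np.asarray(t, dtype=float), n_terms)
    v = np.sum(basis**2, axis=0)  # variance of the truncated series at each t
    xi = 1.0 / np.cumsum(rng.exponential(size=n_points))
    w = rng.standard_normal((n_points, n_terms)) @ basis  # (n_points, len(t))
    return np.max(xi[:, None] * np.exp(w - v / 2.0), axis=0)
```

A call such as `simulate_br_kl(np.linspace(0, 1, 101), rng=np.random.default_rng(0))` returns one path on a grid of 101 points.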



Related concepts (32)

Dynamical system

In mathematics, chemistry or physics, a dynamical system consists of a system together with a law describing how that system evolves. It may be the evolution of a chemical reaction over

Empirical distribution function

In statistics, an empirical distribution function is a distribution function that assigns probability 1/n to each of the n numbers in a sample.
Let $X_1, \ldots, X_n$ be a sample of varia
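The excerpt above is cut off, but the standard definition is $F_n(x) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\{X_i \le x\}$: each observation carries mass 1/n. A minimal sketch (the function name is illustrative) sorts the sample once and counts observations at or below x by binary search.

```python
import numpy as np


def ecdf(sample):
    """Return F_n, the empirical distribution function of `sample`."""
    xs = np.sort(np.asarray(sample, dtype=float))
    n = xs.size

    def F_n(x):
        # number of observations <= x (side="right" counts ties), divided by n
        return np.searchsorted(xs, x, side="right") / n

    return F_n
```

With the sample `[3, 1, 2, 2]`, `F_n` jumps by 1/4 at 1, by 2/4 at the tied value 2 and by 1/4 at 3.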

Statistics

Statistics is the discipline that studies phenomena through the collection, processing and analysis of data, the interpretation of the results and their presentation, in order to make these da

Related publications (62)


This thesis is a contribution to multivariate extreme value statistics. The tail of a multivariate distribution function is characterized by its spectral distribution, for which we propose a new semi-parametric model based on mixtures of Dirichlet distributions. To estimate the components of this model, reversible jump Markov chain Monte Carlo and EM algorithms are developed. Their performance is illustrated on real and simulated data, the latter obtained using new representations of the extremal logistic and Dirichlet models. In parallel with the estimation of the spectral distribution, the extreme value statistics machinery requires the selection of a threshold in order to classify data as extreme or not. This selection is achieved by a new method based on heuristic arguments, which allows the threshold to be chosen independently of the dimension of the data. Its performance is illustrated on real and simulated data. The primary scientific interests of a multivariate extreme value analysis lie in the estimation of quantiles of rare events and in the exploration of the dependence structure, for which the estimation of the spectral measure is a means rather than an end. Both issues are addressed. For the first, a Monte Carlo method based on the simulation of extremes is developed and compared with classical and new methods from the literature. For the second, an original conditional dependence analysis is proposed, which sheds light on various aspects of the dependence structure of the data. Examples using real data sets are given. In the last part, the semi-parametric model and the presented methods are extended to spatial extremes. This is made possible by considering the spectral distribution as the distribution of a random probability, an original viewpoint adopted throughout this thesis. Classical multivariate extremes are thereby extended to extremes of random measures. The application is illustrated on rainfall data in China.
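A mixture of Dirichlet distributions can serve as a spectral distribution only if it respects the moment constraint imposed by standardised margins: for a d-dimensional spectral probability measure H on the unit simplex, $\int w_j \, dH(w) = 1/d$ for every coordinate j. The sketch below assumes each component is parametrised by a mean vector in the simplex and a concentration (so that its Dirichlet parameters are concentration × mean); it checks the constraint and draws from the mixture. It does not implement the reversible jump MCMC or EM estimation described above, and the function names and parametrisation are assumptions.

```python
import numpy as np


def is_valid_spectral_mixture(weights, means, tol=1e-10):
    """Check that a Dirichlet mixture satisfies the spectral moment constraint.

    weights : shape (K,), mixture weights summing to 1
    means   : shape (K, d), mean vector of each Dirichlet component (rows in
              the simplex); the mixture's centre of mass must be (1/d, ..., 1/d)
    """
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    d = means.shape[1]
    return bool(
        np.all(weights >= 0)
        and abs(weights.sum() - 1.0) < tol
        and np.allclose(means.sum(axis=1), 1.0, atol=tol)
        and np.allclose(weights @ means, 1.0 / d, atol=tol)
    )


def sample_spectral_mixture(weights, means, concentrations, size, rng=None):
    """Draw `size` points on the simplex from the Dirichlet mixture."""
    rng = np.random.default_rng() if rng is None else rng
    # Dirichlet parameters of component k: concentration_k * mean_k
    alphas = np.asarray(concentrations, dtype=float)[:, None] * np.asarray(means, dtype=float)
    components = rng.choice(len(weights), size=size, p=weights)
    return np.array([rng.dirichlet(alphas[k]) for k in components])
```

For d = 3, the two components with means (1/2, 1/4, 1/4) and (1/6, 5/12, 5/12) and equal weights average to (1/3, 1/3, 1/3), so they form a valid spectral mixture; unequal weights on the same means do not.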

Olfactometer experiments are used to determine the effect of odours on the behaviour of organisms such as insects or nematodes, and typically result in data comprising many groups of small counts that are overdispersed relative to the multinomial distribution. Overdispersion reflects a lack of independence or heterogeneity among individuals and can lead to statistics having larger variances than expected and to possible losses of efficiency. In this thesis, several generalisations of the multinomial distribution are developed. These models are based on non-homogeneous Markov chain theory, account for overdispersion, and potentially provide a physical interpretation of the overdispersion seen in olfactometer data. Some aspects of inference are considered, including a comparison of the asymptotic relative efficiencies of three different sampling schemes. We check that the empirical distributions approximate the corresponding asymptotic distributions well. Observable differences in parameter estimates between data generated under different hypotheses are also studied. Finally, different models intended to shed light on various aspects of the data and/or the experimental procedure are applied to three real olfactometer datasets.
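To illustrate how a non-homogeneous Markov chain can produce overdispersion relative to the multinomial, here is a standard Pólya urn choice model; it is a swapped-in textbook example, not one of the models developed in the thesis. Individuals enter one at a time, and individual m chooses arm j with probability $(\alpha_j + c_j)/(A + m)$, where $c_j$ counts earlier choices of arm j and $A = \sum_j \alpha_j$, so later individuals tend to follow earlier ones. The resulting counts are Dirichlet-multinomial, whose variance exceeds the multinomial variance $n p_j (1 - p_j)$ by the factor $(n + A)/(1 + A)$. The function name and the parameter `alpha` are hypothetical.

```python
import numpy as np


def polya_urn_counts(n_individuals, alpha, rng):
    """Counts per olfactometer arm under a sequential Polya-urn choice model.

    The choice probabilities depend on the current counts and on the step m,
    so the count vector evolves as a non-homogeneous Markov chain.
    """
    alpha = np.asarray(alpha, dtype=float)
    counts = np.zeros(alpha.size, dtype=int)
    for m in range(n_individuals):
        p = (alpha + counts) / (alpha.sum() + m)  # sums to 1 at every step
        counts[rng.choice(alpha.size, p=p)] += 1
    return counts
```

With four arms, `alpha = [0.5] * 4` (so A = 2) and groups of n = 10 individuals, the theoretical variance of each arm's count is four times the multinomial value; a larger `alpha` (weaker following) brings the counts back towards the multinomial.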

The thesis is a contribution to extreme-value statistics, more precisely to the estimation of clustering characteristics of extreme values. One summary measure of the tendency to form groups is the inverse average cluster size. In the extreme-value context, this parameter is called the extremal index; apart from its relation to cluster size, it is an important parameter measuring the effects of serial dependence on extreme levels in time series. Although several methods exist for its estimation in univariate sequences, these methods are applicable only to strictly stationary series satisfying a long-range asymptotic independence condition on extreme levels, cannot take covariates into consideration, and yield only crude estimates for the corresponding multivariate quantity. These are strong restrictions and serious drawbacks. In climatic time series, both stationarity and asymptotic independence can be broken, owing to climate change and possible long memory in the data, and ignoring information from simultaneously measured linked variables may lead to inefficient estimation. The thesis addresses these issues. First, we extend the theorem of Ferro and Segers (2003) concerning the distribution of inter-exceedance times: we introduce truncated inter-exceedance times, called K-gaps, and show that they follow the same exponential-point mass mixture distribution as the inter-exceedance times. Maximizing the likelihood built on this distribution yields a simple closed-form estimator for the extremal index. The method can incorporate covariates and can be combined with smoothing techniques, which allows its use in a nonstationary setting. Simulated and real data examples demonstrate the smooth estimation of the extremal index. The likelihood, based on an assumption of independence of the K-gaps, is misspecified whenever K is too small.
This motivates another contribution of the thesis: the introduction into extreme-value statistics of misspecification tests based on the information matrix. For our likelihood, they can detect misspecification from any source, not only that due to a bad choice of the truncation parameter. They also help in threshold selection, and show whether the fundamental assumptions of stationarity or asymptotic independence are violated. Moreover, these diagnostic tests are of general use and could be adapted to many kinds of extreme-value models, which are always approximate. Simulated examples demonstrate the performance of the misspecification tests in the context of extremal index estimation. Two data examples with complex behaviour, one univariate and the other bivariate, offer insight into their power to detect situations where the fundamental assumptions of the likelihood model are not valid. In the multivariate case, the parameter corresponding to the univariate extremal index is the multivariate extremal index function. As in the univariate case, its appearance is linked to serial dependence in the observed processes. Univariate estimation methods can be applied, but are likely to give crude and unreasonably variable estimates, and the constraints on the extremal index function implied by the characteristics of the stable tail dependence function are not automatically satisfied. The third contribution of the thesis is the development of methodology based on the M4 approximation of Smith and Weissman (1996), which can be used to estimate the multivariate extremal index as well as other cluster characteristics. For this purpose, we give a preliminary cluster selection procedure and approximate the noise at finite levels with a flexible semiparametric model, the Dirichlet mixtures widely used in Bayesian analysis. The model is fitted by the EM algorithm.
Advantages and drawbacks of the method are discussed using the same univariate and bivariate examples as for the likelihood methods.
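A sketch of the K-gaps estimator, assuming the following formulation. The K-gap is $S_i = \max(T_i - K, 0)$ for inter-exceedance time $T_i$, and gaps are scaled by the empirical exceedance probability $q = N/n$, giving $c_i = q S_i$. The limiting distribution puts mass $1 - \theta$ at zero and is exponential with rate θ otherwise, so the independence likelihood is $(1-\theta)^{N_0}\theta^{2N_1}\exp(-\theta\sum_i c_i)$, with $N_0$ zero and $N_1$ non-zero gaps. Setting the score to zero gives the quadratic $S\theta^2 - (N_0 + 2N_1 + S)\theta + 2N_1 = 0$ with $S = \sum_i c_i$, whose smaller root is the closed-form estimate. The second function computes only the core of an information matrix check, the average of score² + Hessian per gap, which should be near zero under correct specification; a formal test would also need its standard error. All function names are illustrative.

```python
import numpy as np


def kgaps(x, u, k=1):
    """Scaled K-gaps c_i = (N / n) * max(T_i - K, 0) for threshold u."""
    x = np.asarray(x, dtype=float)
    times = np.flatnonzero(x > u)  # indices of threshold exceedances
    if times.size < 2:
        raise ValueError("need at least two exceedances of u")
    q = times.size / x.size  # empirical estimate of P(X > u)
    return q * np.maximum(np.diff(times) - k, 0)


def kgaps_mle(c):
    """Closed-form maximiser of (1 - theta)^N0 * theta^(2 N1) * exp(-theta * sum(c))."""
    c = np.asarray(c, dtype=float)
    n1 = np.count_nonzero(c > 0)
    n0 = c.size - n1
    s = c.sum()
    if s == 0:
        return 0.0  # all gaps zero: likelihood (1 - theta)^N0 peaks at 0
    b = n0 + 2 * n1 + s
    # smaller root of s * theta^2 - b * theta + 2 * n1 = 0, always in [0, 1]
    return float((b - np.sqrt(b * b - 8 * n1 * s)) / (2 * s))


def im_discrepancy(c, theta):
    """Mean of (score^2 + Hessian) over gaps; near 0 if the model is correct."""
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie strictly between 0 and 1")
    c = np.asarray(c, dtype=float)
    pos = c > 0
    # per-gap log-likelihood: log(1 - theta) if c = 0, else 2 log(theta) - theta * c
    score = np.where(pos, 2.0 / theta - c, -1.0 / (1.0 - theta))
    hess = np.where(pos, -2.0 / theta**2, -1.0 / (1.0 - theta) ** 2)
    return float(np.mean(score**2 + hess))
```

On an i.i.d. series the estimate should be close to 1, while a moving maximum $X_t = \max(Z_t, Z_{t-1})$ of i.i.d. variables, whose exceedances come in pairs, should give a value near 1/2.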