# Contributions to Likelihood-Based Modelling of Extreme Values

Abstract

Extreme value analysis is concerned with the modelling of extreme events such as floods and heatwaves, which can have large impacts. Statistical modelling can be useful to better assess risks even if, due to scarcity of measurements, there is inherently very large residual uncertainty in any analysis. Driven by the increase in environmental databases, spatial modelling of extremes has expanded rapidly in the last decade. This thesis presents contributions to such analysis.

The first chapter is about likelihood-based inference in the univariate setting and investigates the use of bias-correction and higher-order asymptotic methods for extremes, highlighting through examples and illustrations the unique challenge posed by data scarcity. We focus on parametric modelling of extreme values, which relies on limiting distributional results and for which, as a result, uncertainty quantification is complicated. We find that, in certain cases, small-sample asymptotic methods can give improved inference by reducing the error rate of confidence intervals. Two data illustrations, linked to assessment of the frequency of extreme rainfall episodes in Venezuela and the analysis of survival of supercentenarians, illustrate the methods developed.

In the second chapter, we review the major methods for the analysis of spatial extremes models. We highlight the similarities and provide a thorough literature review along with novel simulation algorithms. The methods described therein are made available through a statistical software package.

The last chapter focuses on estimation for a Bayesian hierarchical model derived from a multivariate generalized Pareto process. We review approaches for the estimation of censored components in models derived from (log)-elliptical distributions, paying particular attention to the estimation of a high-dimensional Gaussian distribution function via Monte Carlo methods. The impacts of model misspecification and of censoring are explored through extensive simulations and we conclude with a case study of rainfall extremes in Eastern Switzerland.
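As a rough illustration of estimating a Gaussian distribution function by Monte Carlo, the sketch below uses crude (plain) Monte Carlo in two dimensions, where the orthant probability has a closed form to check against. It is not the estimator used in the thesis, which targets high-dimensional problems; the function name `mc_bivariate_normal_cdf` and all parameter values are assumptions made for this example.

```python
import math
import random


def mc_bivariate_normal_cdf(b1, b2, rho, n_draws=200_000, seed=0):
    """Crude Monte Carlo estimate of P(X1 <= b1, X2 <= b2) for a standard
    bivariate normal vector with correlation rho (hypothetical helper)."""
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - rho * rho)
    hits = 0
    for _ in range(n_draws):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        # Cholesky-style construction of a correlated pair.
        x1 = z1
        x2 = rho * z1 + scale * z2
        if x1 <= b1 and x2 <= b2:
            hits += 1
    p_hat = hits / n_draws
    # Binomial standard error of the estimated probability.
    std_err = math.sqrt(p_hat * (1.0 - p_hat) / n_draws)
    return p_hat, std_err


estimate, std_err = mc_bivariate_normal_cdf(0.0, 0.0, rho=0.5)
# Closed form for the negative-orthant probability of a bivariate normal.
exact = 0.25 + math.asin(0.5) / (2.0 * math.pi)
print(f"estimate={estimate:.4f} (s.e. {std_err:.4f}), exact={exact:.4f}")
```

The standard error shrinks only at rate one over the square root of the number of draws, which is one reason more refined schemes are usually preferred in high dimensions.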

Related concepts (32)

Related publications (97)

Statistical model

A statistical model is an approximate mathematical description of the mechanism that generated the observations, which is assumed to be a stochastic process rather than a deterministic one.

Confidence interval

Each row shows 20 samples drawn from a normal distribution with mean μ, together with the 50% confidence interval for the mean computed from those 20 samples, marked by …

Monte Carlo method

A Monte Carlo method is an algorithmic method for computing an approximate numerical value using random procedures, that is, techniques …

This thesis is a contribution to financial statistics. One of the principal concerns of investors is the evaluation of portfolio risk. The notion of risk is vague, but in finance it is always linked to possible losses. In this thesis, we present some measures allowing the valuation of risk with the help of Bayesian methods. An exploratory analysis of data is presented to describe the sampling properties of financial time series. This analysis allows us to understand the origins of the daily returns studied in this thesis. Moreover, a discussion of different models is presented. These models make strong assumptions on investor behaviour, which are not always satisfied. This exploratory analysis shows some differences between the behaviour anticipated under equilibrium models and that of real data. The Bayesian approach has been chosen because it allows one to incorporate all the variability, in particular that associated with model choice. The models studied in this thesis allow one to take heteroskedasticity into account, as well as particular shapes of the tails of returns. ARCH-type models and models based on extreme value theory are studied. One original aspect of this thesis is its use of Bayesian analysis to detect change points in financial time series. We suppose that a market has two phases, and that it switches from one state to the other at random. Another new contribution is a model integrating heteroskedasticity and time dependence of extreme values, by superposition of the model proposed by Bortot and Coles (2003) and a GARCH process. This thesis uses simulation intensively for the estimation of risk measures. The drawback of simulation is the amount of time needed to obtain accurate estimates. However, simulation allows one to produce results when direct calculation is not feasible. For example, simulation allows one to compute risk estimates for time horizons greater than one day.
The methods presented in this thesis are illustrated on simulated data, and on real data from European and American markets. This thesis involved the construction of a library containing C and S code to perform risk analysis using GARCH and extreme value theory models. The results show that model uncertainty can be incorporated, and that risk measures for time horizons greater than one day can be obtained by simulation. The methods presented in this thesis have a natural representation involving conditioning. Thus, they permit the computation of both conditional and unconditional risk estimates. Three methods are described: the GARCH method; the two-state GARCH method; and the HBC method. Unconditional risk estimation using the GARCH method is satisfactory on data which seem stationary, but not reliable on data which are non-stationary, such as data with change points. The two-state GARCH model does a little better, and gives very satisfactory results when the risk is estimated conditionally on time. The HBC method does not give satisfactory results.
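To make the point about multi-day horizons concrete, here is a minimal simulation sketch: it draws paths from a GARCH(1,1) process with fixed parameters and Gaussian innovations and reads off an empirical quantile of the cumulative loss. This is a plain plug-in simulation, not the Bayesian procedure of the thesis, and the function name `simulate_garch_var` and the parameter values are hypothetical.

```python
import math
import random


def simulate_garch_var(omega, alpha, beta, horizon, level=0.99,
                       n_paths=20_000, seed=1):
    """Simulated Value-at-Risk of the cumulative return over `horizon` days
    under a GARCH(1,1) model with standard normal innovations."""
    if alpha + beta >= 1.0:
        raise ValueError("alpha + beta must be below 1 for a finite variance")
    rng = random.Random(seed)
    # Start every path at the unconditional variance (an assumption).
    uncond_var = omega / (1.0 - alpha - beta)
    losses = []
    for _path in range(n_paths):
        sigma2 = uncond_var
        total_return = 0.0
        for _day in range(horizon):
            r = math.sqrt(sigma2) * rng.gauss(0.0, 1.0)
            total_return += r
            # GARCH(1,1) variance recursion.
            sigma2 = omega + alpha * r * r + beta * sigma2
        losses.append(-total_return)
    losses.sort()
    # Empirical quantile of the loss distribution at the requested level.
    index = min(n_paths - 1, math.ceil(level * n_paths) - 1)
    return losses[index]


params = dict(omega=2e-6, alpha=0.10, beta=0.85)  # illustrative values only
var_1d = simulate_garch_var(horizon=1, **params)
var_10d = simulate_garch_var(horizon=10, **params)
print(f"99% VaR, 1 day: {var_1d:.4f}; 10 days: {var_10d:.4f}")
```

The accuracy of the quantile depends on the number of simulated paths, which reflects the computing-time drawback mentioned in the abstract.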

Extreme events are responsible for huge material damage and are costly in terms of their human and economic impacts. They strike all facets of modern society, such as physical infrastructure and insurance companies through environmental hazards, banking and finance through stock market crises, and the internet and communication systems through network and server overloads. It is thus of increasing importance to accurately assess the risk of extreme events in order to mitigate them. Extreme value theory is a statistical approach to extrapolation of probabilities beyond the range of the data, which provides a robust framework to learn from an often small number of recorded extreme events.
In this thesis, we consider a conditional approach to modelling extreme values that is more flexible than standard models for simultaneously extreme events. We explore the subasymptotic properties of this conditional approach and prove that in specific situations its finite-sample behaviour can differ significantly from its limit characterisation.
For modelling extremes in time series with short-range dependence, the standard peaks-over-threshold method relies on a pre-processing step that retains only a subset of the observations exceeding a high threshold, which can result in badly biased estimates. This method focuses on the marginal distribution of the extremes and does not estimate temporal extremal dependence.
We propose a new methodology to model time series extremes using Bayesian semiparametrics and allowing estimation of functionals of clusters of extremes.
We apply our methodology to model river flow data in England and improve flood risk assessment by explicitly describing extremal dependence in time, using information from all exceedances of a high threshold.
We develop two new bivariate models which are based on the conditional tail approach, and use all observations having at least one extreme component in our inference procedure, thus extracting more information from the data than existing approaches. We compare the efficiency of these models in a simulation study and discuss generalisations to higher-dimensional setups.
Existing models for extremes of Markov chains generally rely on a strong assumption of asymptotic dependence at all lags and separately consider marginal and joint features. We introduce a more flexible model and show how Bayesian semiparametrics can provide a suitable framework allowing simultaneous inference for the margins and the extremal dependence structure, yielding efficient risk estimates and a reliable assessment of uncertainty.

The thesis is a contribution to extreme-value statistics, more precisely to the estimation of clustering characteristics of extreme values. One summary measure of the tendency to form groups is the inverse average cluster size. In the extreme-value context, this parameter is called the extremal index, and apart from its relation with the size of groups, it appears as an important parameter measuring the effects of serial dependence on extreme levels in time series. Although several methods exist for its estimation in univariate sequences, these methods are only applicable to strictly stationary series satisfying a long-range asymptotic independence condition on extreme levels, cannot take covariates into consideration, and yield only crude estimates for the corresponding multivariate quantity. These are strong restrictions and great drawbacks. In climatic time series, both stationarity and asymptotic independence can be broken, due to climate change and possible long memory of the data, and not including information from simultaneously measured linked variables may lead to inefficient estimation. The thesis addresses these issues. First, we extend the theorem of Ferro and Segers (2003) concerning the distribution of inter-exceedance times: we introduce truncated inter-exceedance times, called K-gaps, and show that they follow the same exponential-point mass mixture distribution as the inter-exceedance times. The maximization of the likelihood built on this distribution yields a simple closed-form estimator for the extremal index. The method can admit covariates and can be applied with smoothing techniques, which allows its use in a nonstationary setting. Simulated and real data examples demonstrate the smooth estimation of the extremal index. The likelihood, based on an assumption of independence of the K-gaps, is misspecified whenever K is too small.
This motivates another contribution of the thesis, the introduction into extreme-value statistics of misspecification tests based on the information matrix. For our likelihood, they are able to detect misspecification from any source, not only those due to a bad choice of the truncation parameter. They provide help also in threshold selection, and show whether the fundamental assumptions of stationarity or asymptotic independence are broken. Moreover, these diagnostic tests are of general use, and could be adapted to many kinds of extreme-value models, which are always approximate. Simulated examples demonstrate the performance of the misspecification tests in the context of extremal index estimation. Two data examples with complex behaviour, one univariate and the other bivariate, offer insight into their power in discovering situations where the fundamental assumptions of the likelihood model are not valid. In the multivariate case, the parameter corresponding to the univariate extremal index is the multivariate extremal index function. As in the univariate case, its appearance is linked to serial dependence in the observed processes. Univariate estimation methods can be applied, but are likely to give crude, unreasonably varying, estimates, and the constraints on the extremal index function implied by the characteristics of the stable tail dependence function are not automatically satisfied. The third contribution of the thesis is the development of methodology based on the M4 approximation of Smith and Weissman (1996), which can be used to estimate the multivariate extremal index, as well as other cluster characteristics. For this purpose, we give a preliminary cluster selection procedure, and approximate the noise on finite levels with a flexible semiparametric model, the Dirichlet mixtures used widely in Bayesian analysis. The model is fitted by the EM algorithm. 
Advantages and drawbacks of the method are discussed using the same univariate and bivariate examples as the likelihood methods.
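The closed-form K-gaps estimator mentioned in the abstract above can be sketched as follows. The abstract does not state the likelihood, so this sketch assumes the form in which it is usually written: with K-gaps S_i = max(T_i - K, 0), exceedance probability q, N_0 zero gaps and N_C positive gaps, the log-likelihood is N_0 log(1 - θ) + 2 N_C log θ - θ q Σ S_i, and setting its derivative to zero gives a quadratic in θ whose smaller root is the estimate. The function name and inputs are hypothetical.

```python
import math


def kgaps_extremal_index(exceedance_times, exceed_prob, k=1):
    """Closed-form maximiser of the assumed K-gaps mixture likelihood.

    exceedance_times: time indices at which the series exceeds the threshold.
    exceed_prob: estimated probability q of exceeding the threshold.
    k: truncation parameter K.
    """
    times = sorted(exceedance_times)
    # K-gaps: inter-exceedance times truncated by K.
    gaps = [max(t1 - t0 - k, 0) for t0, t1 in zip(times, times[1:])]
    if not gaps:
        raise ValueError("need at least two exceedances")
    n_zero = sum(1 for s in gaps if s == 0)
    n_pos = len(gaps) - n_zero
    if n_pos == 0:
        return 0.0  # likelihood reduces to n_zero * log(1 - theta)
    total = exceed_prob * sum(gaps)
    # Score equation: total*theta^2 - b*theta + 2*n_pos = 0.
    b = n_zero + 2 * n_pos + total
    disc = max(b * b - 8.0 * total * n_pos, 0.0)
    theta = (b - math.sqrt(disc)) / (2.0 * total)
    return min(theta, 1.0)


# Three clusters of three exceedances each, in a series of length 100.
clustered = [0, 1, 2, 50, 51, 52, 100, 101, 102]
print(kgaps_extremal_index(clustered, exceed_prob=0.09, k=1))
```

Clustered exceedances produce many zero K-gaps and pull the estimate well below one, while evenly spread exceedances give an estimate at or near one.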