
# Statistical analysis of clusters of extreme events

Abstract

The thesis is a contribution to extreme-value statistics, more precisely to the estimation of clustering characteristics of extreme values. One summary measure of the tendency to form groups is the inverse average cluster size. In the extreme-value context, this parameter is called the extremal index, and apart from its relation to cluster size, it is an important parameter measuring the effect of serial dependence on extreme levels in time series. Although several methods exist for its estimation in univariate sequences, these methods are applicable only to strictly stationary series satisfying a long-range asymptotic independence condition on extreme levels, cannot take covariates into account, and yield only crude estimates of the corresponding multivariate quantity. These are strong restrictions and serious drawbacks. In climatic time series, both stationarity and asymptotic independence can fail, due to climate change and possible long memory of the data, and ignoring information from simultaneously measured linked variables may lead to inefficient estimation. The thesis addresses these issues. First, we extend the theorem of Ferro and Segers (2003) concerning the distribution of inter-exceedance times: we introduce truncated inter-exceedance times, called K-gaps, and show that they follow the same exponential-point mass mixture distribution as the inter-exceedance times. Maximizing the likelihood built on this distribution yields a simple closed-form estimator for the extremal index. The method can admit covariates and can be combined with smoothing techniques, which allows its use in a nonstationary setting. Simulated and real data examples demonstrate smooth estimation of the extremal index. The likelihood, based on an assumption of independence of the K-gaps, is misspecified whenever K is too small.
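As an illustration, the closed-form K-gaps estimator can be sketched in a few lines. This is a minimal sketch, not the thesis's implementation: the function name, the normalisation of the gaps by the empirical exceedance probability, and the choice of the smaller root of the score equation are our assumptions, following the exponential-point mass mixture likelihood described above.

```python
import math

def kgaps_theta(exceedance_times, n, K=1):
    """Closed-form K-gaps estimator of the extremal index (sketch).

    exceedance_times: sorted positions (1..n) of threshold exceedances
    n: length of the series; K: truncation parameter of the gaps.
    """
    N = len(exceedance_times)
    if N < 2:
        raise ValueError("need at least two exceedances")
    phat = N / n  # empirical probability of exceeding the threshold
    # K-gaps: truncated, normalised inter-exceedance times
    S = [phat * max(t2 - t1 - K, 0)
         for t1, t2 in zip(exceedance_times, exceedance_times[1:])]
    N0 = sum(1 for s in S if s == 0)  # zero gaps (within-cluster)
    N1 = len(S) - N0                  # positive gaps (between clusters)
    ssum = sum(S)
    if N1 == 0:
        return 0.0                    # every gap truncated to zero
    # The mixture log-likelihood N0*log(1-theta) + 2*N1*log(theta) - theta*ssum
    # has a stationary point solving a quadratic; take the root in (0, 1].
    a, b, c = ssum, ssum + N0 + 2 * N1, 2 * N1
    theta = (b - math.sqrt(b * b - 4 * a * c)) / (2 * a)
    return min(theta, 1.0)
```

For regularly spaced exceedances the estimator returns values near one (no clustering), while tightly grouped exceedances drive it towards zero.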
This motivates another contribution of the thesis: the introduction into extreme-value statistics of misspecification tests based on the information matrix. For our likelihood, they can detect misspecification from any source, not only a bad choice of the truncation parameter. They also help in threshold selection, and show whether the fundamental assumptions of stationarity or asymptotic independence are broken. Moreover, these diagnostic tests are of general use, and could be adapted to many kinds of extreme-value models, which are always approximate. Simulated examples demonstrate the performance of the misspecification tests in the context of extremal index estimation. Two data examples with complex behaviour, one univariate and the other bivariate, offer insight into their power to discover situations where the fundamental assumptions of the likelihood model are not valid. In the multivariate case, the parameter corresponding to the univariate extremal index is the multivariate extremal index function. As in the univariate case, its appearance is linked to serial dependence in the observed processes. Univariate estimation methods can be applied, but are likely to give crude, unreasonably varying estimates, and the constraints on the extremal index function implied by the characteristics of the stable tail dependence function are not automatically satisfied. The third contribution of the thesis is the development of methodology based on the M4 approximation of Smith and Weissman (1996), which can be used to estimate the multivariate extremal index, as well as other cluster characteristics. For this purpose, we give a preliminary cluster selection procedure, and approximate the noise at finite levels with a flexible semiparametric model, the Dirichlet mixtures widely used in Bayesian analysis. The model is fitted by the EM algorithm.
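The information-matrix idea can be illustrated on the K-gaps likelihood itself. The sketch below computes a naive standardised version of the information-matrix equality (the per-gap squared score plus the per-gap Hessian should average to zero under correct specification). It ignores the effect of estimating theta on the variance, which a careful test must account for, so it should be read as a toy diagnostic only; the function name is ours.

```python
import math

def im_statistic(S, theta):
    """Naive information-matrix diagnostic for the K-gaps likelihood (sketch).

    S: list of K-gaps; theta: fitted extremal index in (0, 1).
    Under correct specification E[score^2 + hessian] = 0 for each gap,
    so the standardised mean should be roughly standard normal.
    """
    d = []
    for s in S:
        if s == 0:   # point-mass part: log-likelihood contribution log(1 - theta)
            score = -1.0 / (1.0 - theta)
            hess = -1.0 / (1.0 - theta) ** 2
        else:        # exponential part: 2*log(theta) - theta*s
            score = 2.0 / theta - s
            hess = -2.0 / theta ** 2
        d.append(score ** 2 + hess)
    mean_d = sum(d) / len(d)
    var_d = sum((x - mean_d) ** 2 for x in d) / len(d)
    return mean_d * math.sqrt(len(d) / var_d)
```

Large absolute values of the statistic would flag a mismatch between the two information estimates, whatever its source: a bad K, a bad threshold, or a failure of stationarity or asymptotic independence.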
Advantages and drawbacks of the method are discussed using the same univariate and bivariate examples as the likelihood methods.



Related publications (115)


Powerful mathematical tools have been developed for trading in stocks and bonds, but other markets that are equally important for the globalized world have to some extent been neglected. We decided to study the shipping market as a new area of development in mathematical finance. The market in shipping derivatives (FFA and FOSVA) has only developed since 2000 and now exhibits impressive growth. Financial actors have entered the field, but it is still largely undiscovered by institutional investors. The first part of the work was to identify the characteristics of the shipping market, i.e. its segmentation and volatility. Because the shipping business is old-fashioned, even the leading actors on the world stage (ship owners and banks) use macro-economic models to forecast the rates. Although the macro-economic models are logical and make sense, they fail to predict. For example, port congestion has been much cited as a factor during the last few years, but it is clearly very difficult to control and is simply an indicator of traffic. From our own experience it appears that most ship owners are in fact market driven and rather bad at anticipating trends. Because of their ability to capture large moves, we chose Lévy processes for the underlying price process. Compared with the macro-economic approach, the main advantage is the uniform and systematic structure this imposes on the models. In each case we obtain a favorable result for our approach and a gain in forecasting accuracy of around 10%, depending on the maturity. The global distribution is more effectively modelled and the tails of the distribution are particularly well represented. This model can be used to forecast the market but also to evaluate risk, for example by computing the VaR. An important limitation is the non-robustness of the estimation of the Lévy processes. The use of robust estimators reinforces the information obtained from the observed data.
Because maximum likelihood estimation is not easy to compute for complex processes, we consider only some very general robust score functions to manage the technical problems. Two new classes of robust estimators are suggested. These are based on the work of F. Hampel ([29]) and P. Huber ([30]) using influence functions. The main idea is to bound the maximum likelihood score function. Doing so creates a bias in the parameter estimates, which can be corrected by a modification of the type proposed by F. Hampel. The procedure for finding a robust estimating equation is thus decomposed into two consecutive steps: subtract the bias correction, then bound the score function. In the case of complex Lévy processes, the bias correction is difficult to compute and generally unknown. We have developed a pragmatic solution by inverting Hampel's procedure: bound the score function, then correct for the bias. The price is a loss of the theoretical properties of our estimators, although the procedure converges to the maximum likelihood estimate. A second solution for achieving robust estimation is presented. It considers the limiting case when the upper and lower bounds tend to zero and leads to B-robust estimators. Because of the complexity of the Lévy distributions, this leads to identification problems.
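The "bound the score function" idea is easiest to see on a toy model rather than a Lévy process. The sketch below applies Huber-type clipping to the Gaussian location score; for a symmetric model the bias correction vanishes, so only the bounding step remains. All names and constants are illustrative, not taken from the thesis.

```python
def huber_location(x, c=1.345, tol=1e-8, max_iter=100):
    """Location M-estimator with a bounded (Huber-clipped) score (sketch).

    The Gaussian ML score psi(r) = r is replaced by clip(r, -c, c),
    so a single outlier can shift the estimating equation by at most c.
    """
    mu = sorted(x)[len(x) // 2]  # start from (roughly) the median
    for _ in range(max_iter):
        psi = [max(-c, min(c, xi - mu)) for xi in x]  # bounded score
        step = sum(psi) / len(x)
        mu += step
        if abs(step) < tol:
            break
    return mu
```

With x = [0, 1, 2, 3, 100] the bounded score leaves the estimate near 2, while the unbounded ML estimate (the sample mean, 21.2) is dragged away by the single outlier.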

This thesis is a contribution to financial statistics. One of the principal concerns of investors is the evaluation of portfolio risk. The notion of risk is vague, but in finance it is always linked to possible losses. In this thesis, we present some measures allowing the valuation of risk with the help of Bayesian methods. An exploratory analysis of data is presented to describe the sampling properties of financial time series. This analysis allows us to understand the origins of the daily returns studied in this thesis. Moreover, a discussion of different models is presented. These models make strong assumptions about investor behaviour, which are not always satisfied. This exploratory analysis shows some differences between the behaviour anticipated under equilibrium models and that of real data. The Bayesian approach has been chosen because it allows one to incorporate all the variability, in particular that associated with model choice. The models studied in this thesis allow one to take heteroskedasticity into account, as well as particular shapes of the tails of returns. ARCH-type models and models based on extreme value theory are studied. One original aspect of this thesis is its use of Bayesian analysis to detect change points in financial time series. We suppose that a market has two phases, and that it switches from one state to the other at random. Another new contribution is a model integrating heteroskedasticity and time dependence of extreme values, by superposition of the model proposed by Bortot and Coles (2003) and a GARCH process. This thesis uses simulation intensively for the estimation of risk measures. The drawback of simulation is the amount of time needed to obtain accurate estimates. However, simulation allows one to produce results when direct calculation is not feasible. For example, simulation allows one to compute risk estimates for time horizons greater than one day.
The methods presented in this thesis are illustrated on simulated data, and on real data from European and American markets. This thesis involved the construction of a library containing C and S code to perform risk analysis using GARCH and extreme value theory models. The results show that model uncertainty can be incorporated, and that risk measures for time horizons greater than one day can be obtained by simulation. The methods presented in this thesis have a natural representation involving conditioning; thus, they permit the computation of both conditional and unconditional risk estimates. Three methods are described: the GARCH method, the two-state GARCH method, and the HBC method. Unconditional risk estimation using the GARCH method is satisfactory on data which seem stationary, but not reliable on non-stationary data, such as data with change points. The two-state GARCH model does a little better, and gives very satisfactory results when the risk is estimated conditionally on time. The HBC method does not give satisfactory results.
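As a hedged illustration of simulation-based risk estimation for horizons beyond one day, the sketch below computes an empirical VaR from simulated GARCH(1,1) paths with Gaussian innovations. Parameter names and values are ours for illustration; the models of the thesis (two-state GARCH, HBC) are more elaborate.

```python
import math
import random

def garch_var(omega, alpha, beta, sigma0, horizon,
              level=0.99, n_paths=20000, seed=42):
    """Monte Carlo VaR for a GARCH(1,1) return process (sketch).

    Simulates n_paths return paths over `horizon` days and returns the
    empirical `level` quantile of the cumulative loss distribution.
    """
    rng = random.Random(seed)
    losses = []
    for _ in range(n_paths):
        sigma2 = sigma0 ** 2
        cum = 0.0
        for _ in range(horizon):
            r = math.sqrt(sigma2) * rng.gauss(0.0, 1.0)     # daily return
            cum += r
            sigma2 = omega + alpha * r * r + beta * sigma2  # volatility update
        losses.append(-cum)  # loss = negative cumulative return
    losses.sort()
    return losses[int(level * n_paths)]
```

The same machinery gives conditional estimates by starting the recursion from the current fitted volatility instead of a fixed sigma0.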

Extreme events are responsible for huge material damage and are costly in terms of their human and economic impacts. They strike all facets of modern society, such as physical infrastructure and insurance companies through environmental hazards, banking and finance through stock market crises, and the internet and communication systems through network and server overloads. It is thus of increasing importance to accurately assess the risk of extreme events in order to mitigate them. Extreme value theory is a statistical approach to extrapolation of probabilities beyond the range of the data, which provides a robust framework to learn from an often small number of recorded extreme events.
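The extrapolation step can be made concrete with the generalised Pareto approximation to threshold exceedances, the standard tool of this theory. The sketch below evaluates P(X > x) for x beyond the threshold, given fitted scale and shape parameters; the function name and signature are ours.

```python
import math

def gpd_exceedance_prob(x, u, sigma, xi, p_u):
    """Tail probability P(X > x), x > u, under a GPD fit (sketch).

    u: threshold; sigma, xi: fitted GPD scale and shape;
    p_u: empirical probability of exceeding u.
    Uses P(X > x) = p_u * (1 + xi*(x - u)/sigma)^(-1/xi).
    """
    if x <= u:
        raise ValueError("x must exceed the threshold u")
    if abs(xi) < 1e-12:  # exponential tail in the limit xi -> 0
        return p_u * math.exp(-(x - u) / sigma)
    z = 1.0 + xi * (x - u) / sigma
    if z <= 0:
        return 0.0       # x beyond the finite upper end-point (xi < 0)
    return p_u * z ** (-1.0 / xi)
```

Because the formula is valid for any x above the threshold, it assigns probabilities to levels never observed in the data, which is exactly the extrapolation described above.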
In this thesis, we consider a conditional approach to modelling extreme values that is more flexible than standard models for simultaneously extreme events. We explore the subasymptotic properties of this conditional approach and prove that in specific situations its finite-sample behaviour can differ significantly from its limit characterisation.
For modelling extremes in time series with short-range dependence, the standard peaks-over-threshold method relies on a pre-processing step that retains only a subset of observations exceeding a high threshold and can result in badly-biased estimates. This method focuses on the marginal distribution of the extremes and does not estimate temporal extremal dependence.
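The pre-processing step referred to here is typically runs declustering: exceedances separated by fewer than r consecutive non-exceedances are grouped into one cluster, and only each cluster maximum is retained. A minimal sketch (names and conventions are ours):

```python
def runs_decluster(series, u, r):
    """Runs declustering for peaks-over-threshold (sketch).

    Exceedances of u separated by fewer than r non-exceedances belong to
    the same cluster; the function returns one maximum per cluster.
    """
    clusters, current, gap = [], [], r
    for x in series:
        if x > u:
            if gap >= r and current:  # run of >= r low values: new cluster
                clusters.append(max(current))
                current = []
            current.append(x)
            gap = 0
        else:
            gap += 1
    if current:
        clusters.append(max(current))
    return clusters
```

Fitting the marginal model to these cluster maxima discards the within-cluster exceedances, which is the information loss, and the source of bias, discussed above.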
We propose a new methodology to model time series extremes using Bayesian semiparametrics and allowing estimation of functionals of clusters of extremes.
We apply our methodology to model river flow data in England and improve flood risk assessment by explicitly describing extremal dependence in time, using information from all exceedances of a high threshold.
We develop two new bivariate models which are based on the conditional tail approach, and use all observations having at least one extreme component in our inference procedure, thus extracting more information from the data than existing approaches. We compare the efficiency of these models in a simulation study and discuss generalisations to higher-dimensional setups.
Existing models for extremes of Markov chains generally rely on a strong assumption of asymptotic dependence at all lags and separately consider marginal and joint features. We introduce a more flexible model and show how Bayesian semiparametrics can provide a suitable framework allowing simultaneous inference for the margins and the extremal dependence structure, yielding efficient risk estimates and a reliable assessment of uncertainty.