
# Robust Multivariate and Nonlinear Time Series Models

EPFL thesis

Abstract

Time series modeling and analysis is central to most financial and econometric data modeling. With increased globalization in trade, commerce and finance, national variables like gross domestic product (GDP) and the unemployment rate, market variables like indices and stock prices, and global variables like commodity prices are more tightly coupled than ever before. This translates into the use of multivariate, or vector, time series models and algorithms for analyzing and understanding the relationships that these variables share with each other. Autocorrelation is one of the fundamental aspects of time series modeling. However, traditional linear models, motivated by the strong autocorrelation observed in many financial and econometric time series, are at times unable to capture the nonlinear relationships that characterize many such series. This necessitates the study of nonlinear models for analyzing such data. The class of bilinear models is one of the simplest classes of nonlinear models. These models can capture the temporary erratic fluctuations common in many financial returns series and are thus of tremendous interest in financial time series analysis. Another aspect of time series analysis is homoscedasticity versus heteroscedasticity. Many time series, even after differencing, exhibit heteroscedasticity, so it becomes important to incorporate this feature into the associated models. The class of autoregressive conditional heteroscedastic (ARCH) models and its variants forms the primary backbone of conditionally heteroscedastic time series modeling. Robustness is a highly underrated feature of most time series applications and models presently in use in industry. With an ever-increasing amount of information available for modeling, it is not uncommon for the data to contain aberrations such as level shifts and occasional large fluctuations.
Conventional methods like maximum likelihood and least squares are well known to be highly sensitive to such contaminations. Hence it becomes important to use robust methods, especially in this age of readily available computing power, to take such aberrations into account. While robustness and time series modeling have each been vastly researched in the past, the application of robust methods to the estimation of time series models is still quite open. The central goal of this thesis is the study of robust parameter estimation for some simple vector and nonlinear time series models. More precisely, we briefly study some prominent linear and nonlinear models in the time series literature and apply the robust S-estimator to the parameters of some simple models: the vector autoregressive (VAR) model, the (0, 0, 1, 1) bilinear model and a simple conditionally heteroscedastic bilinear model. In each case, we examine the important aspect of stationarity of the model and analyze the asymptotic behavior of the S-estimator.
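The stationarity question mentioned above has a particularly simple form for the VAR(1) model: the process is (weakly) stationary when all eigenvalues of the coefficient matrix lie inside the unit circle. The following is a minimal sketch of that check on a simulated bivariate series; the coefficient matrix and sample size are invented for illustration, not taken from the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed coefficient matrix for x_t = A x_{t-1} + e_t (illustrative only)
A = np.array([[0.5, 0.1],
              [0.2, 0.4]])

def is_stationary(A):
    """VAR(1) stationarity: spectral radius of the coefficient matrix < 1."""
    return np.max(np.abs(np.linalg.eigvals(A))) < 1.0

def simulate_var1(A, n=500):
    """Simulate a VAR(1) process with standard normal innovations."""
    k = A.shape[0]
    x = np.zeros((n, k))
    for t in range(1, n):
        x[t] = A @ x[t - 1] + rng.standard_normal(k)
    return x

x = simulate_var1(A)
print(is_stationary(A))   # True: eigenvalues of A are 0.6 and 0.3
print(x.shape)            # (500, 2)
```

For a VAR(p) the same criterion is applied to the companion matrix of the stacked system.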



Related concepts (41)

Estimator (statistics)

In statistics, an estimator is a function used to estimate a moment of a probability distribution (such as its expectation or variance).

Time series

Estimation theory

In statistics, estimation theory deals with the estimation of parameters from measured empirical data that have a random component.

Related publications (53)


Powerful mathematical tools have been developed for trading in stocks and bonds, but other markets that are equally important for the globalized world have to some extent been neglected. We decided to study the shipping market as a new area of development in mathematical finance. The market in shipping derivatives (FFA and FOSVA) has only developed since 2000 and now exhibits impressive growth. Financial actors have entered the field, but it is still largely undiscovered by institutional investors. The first part of the work was to identify the characteristics of the shipping market, i.e. its segmentation and volatility. Because the shipping business is old-fashioned, even the leading actors on the world stage (ship owners and banks) use macro-economic models to forecast rates. While the macro-economic models are logical and make sense, they fail to predict. For example, the factor of port congestion has been much cited during the last few years, but it is clearly very difficult to control and is simply an indicator of traffic. From our own experience it appears that most ship owners are in fact market driven and rather bad at anticipating trends. Due to their ability to capture large moves, we chose to consider Lévy processes for the underlying price process. Compared with the macro-economic approach, the main advantage is the uniform and systematic structure this imposes on the models. In each case we obtain a favorable result for our approach and a gain in forecasting accuracy of around 10%, depending on the maturity. The overall distribution is more effectively modelled and the tails of the distribution are particularly well represented. This model can be used to forecast the market but also to evaluate risk, for example by computing the VaR. An important limitation is the non-robustness of the estimation of the Lévy processes. The use of robust estimators reinforces the information obtained from the observed data.
Because maximum likelihood estimation is not easy to compute for complex processes, we consider only some very general robust score functions to manage the technical problems. Two new classes of robust estimators are suggested. These are based on the work of F. Hampel ([29]) and P. Huber ([30]) using influence functions. The main idea is to bound the maximum likelihood score function. Doing this creates a bias in the parameter estimates, which can be corrected using a modification of the type proposed by F. Hampel. The procedure for finding a robust estimating equation is thus decomposed into two consecutive steps: subtract the bias correction, then bound the score function. In the case of complex Lévy processes, the bias correction is difficult to compute and generally unknown. We have developed a pragmatic solution by inverting Hampel's procedure: bound the score function first, then correct for the bias. The price is a loss of the theoretical properties of our estimators, although the procedure converges to the maximum likelihood estimate. A second solution for achieving robust estimation is also presented. It considers the limiting case where the upper and lower bounds tend to zero, leading to B-robust estimators. Because of the complexity of the Lévy distributions, this leads to identification problems.
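The score-bounding idea can be seen in its simplest instance, the Huber M-estimator of location, where the Gaussian ML score psi(x) = x is clipped to [-c, c]. This toy sketch only illustrates the bounding principle; it is not the Lévy-process estimator developed in the abstract above, and the tuning constant and data are invented.

```python
import numpy as np

def huber_m_location(x, c=1.345, tol=1e-8, max_iter=100):
    """Solve sum psi_c(x_i - mu) = 0 by fixed-point iteration,
    where psi_c clips the Gaussian ML score to [-c, c]."""
    mu = np.median(x)  # robust starting point
    for _ in range(max_iter):
        step = np.mean(np.clip(x - mu, -c, c))
        mu += step
        if abs(step) < tol:
            break
    return mu

rng = np.random.default_rng(1)
# Clean N(0, 1) sample contaminated with two gross outliers
x = np.concatenate([rng.normal(0.0, 1.0, 200), [50.0, 60.0]])
print(np.mean(x))             # pulled far from 0 by the outliers
print(huber_m_location(x))    # stays near the bulk of the data
```

Bounding the score caps the influence of any single observation at c, which is exactly why the contaminated sample mean drifts while the M-estimate does not; the bias-correction step discussed above becomes necessary once the underlying score is asymmetric, as for skewed Lévy densities.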

The thesis is a contribution to extreme-value statistics, more precisely to the estimation of clustering characteristics of extreme values. One summary measure of the tendency to form groups is the inverse average cluster size. In extreme-value context, this parameter is called the extremal index, and apart from its relation with the size of groups, it appears as an important parameter measuring the effects of serial dependence on extreme levels in time series. Although several methods exist for its estimation in univariate sequences, these methods are only applicable for strictly stationary series satisfying a long-range asymptotic independence condition on extreme levels, cannot take covariates into consideration, and yield only crude estimates for the corresponding multivariate quantity. These are strong restrictions and great drawbacks. In climatic time series, both stationarity and asymptotic independence can be broken, due to climate change and possible long memory of the data, and not including information from simultaneously measured linked variables may lead to inefficient estimation. The thesis addresses these issues. First, we extend the theorem of Ferro and Segers (2003) concerning the distribution of inter-exceedance times: we introduce truncated inter-exceedance times, called K-gaps, and show that they follow the same exponential-point mass mixture distribution as the inter-exceedance times. The maximization of the likelihood built on this distribution yields a simple closed-form estimator for the extremal index. The method can admit covariates and can be applied with smoothing techniques, which allows its use in a nonstationary setting. Simulated and real data examples demonstrate the smooth estimation of the extremal index. The likelihood, based on an assumption of independence of the K-gaps, is misspecified whenever K is too small. 
This motivates another contribution of the thesis, the introduction into extreme-value statistics of misspecification tests based on the information matrix. For our likelihood, they are able to detect misspecification from any source, not only those due to a bad choice of the truncation parameter. They also help with threshold selection, and show whether the fundamental assumptions of stationarity or asymptotic independence are broken. Moreover, these diagnostic tests are of general use, and could be adapted to many kinds of extreme-value models, which are always approximate. Simulated examples demonstrate the performance of the misspecification tests in the context of extremal index estimation. Two data examples with complex behaviour, one univariate and the other bivariate, offer insight into their power in discovering situations where the fundamental assumptions of the likelihood model are not valid. In the multivariate case, the parameter corresponding to the univariate extremal index is the multivariate extremal index function. As in the univariate case, its appearance is linked to serial dependence in the observed processes. Univariate estimation methods can be applied, but are likely to give crude, unreasonably varying, estimates, and the constraints on the extremal index function implied by the characteristics of the stable tail dependence function are not automatically satisfied. The third contribution of the thesis is the development of methodology based on the M4 approximation of Smith and Weissman (1996), which can be used to estimate the multivariate extremal index, as well as other cluster characteristics. For this purpose, we give a preliminary cluster selection procedure, and approximate the noise on finite levels with a flexible semiparametric model, the Dirichlet mixtures used widely in Bayesian analysis. The model is fitted by the EM algorithm.
Advantages and drawbacks of the method are discussed using the same univariate and bivariate examples as the likelihood methods.
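To make the "inverse average cluster size" interpretation of the extremal index concrete, here is a sketch of the classical runs estimator (clusters divided by exceedances). This is deliberately the simplest pre-existing method, not the K-gaps estimator developed in the thesis; the threshold, run length and AR(1)-style series are all assumptions for illustration.

```python
import numpy as np

def runs_extremal_index(x, u, r=5):
    """Runs estimator: exceedances of u separated by at least r
    non-exceedances start a new cluster; theta = clusters / exceedances."""
    exc = np.flatnonzero(x > u)          # indices of exceedances
    if exc.size == 0:
        return np.nan
    n_clusters = 1 + np.sum(np.diff(exc) >= r)
    return n_clusters / exc.size

rng = np.random.default_rng(2)
n = 20000
x = np.empty(n)
x[0] = rng.standard_normal()
for t in range(1, n):                    # serially dependent series,
    x[t] = 0.7 * x[t - 1] + rng.standard_normal()  # so extremes cluster

u = np.quantile(x, 0.95)                 # high threshold (assumed level)
theta = runs_extremal_index(x, u)
print(0.0 < theta <= 1.0)                # True: theta is a valid cluster measure
```

A runs estimate well below 1 at a finite threshold signals clustering of exceedances; the K-gaps approach replaces this ad hoc declustering with a likelihood built on truncated inter-exceedance times, which admits covariates and smoothing.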

This work deals with factor models for multiple time series. Its core content places it at the interface between statistics and finance. After a brief description of the historical link between the two sciences, it reviews the literature on factor models close to the one introduced in this work, called the dynamical factor analysis model for time series. This model makes the hypothesis that the observed time series are influenced by a common factor, difficult to define and impossible to measure. No a priori structure is put on the factor; at each time point the value of the factor is treated as a new parameter to be estimated. As a consequence, the number of parameters is large and it is not possible to derive the usual asymptotic properties of the estimates by letting the number of periods tend to infinity. Asymptotic results in our context have instead been obtained by increasing the number of time series. The model assumes a linear dependence between the time series and the factor, with coefficients that are not constant over time but rather follow a smooth random walk. This means that the linear structure of the model evolves slowly from one period to the next. All the information not contained in the factor and the coefficients is treated as white noise. Using the normal distribution makes estimation easier and opens a toolbox of statistical methods developed for this kind of data. The model belongs to the family of state-space models, and the Kalman filter is an essential ingredient of our estimator. The effort was concentrated on the elaboration of the structure of the model, whose complexity was constrained by the difficulty of estimation. The final shape of the model does not allow an analytical solution of the optimization problem introduced by the maximum likelihood estimation of the parameters.
Numerical solutions have been found and compared with the true parameters on simulated data. Some other models have been developed as simpler versions of the dynamical factor model for time series. The case where the factor can be observed has been studied, and a new estimation method has been provided and compared with existing methods. A second study considers a latent factor model without noise, for which two methods for the estimation of the factor have been provided. The last chapter contains a detailed description of the main statistical tools used during this work; the links with the previous chapters are highlighted and commented on.
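The Kalman recursion at the heart of the estimator can be illustrated in its simplest setting, a scalar local-level model with a random-walk latent state. This is only a toy sketch of the predict/update cycle; the variances, sample size and simulated data are assumptions, not the dynamical factor model of the abstract above.

```python
import numpy as np

def kalman_filter(y, q=0.1, r=1.0):
    """Filtered state means for the local-level model
       state: f_t = f_{t-1} + w_t,  w_t ~ N(0, q)
       obs:   y_t = f_t + v_t,      v_t ~ N(0, r)."""
    n = len(y)
    m = np.zeros(n)
    m_prev, p = 0.0, 1.0          # assumed diffuse-ish initial state
    for t in range(n):
        # predict: the random-walk state keeps its mean, gains variance q
        m_pred, p_pred = m_prev, p + q
        # update: weight the innovation by the Kalman gain
        k = p_pred / (p_pred + r)
        m[t] = m_pred + k * (y[t] - m_pred)
        p = (1.0 - k) * p_pred
        m_prev = m[t]
    return m

rng = np.random.default_rng(3)
f = np.cumsum(0.3 * rng.standard_normal(300))  # latent random walk
y = f + rng.standard_normal(300)               # noisy observations
m = kalman_filter(y, q=0.09, r=1.0)
print(np.mean((m - f) ** 2) < np.mean((y - f) ** 2))  # True: filtering reduces error
```

In the full model the state also contains the slowly varying loading coefficients, and the filter is run inside the likelihood evaluation; here the design choice to keep everything Gaussian is what makes that likelihood computable by these closed-form recursions.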