
Publication

A dynamic factor model for time series

Abstract

This work deals with factor models for multiple time series. Its core content places it at the interface between statistics and finance. After a brief description of the historical link between the two disciplines, it reviews the literature on factor models that are close to the model introduced in this work, called the dynamical factor analysis model for time series. This model assumes that the observed time series are influenced by a common factor that is difficult to define and impossible to measure. No a priori structure is imposed on the factor: at each time point, the value of the factor is treated as a new parameter to be estimated. As a consequence, the number of parameters is large, and the usual asymptotic properties of the estimators cannot be obtained by letting the number of periods tend to infinity. Asymptotic results in our context have instead been obtained by increasing the number of time series. The model assumes a linear dependence between the time series and the factor, with coefficients that are not constant over time but instead follow a smooth random walk. This means that the linear structure of the model evolves slowly from one period to another. All the information not contained in the factor and the coefficients is treated as white noise. Using the normal distribution makes the estimation easier and opens a toolbox of statistical methods developed for this kind of data. The model belongs to the family of state-space models, and the Kalman filter is an essential ingredient of our estimator.

The effort was concentrated on the elaboration of the structure of the model, whose complexity was constrained by the difficulty of estimation. The final form of the model does not allow an analytical solution of the optimization problem posed by maximum likelihood estimation of the parameters. Numerical solutions have been obtained and compared with the true parameters on simulated data. Other models have been developed as simpler versions of the dynamical factor model for time series. The case where the factor can be observed has been studied, and a new estimation method has been provided and compared with existing methods. A second study considers a latent factor model without noise, for which two methods for the estimation of the factor are provided. The last chapter contains a detailed description of the main statistical tools used during this work; the links with the previous chapters are highlighted and commented upon.
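The abstract describes a state-space formulation in which each observed series loads linearly on a latent common factor, with Gaussian noise and a Kalman filter at the heart of the estimator. The Python sketch below is a minimal illustration of that structure under simplifying assumptions that differ from the thesis model (a single latent factor modelled as a random walk, constant loadings, known noise variances): it simulates such a system and runs a standard Kalman filter to recover the factor. All names, dynamics and parameter values are illustrative, not the author's specification.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Simulate a toy one-factor state-space model -------------------------
# Simplified variant (NOT the thesis specification): the latent factor f_t
# follows a random walk, loadings are constant, and all noise is Gaussian.
T, p = 200, 5                 # time points, number of observed series
sigma_f, sigma_e = 0.3, 0.5   # state innovation and observation noise std devs
lam = rng.normal(1.0, 0.2, size=p)   # factor loadings (assumed constant here)

f = np.cumsum(sigma_f * rng.normal(size=T))               # latent factor path
Y = np.outer(f, lam) + sigma_e * rng.normal(size=(T, p))  # observed series

# --- Kalman filter for the scalar latent state ---------------------------
# State equation:       f_t = f_{t-1} + eta_t,    eta_t ~ N(0, sigma_f^2)
# Observation equation: y_t = lam * f_t + eps_t,  eps_t ~ N(0, sigma_e^2 I)
f_hat, P = 0.0, 1.0           # initial state mean and variance
R = sigma_e**2 * np.eye(p)
filtered = np.empty(T)
for t in range(T):
    # predict
    f_pred, P_pred = f_hat, P + sigma_f**2
    # update with the p-dimensional observation y_t
    S = np.outer(lam, lam) * P_pred + R          # innovation covariance
    K = P_pred * lam @ np.linalg.inv(S)          # Kalman gain (length-p vector)
    f_hat = f_pred + K @ (Y[t] - lam * f_pred)
    P = P_pred - K @ lam * P_pred
    filtered[t] = f_hat

print("correlation between true and filtered factor:",
      np.corrcoef(f, filtered)[0, 1])
```

In the thesis the estimation problem is harder than in this sketch, since the factor values and the time-varying loadings are themselves parameters to be estimated by maximum likelihood rather than known inputs to the filter.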


Related concepts (31)

Time series

Example of a data visualisation showing medium- and long-term warming trends, based on time series of temperatures by country (here grouped by continent, from north…

Statistics

Statistics is the discipline that studies phenomena through the collection of data, their processing and analysis, the interpretation of the results, and their presentation, in order to make these data…

Evolution (biology)

In biology, evolution is the transformation of the living world over time, manifested by phenotypic changes in organisms across generations. These changes generally…

Related publications (80)

Time series modeling and analysis is central to most financial and econometric data modeling. With increased globalization in trade, commerce and finance, national variables such as gross domestic product (GDP) and the unemployment rate, market variables such as indices and stock prices, and global variables such as commodity prices are more tightly coupled than ever before. This translates into the use of multivariate, or vector, time series models and algorithms for analyzing and understanding the relationships that these variables share with each other. Autocorrelation is one of the fundamental aspects of time series modeling. However, the traditional linear models that arise from the strong autocorrelation observed in many financial and econometric time series are at times unable to capture the nonlinear relationships that characterize many time series. This necessitates the study of nonlinear models for analyzing such series. The class of bilinear models is one of the simplest classes of nonlinear models. These models are able to capture the temporary erratic fluctuations that are common in many financial returns series and are therefore of great interest in financial time series analysis. Another aspect of time series analysis is homoscedasticity versus heteroscedasticity. Many time series, even after differencing, exhibit heteroscedasticity, so it becomes important to incorporate this feature in the associated models. The class of autoregressive conditional heteroscedasticity (ARCH) models and its variants forms the primary backbone of conditionally heteroscedastic time series models.

Robustness is a highly underrated feature of most time series applications and models presently in use in industry. With an ever-increasing amount of information available for modeling, it is not uncommon for the data to contain aberrations such as level shifts and occasional large fluctuations. Conventional methods such as maximum likelihood and least squares are well known to be highly sensitive to such contaminations. Hence, it becomes important to use robust methods, especially in this age of readily available computing power, to take such aberrations into account. While robustness and time series modeling have each been researched extensively in the past, the application of robust methods to the estimation of time series models is still quite open. The central goal of this thesis is the study of robust parameter estimation for some simple vector and nonlinear time series models. More precisely, we briefly study some prominent linear and nonlinear models in the time series literature and apply the robust S-estimator to the estimation of parameters of some simple models, such as the vector autoregressive (VAR) model, the (0, 0, 1, 1) bilinear model and a simple conditionally heteroscedastic bilinear model. In each case, we examine the important aspect of stationarity of the model and analyze the asymptotic behavior of the S-estimator.
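To make the model classes mentioned above concrete, the following Python sketch simulates two of the simplest examples: a (0, 0, 1, 1)-type bilinear model, taken here as X_t = β X_{t-1} ε_{t-1} + ε_t, and an ARCH(1) process. The abstract does not give the exact parametrisations used in the thesis, so the recursions and parameter values below are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 1000

# --- BL(0, 0, 1, 1) bilinear model (illustrative parametrisation) --------
# X_t = beta * X_{t-1} * eps_{t-1} + eps_t,  eps_t ~ N(0, 1)
beta = 0.4
eps = rng.normal(size=T)
x_bl = np.zeros(T)
for t in range(1, T):
    x_bl[t] = beta * x_bl[t - 1] * eps[t - 1] + eps[t]

# --- ARCH(1) process ------------------------------------------------------
# X_t = sigma_t * z_t,  sigma_t^2 = omega + alpha * X_{t-1}^2,  z_t ~ N(0, 1)
omega, alpha = 0.2, 0.5
z = rng.normal(size=T)
x_arch = np.zeros(T)
for t in range(1, T):
    sigma2 = omega + alpha * x_arch[t - 1] ** 2
    x_arch[t] = np.sqrt(sigma2) * z[t]

# The ARCH(1) series is serially uncorrelated, yet its squares are clearly
# correlated (volatility clustering); the bilinear series shows occasional
# bursts driven by the X_{t-1} * eps_{t-1} term.
for name, x in [("bilinear", x_bl), ("ARCH(1)", x_arch)]:
    r1 = np.corrcoef(x[:-1], x[1:])[0, 1]
    r1_sq = np.corrcoef(x[:-1] ** 2, x[1:] ** 2)[0, 1]
    print(f"{name}: lag-1 autocorr {r1:+.2f}, lag-1 autocorr of squares {r1_sq:+.2f}")
```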

A functional time series is a temporally ordered sequence of not necessarily independent random curves. While the statistical analysis of such data has traditionally been carried out under the assumption of completely observed functional data, it may well happen that the statistician only has access to a relatively small number of sparse measurements for each random curve. These discrete measurements may moreover be irregularly scattered over each curve's domain, missing altogether for some curves, and contaminated by measurement noise. This sparse sampling protocol escapes the reach of established estimators in functional time series analysis and therefore requires the development of a novel methodology.
The core objective of this thesis is the development of a non-parametric statistical toolbox for the analysis of sparsely observed functional time series data. Assuming smoothness of the latent curves, we construct a local-polynomial-smoother-based estimator of the spectral density operator, yielding a consistent estimator of the complete second-order structure of the data. Moreover, the spectral-domain recovery approach allows prediction of the latent curve at a given time by borrowing strength from the estimated dynamic correlations across the entire time series. Beyond predicting the latent curves from their noisy point samples, the method fills in gaps in the sequence (curves that are nowhere sampled), denoises the data, and serves as a basis for forecasting.
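As a very reduced illustration of the smoothing ingredient mentioned above, the Python sketch below fits a local linear (first-degree local polynomial) smoother to sparse, noisy, irregularly placed measurements of a single latent curve. It illustrates only the pointwise smoothing step, not the spectral density operator estimation or the cross-curve borrowing of strength developed in the thesis; the bandwidth, kernel and test curve are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)

def latent(t):
    """Toy latent curve used to generate the data (illustrative)."""
    return np.sin(2 * np.pi * t) + 0.5 * np.cos(6 * np.pi * t)

# Sparse, noisy, irregular observations of one latent curve on [0, 1]
n_obs = 15                                   # only a handful of points
t_obs = np.sort(rng.uniform(0, 1, n_obs))    # irregular design points
y_obs = latent(t_obs) + 0.3 * rng.normal(size=n_obs)   # measurement noise

def local_linear(t0, t, y, h):
    """Local linear estimate of the curve at t0 (Gaussian kernel, bandwidth h)."""
    w = np.exp(-0.5 * ((t - t0) / h) ** 2)          # kernel weights
    X = np.column_stack([np.ones_like(t), t - t0])  # local linear design
    XtW = X.T * w                                   # weighted least squares
    beta = np.linalg.solve(XtW @ X, XtW @ y)
    return beta[0]                                  # fitted value at t0

grid = np.linspace(0, 1, 101)
h = 0.12                                            # bandwidth (hand-picked)
fit = np.array([local_linear(t0, t_obs, y_obs, h) for t0 in grid])

rmse = np.sqrt(np.mean((fit - latent(grid)) ** 2))
print(f"RMSE of the local linear reconstruction on the grid: {rmse:.3f}")
```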
A classical non-parametric apparatus for encoding the dependence between a pair of, or among multiple, functional time series, whether sparsely or fully observed, is the functional lagged regression model, which consists of a linear filter between the regressor time series and the response. We show how to tailor the smoother-based estimators to the estimation of the cross-spectral density operators and the cross-covariance operators and, by means of spectral truncation and Tikhonov regularisation techniques, how to estimate the lagged regression filter and predict the response process.
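The following sketch conveys, in a finite-dimensional scalar toy setting, the idea of estimating a lagged regression filter in the frequency domain with Tikhonov regularisation: a transfer-function estimate of the form b̂(ω) = f̂_XY(ω) / (f̂_XX(ω) + ρ) is formed from smoothed (cross-)periodograms and inverted back to lag-domain coefficients. The functional, operator-valued case treated in the thesis replaces the scalar division by a regularised operator inverse; the data-generating filter, smoothing span and ρ below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 2048

# Lagged regression data: Y_t = 1.0*X_t + 0.6*X_{t-1} - 0.3*X_{t-2} + noise
x = rng.normal(size=T)
b_true = np.array([1.0, 0.6, -0.3])
y = np.convolve(x, b_true, mode="full")[:T] + 0.5 * rng.normal(size=T)

# Periodogram and cross-periodogram (no tapering, for simplicity)
fx = np.fft.fft(x)
fy = np.fft.fft(y)
I_xx = (np.abs(fx) ** 2) / T
I_xy = (np.conj(fx) * fy) / T

def smooth(p, span=31):
    """Crude spectral smoothing: circular moving average of a periodogram."""
    kernel = np.ones(span) / span
    ext = np.concatenate([p[-span:], p, p[:span]])      # wrap around
    return np.convolve(ext, kernel, mode="same")[span:-span]

f_xx = smooth(I_xx)               # spectral density estimate of X
f_xy = smooth(I_xy)               # cross-spectral density estimate

# Tikhonov-regularised transfer function and lag-domain filter coefficients
rho = 0.05
b_hat_freq = f_xy / (f_xx + rho)
b_hat_lags = np.fft.ifft(b_hat_freq).real

print("estimated filter at lags 0..2:", np.round(b_hat_lags[:3], 2))
print("true filter                  :", b_true)
```

The regulariser ρ shrinks the estimate slightly toward zero, which is the usual bias-variance trade-off behind Tikhonov-type regularisation in ill-posed inverse problems.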
The simulation studies revealed the following findings: (i) if one is free to design a sampling scheme with a fixed number of measurements, it is advantageous, for reducing the spectral density estimation error, to distribute these measurements sparsely over a longer time horizon rather than to concentrate them densely over a shorter horizon; (ii) the developed functional recovery predictor outperforms a static predictor that does not exploit the temporal dependence; (iii) neither of the two considered regularisation techniques dominates the other, in general, for estimation in functional lagged regression models. The new methodologies are illustrated by applications to real data: meteorological data revolving around fair-weather atmospheric electricity measured in Tashkent, Uzbekistan, and at Wank mountain, Germany; and a case study analysing the dependence of the US Treasury yield curve on macroeconomic variables.
As a secondary contribution, we present a novel simulation method for general stationary functional time series defined through their spectral properties. A simulation study shows the universality of this approach and the superiority of spectral-domain simulation over temporal-domain simulation in some situations.
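To illustrate the flavour of spectral-domain simulation in the simplest scalar Gaussian setting, the sketch below superposes sinusoids at the Fourier frequencies with independent Gaussian amplitudes whose variances are proportional to a target spectral density, giving a stationary Gaussian series with approximately the prescribed second-order structure. The target density (that of an AR(1) process) and all constants are illustrative; the thesis treats the far more general functional, curve-valued case.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 2000

def target_sdf(omega, phi=0.7, sigma=1.0):
    """Target spectral density: that of an AR(1) process with parameter phi."""
    return sigma**2 / (2 * np.pi * (1 - 2 * phi * np.cos(omega) + phi**2))

# Spectral-domain simulation: sum of cosines/sines at the Fourier frequencies
# with independent N(0,1) amplitudes scaled by the target spectral density.
J = T // 2
omegas = 2 * np.pi * np.arange(1, J + 1) / T
amp = np.sqrt(4 * np.pi * target_sdf(omegas) / T)     # per-frequency scale
A = rng.normal(size=J)
B = rng.normal(size=J)
t = np.arange(T)
x = (amp * A) @ np.cos(np.outer(omegas, t)) + (amp * B) @ np.sin(np.outer(omegas, t))

# Sanity check: variance ~ 1/(1 - phi^2) and lag-1 autocorrelation ~ phi = 0.7
r1 = np.corrcoef(x[:-1], x[1:])[0, 1]
print(f"sample variance {x.var():.2f} (target {1 / (1 - 0.49):.2f}), lag-1 acf {r1:.2f}")
```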

The thesis is a contribution to extreme-value statistics, more precisely to the estimation of clustering characteristics of extreme values. One summary measure of the tendency to form groups is the inverse average cluster size. In the extreme-value context this parameter is called the extremal index and, apart from its relation to the size of groups, it is an important parameter measuring the effect of serial dependence on extreme levels in time series. Although several methods exist for its estimation in univariate sequences, these methods are only applicable to strictly stationary series satisfying a long-range asymptotic independence condition at extreme levels, cannot take covariates into consideration, and yield only crude estimates of the corresponding multivariate quantity. These are strong restrictions and serious drawbacks: in climatic time series, both stationarity and asymptotic independence can be broken, owing to climate change and possible long memory of the data, and discarding information from simultaneously measured linked variables may lead to inefficient estimation. The thesis addresses these issues.

First, we extend the theorem of Ferro and Segers (2003) concerning the distribution of inter-exceedance times: we introduce truncated inter-exceedance times, called K-gaps, and show that they follow the same exponential-point mass mixture distribution as the inter-exceedance times. Maximizing the likelihood built on this distribution yields a simple closed-form estimator of the extremal index. The method can admit covariates and can be combined with smoothing techniques, which allows its use in a nonstationary setting. Simulated and real data examples demonstrate the smooth estimation of the extremal index.

The likelihood, based on an assumption of independence of the K-gaps, is misspecified whenever K is too small. This motivates another contribution of the thesis, the introduction into extreme-value statistics of misspecification tests based on the information matrix. For our likelihood, these tests are able to detect misspecification from any source, not only that due to a bad choice of the truncation parameter. They also provide help in threshold selection and show whether the fundamental assumptions of stationarity or asymptotic independence are broken. Moreover, these diagnostic tests are of general use and could be adapted to many kinds of extreme-value models, which are always approximate. Simulated examples demonstrate the performance of the misspecification tests in the context of extremal index estimation. Two data examples with complex behaviour, one univariate and the other bivariate, offer insight into their power in discovering situations where the fundamental assumptions of the likelihood model are not valid.

In the multivariate case, the parameter corresponding to the univariate extremal index is the multivariate extremal index function. As in the univariate case, its appearance is linked to serial dependence in the observed processes. Univariate estimation methods can be applied, but are likely to give crude, unreasonably varying estimates, and the constraints on the extremal index function implied by the characteristics of the stable tail dependence function are not automatically satisfied. The third contribution of the thesis is the development of methodology based on the M4 approximation of Smith and Weissman (1996), which can be used to estimate the multivariate extremal index as well as other cluster characteristics. For this purpose, we give a preliminary cluster selection procedure and approximate the noise at finite levels with a flexible semiparametric model, the Dirichlet mixtures widely used in Bayesian analysis. The model is fitted by the EM algorithm. Advantages and drawbacks of the method are discussed using the same univariate and bivariate examples as for the likelihood methods.
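As a small illustration of the K-gaps idea described above, the Python sketch below computes inter-exceedance times over a high threshold, truncates them to K-gaps, and maximises an exponential-point mass mixture likelihood, which here reduces to the root of a quadratic equation. The likelihood form is reconstructed from the abstract's description (a zero gap contributes a point mass 1 − θ, a positive gap s a density θ² exp(−θ s), with s scaled by the exceedance rate), so this is an assumed sketch rather than the thesis's exact estimator; the AR(1)-driven test data and threshold are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)

# Test data: a Gaussian AR(1) series, whose exceedances of a finite high
# threshold tend to occur in clusters (illustrative choice only).
n, phi = 20000, 0.8
eps = rng.normal(size=n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t]

def kgaps_theta(x, u, K=1):
    """Closed-form K-gaps estimate of the extremal index at threshold u.

    Assumed likelihood (reconstructed from the abstract): a zero K-gap
    contributes 1 - theta, a positive K-gap s contributes theta^2 * exp(-theta*s).
    """
    exc = np.flatnonzero(x > u)            # indices of exceedances
    rate = len(exc) / len(x)               # estimate of P(X > u)
    gaps = np.diff(exc)                    # inter-exceedance times
    s = rate * np.maximum(gaps - K, 0)     # scaled K-gaps
    n0 = np.sum(s == 0)                    # zero K-gaps (within-cluster)
    n1 = np.sum(s > 0)                     # positive K-gaps
    ssum = s.sum()
    # d/dtheta log-likelihood = 0  <=>  ssum*theta^2 - b*theta + 2*n1 = 0
    b = ssum + n0 + 2 * n1
    theta = (b - np.sqrt(b**2 - 8 * n1 * ssum)) / (2 * ssum)
    return min(theta, 1.0)

u = np.quantile(x, 0.98)
print(f"K-gaps extremal index estimate (K=1): {kgaps_theta(x, u):.2f}")
```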