# Robust regression

Summary

In robust statistics, robust regression seeks to overcome some limitations of traditional regression analysis. A regression analysis models the relationship between one or more independent variables and a dependent variable. Standard types of regression, such as ordinary least squares, have favourable properties if their underlying assumptions are true, but can give misleading results otherwise (i.e. are not robust to assumption violations). Robust regression methods are designed to limit the effect that violations of assumptions by the underlying data-generating process have on regression estimates.
For example, least squares estimates for regression models are highly sensitive to outliers: an outlier with twice the error magnitude of a typical observation contributes four (two squared) times as much to the squared error loss, and therefore has more leverage over the regression estimates. The Huber loss function is a robust alternative to standard square error loss that reduces outl
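The contrast between squared error loss and the Huber loss can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the page: the function names, the threshold `delta = 1.0`, the toy data, and the choice of iteratively reweighted least squares (IRLS) to fit the Huber regression. Practical implementations usually also rescale residuals by a robust scale estimate, which this sketch skips.

```python
import numpy as np


def squared_loss(r):
    """Squared error loss r**2 / 2."""
    r = np.asarray(r, dtype=float)
    return 0.5 * r**2


def huber_loss(r, delta=1.0):
    """Huber loss: quadratic for |r| <= delta, linear beyond it."""
    r = np.asarray(r, dtype=float)
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r**2, delta * (a - 0.5 * delta))


def fit_ols(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def fit_huber(x, y, delta=1.0, n_iter=100, tol=1e-10):
    """Huber regression via IRLS with a fixed threshold (illustrative only)."""
    X = np.column_stack([np.ones_like(x), x])
    beta = fit_ols(x, y)  # start from the least squares solution
    for _ in range(n_iter):
        r = y - X @ beta
        # Huber weights: 1 for small residuals, delta / |r| for large ones.
        w = np.minimum(1.0, delta / np.maximum(np.abs(r), 1e-12))
        sw = np.sqrt(w)
        new_beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta


# Toy data: points exactly on y = 1 + 2x, with one gross outlier.
x = np.arange(10, dtype=float)
y = 1.0 + 2.0 * x
y[-1] += 50.0

ols_beta = fit_ols(x, y)
huber_beta = fit_huber(x, y)

print("loss ratio, residual 2 vs 1 (squared):", squared_loss(2.0) / squared_loss(1.0))
print("loss ratio, residual 10 vs 1 (squared):", squared_loss(10.0) / squared_loss(1.0))
print("loss ratio, residual 10 vs 1 (Huber):", huber_loss(10.0) / huber_loss(1.0))
print("OLS slope:", ols_beta[1])
print("Huber slope:", huber_beta[1])
```

In this toy setting the single outlier pulls the least squares slope far from the true value of 2, while the Huber fit stays close to it, because the outlier's influence grows only linearly once its residual exceeds `delta`.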

Related concepts (10)

Robust statistics

Robust statistics are statistics with good performance for data drawn from a wide range of probability distributions, especially for distributions that are not normal. Robust statistical methods have

Linear regression

In statistics, linear regression is a linear approach for modelling the relationship between a scalar response and one or more explanatory variables (also known as dependent and independent variable

Least absolute deviations

Least absolute deviations (LAD), also known as least absolute errors (LAE), least absolute residuals (LAR), or least absolute values (LAV), is a statistical optimality criterion and a statistical opti

Related publications (72)

Related units (11)

Time series modeling and analysis is central to most financial and econometric data modeling. With increased globalization in trade, commerce and finance, national variables like gross domestic product (GDP) and the unemployment rate, market variables like indices and stock prices, and global variables like commodity prices are more tightly coupled than ever before. This translates to the use of multivariate or vector time series models and algorithms in analyzing and understanding the relationships that these variables share with each other. Autocorrelation is one of the fundamental aspects of time series modeling. However, traditional linear models, which arise from the strong autocorrelation observed in many financial and econometric time series, are at times unable to capture the nonlinear relationships that characterize many such series. This necessitates the study of nonlinear models in analyzing such time series. The class of bilinear models is one of the simplest classes of nonlinear models. These models are able to capture temporary erratic fluctuations that are common in many financial returns series and are thus of tremendous interest in financial time series analysis. Another aspect of time series analysis is homoscedasticity versus heteroscedasticity. Many time series, even after differencing, exhibit heteroscedasticity, so it becomes important to incorporate this feature in the associated models. The class of autoregressive conditional heteroscedastic (ARCH) models and its variants form the primary backbone of conditional heteroscedastic time series models. Robustness is a highly underrated feature of most time series applications and models presently in use in industry. With an ever increasing amount of information available for modeling, it is not uncommon for the data to contain aberrations such as level shifts and occasional large fluctuations. Conventional methods like maximum likelihood and least squares are well known to be highly sensitive to such contamination. Hence, it becomes important to use robust methods that take such aberrations into account, especially now that large amounts of computing power are readily available. While robustness and time series modeling have each been extensively researched in the past, the application of robust methods to the estimation of time series models is still quite open. The central goal of this thesis is the study of robust parameter estimation for some simple vector and nonlinear time series models. More precisely, we briefly study some prominent linear and nonlinear models in the time series literature and apply the robust S-estimator to estimate the parameters of some simple models: the vector autoregressive (VAR) model, the (0, 0, 1, 1) bilinear model, and a simple conditional heteroscedastic bilinear model. In each case, we look at the important aspect of stationarity of the model and analyze the asymptotic behavior of the S-estimator.
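To make the bilinear model concrete, here is a minimal simulation sketch. It assumes the common BL(p, q, m, k) convention, under which the (0, 0, 1, 1) model reads X_t = b·X_{t-1}·e_{t-1} + e_t with independent Gaussian innovations e_t. The abstract does not spell out this form, so the equation, the function name, and all parameter values below are assumptions. The sketch simulates the model only and does not implement the S-estimator studied in the thesis.

```python
import numpy as np


def simulate_bilinear(n, b, sigma=1.0, burn_in=200, seed=0):
    """Simulate X_t = b * X_{t-1} * e_{t-1} + e_t (assumed BL(0,0,1,1) form).

    A commonly cited sufficient condition for second-order stationarity of
    this model is b**2 * sigma**2 < 1; the defaults used below satisfy it.
    """
    rng = np.random.default_rng(seed)
    total = n + burn_in
    eps = rng.normal(0.0, sigma, size=total)
    x = np.zeros(total)
    for t in range(1, total):
        x[t] = b * x[t - 1] * eps[t - 1] + eps[t]
    # Discard the burn-in so the start value x[0] = 0 has little effect.
    return x[burn_in:]


series = simulate_bilinear(n=1000, b=0.4)
print("first values:", series[:5])
print("sample variance:", series.var())
```

Plotting `series` for larger values of `b` (still with `b**2 * sigma**2 < 1`) tends to show the short bursts of erratic fluctuation that the abstract attributes to bilinear models.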

Related courses (22)

General graduate course on regression methods

Regression modelling is a fundamental tool of statistics, because it describes how the law of a random variable of interest may depend on other variables. This course aims to familiarize students with linear models and some of their extensions, which lie at the basis of more general regression model

The course provides an introduction to econometrics. The objective is to learn how to make valid (i.e., causal) inference from economic data. It explains the main estimators and presents methods to deal with endogeneity issues.

Related lectures (72)

In this thesis, we treat robust estimation for the parameters of the Ornstein–Uhlenbeck process, which are the mean, the variance, and the friction. We start by considering classical maximum likelihood estimation. For the simulation study, where we also investigate the choice of the time lag, we use the method of moments (MoM) estimator as the initial estimator for the friction parameter of the maximum likelihood estimator (MLE). However, in several respects the MLE is not robust. For robustification, we first derive elementary M-estimates by extending the method of M-estimation from Huber (1981). We use an intuitively robustified MoM estimate as the initial estimate and compare the M-estimate with the MLE by means of simulation. This approach is, however, only ad hoc, since Huber's minimum Fisher information and minimax asymptotic variance theory remains incomplete for simultaneous location and scale and does not cover more general models (such as the Ornstein–Uhlenbeck process). A more general robustness concept due to Kohl et al. (2010), Rieder (1994), and Staab (1984) is based on local asymptotic normality (LAN), asymptotically linear (AL) estimates, and shrinking neighborhoods. We then apply this concept to the Ornstein–Uhlenbeck process. As a measure of robustness, we consider the maximum asymptotic mean square error (maxasyMSE), which is determined by the influence curve (IC) of AL estimates. The IC represents the standardized influence of an individual observation on the estimator given the past. For two kinds of neighborhoods (average and average square neighborhoods) we obtain optimally robust ICs. In the case of average neighborhoods, their graph exhibits surprising, redescending behavior. For average square neighborhoods, the graph lies between those of the elementary M-estimates and the MLE. Finally, we discuss the estimator construction, that is, the problem of constructing an estimator from the family of optimal ICs. In our context we carry out the One-Step construction dating back to LeCam and use both an intuitively robustified MoM estimate and the elementary M-estimate as the initial estimate. This results in optimally robust AL estimates (for average and average square neighborhoods). By means of simulation we then compare the different estimators: MLE, elementary M-estimates, and optimally robust AL estimates. In addition, we give an application to electricity prices.
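As a point of reference for the classical starting estimator mentioned above, the following sketch simulates an Ornstein–Uhlenbeck process on an equally spaced grid and computes a plain method-of-moments estimate of the mean, variance, and friction. It is the non-robust MoM, not the robustified variant or the M- and AL-estimates of the thesis. The parametrisation (stationary variance, friction as mean-reversion rate), the exact AR(1) discretisation, the function names, and the parameter values are all assumptions made for illustration.

```python
import numpy as np


def simulate_ou(n, mean, variance, friction, dt, seed=0):
    """Exact discretisation of a stationary OU process at time step dt.

    `variance` is the stationary variance and `friction` the mean-reversion
    rate, so the lag-1 autocorrelation is exp(-friction * dt).
    """
    rng = np.random.default_rng(seed)
    rho = np.exp(-friction * dt)
    innov_sd = np.sqrt(variance * (1.0 - rho**2))
    z = rng.standard_normal(n)
    x = np.empty(n)
    x[0] = mean + np.sqrt(variance) * z[0]  # draw from the stationary law
    for t in range(1, n):
        x[t] = mean + rho * (x[t - 1] - mean) + innov_sd * z[t]
    return x


def mom_estimate(x, dt):
    """Classical (non-robust) method-of-moments estimate.

    Requires a positive sample lag-1 autocorrelation; otherwise the
    logarithm below is undefined.
    """
    mean_hat = x.mean()
    centered = x - mean_hat
    var_hat = np.mean(centered**2)
    rho_hat = np.mean(centered[1:] * centered[:-1]) / var_hat
    friction_hat = -np.log(rho_hat) / dt
    return mean_hat, var_hat, friction_hat


path = simulate_ou(n=20000, mean=2.0, variance=1.5, friction=1.0, dt=0.1)
mean_hat, var_hat, friction_hat = mom_estimate(path, dt=0.1)
print("mean:", mean_hat, "variance:", var_hat, "friction:", friction_hat)
```

Because every quantity here is a sample moment, a few gross outliers in `path` can shift all three estimates, which is the kind of sensitivity that motivates the robust constructions described in the abstract.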

In this paper, we derive elementary M-estimates and optimally robust asymptotically linear (AL) estimates for the parameters of an Ornstein–Uhlenbeck process. Simulation and estimation of the process are already well studied; see Iacus (Simulation and inference for stochastic differential equations. Springer, New York, 2008). However, in order to protect against outliers and deviations from the ideal law, the formulation of suitable neighborhood models and a corresponding robustification of the estimators are necessary. As a measure of robustness, we consider the maximum asymptotic mean square error (maxasyMSE), which is determined by the influence curve (IC) of AL estimates. The IC represents the standardized influence of an individual observation on the estimator given the past. In a first step, we extend the method of M-estimation from Huber (Robust statistics. Wiley, New York, 1981). In a second step, we apply the general theory based on local asymptotic normality, AL estimates, and shrinking neighborhoods due to Kohl et al. (Stat Methods Appl 19:333-354, 2010), Rieder (Robust asymptotic statistics. Springer, New York, 1994), Rieder (2003), and Staab (1984). This leads to optimally robust ICs whose graph exhibits surprising behavior. Finally, we discuss the estimator construction, i.e. the problem of constructing an estimator from the family of optimal ICs. To this end, we carry out in our context the One-Step construction dating back to LeCam (Asymptotic methods in statistical decision theory. Springer, New York, 1969) and compare it by means of simulations with the MLE and the M-estimator.