Concept: Robust statistics

Summary

Robust statistics are statistics with good performance for data drawn from a wide range of probability distributions, especially for distributions that are not normal. Robust statistical methods have been developed for many common problems, such as estimating location, scale, and regression parameters. One motivation is to produce statistical methods that are not unduly affected by outliers. Another motivation is to provide methods with good performance when there are small departures from a parametric distribution. For example, robust methods work well for mixtures of two normal distributions with different standard deviations; under this model, non-robust methods like a t-test work poorly.
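As a minimal illustration of the outlier motivation above, the following Python sketch compares the sample mean and standard deviation with the median and the median absolute deviation (MAD) when a single gross outlier is added to a small sample. The sample values and the helper name `mad` are assumptions made up for this illustration; they do not come from any source data.

```python
import statistics


def mad(values):
    """Median absolute deviation from the median (unscaled)."""
    center = statistics.median(values)
    return statistics.median([abs(v - center) for v in values])


# Hypothetical measurements clustered around 10, plus one gross outlier.
clean = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
contaminated = clean + [100.0]

summary = {}
for name, sample in (("clean", clean), ("contaminated", contaminated)):
    summary[name] = {
        "mean": statistics.mean(sample),      # non-robust location
        "median": statistics.median(sample),  # robust location
        "stdev": statistics.stdev(sample),    # non-robust scale
        "mad": mad(sample),                   # robust scale
    }
    print(name, summary[name])
```

In this sketch the single outlier moves the mean from about 10 to about 20 and inflates the standard deviation by more than a factor of 100, while the median stays at 10 and the MAD changes only slightly.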
Introduction
Robust statistics seek to provide methods that emulate popular statistical methods, but are not unduly affected by outliers or other small departures from model assumptions. In statistics, classical estimation methods rely heavily on assumptions that are often not met in

Related concepts (43)

Median

In statistics and probability theory, the median is the value separating the higher half from the lower half of a data sample, a population, or a probability distribution. For a data set, it may be t

Outlier

In statistics, an outlier is a data point that differs significantly from other observations. An outlier may be due to a variability in the measurement, an indication of novel data, or it may be the

Statistics

Statistics (from German: Statistik, "description of a state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and present

Related courses (26)

The course aims at developing certain key aspects of the theory of statistics, providing a common general framework for statistical methodology. While the main emphasis will be on the mathematical aspects of statistics, an effort will be made to balance rigor and intuition.

The course provides an introduction to econometrics. The objective is to learn how to make valid (i.e., causal) inference from economic data. It explains the main estimators and presents methods to deal with endogeneity issues.

This course teaches the basic techniques, methodologies, and practical skills required to draw meaningful insights from a variety of data, with the help of the most acclaimed software tools in the data science world (pandas, scikit-learn, Spark, etc.)

Related people (12)

Related publications (94)

Related units (9)

Generalized linear models have become a commonly used tool of data analysis. Such models are used to fit regressions for univariate responses with a normal, gamma, binomial, or Poisson distribution. Maximum likelihood is generally applied as the fitting method. In the usual regression setting, the least absolute-deviations estimator (L1-norm) is a popular alternative to least squares (L2-norm) because of its simplicity and its robustness properties. In the first part of this thesis, we examine how much of these robustness features carries over to the setting of generalized linear models. We study a robust procedure based on the minimum absolute deviation estimator of Morgenthaler (1992), the Lq quasi-likelihood with q = 1. In particular, we investigate the influence function of these estimates and compare their sensitivity to that of the maximum likelihood estimate. Furthermore, we explore the Lq quasi-likelihood estimates in binary regression. These estimates are difficult to compute. We derive a simpler estimator whose form is similar to that of the Lq quasi-likelihood estimate. The resulting estimating equation is a simple modification of the familiar maximum likelihood equation with weights w_q(μ). This is an improvement over other robust estimates discussed in the literature, which typically have weights that depend on the pair (x_i, y_i) rather than on μ_i = h(x_i^T β) alone. Finally, we generalize this estimator to Poisson regression. The resulting estimating equation is a weighted maximum likelihood equation with weights that depend on μ only.
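The contrast between least squares (L2-norm) and least absolute deviations (L1-norm) in the ordinary regression setting can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are synthetic, the function names `weighted_line_fit` and `lad_line_fit` are hypothetical, and the L1 fit is approximated by iteratively reweighted least squares, a standard technique that is not taken from the thesis itself. It does not implement the Lq quasi-likelihood estimator for generalized linear models.

```python
import math


def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = intercept + slope * x."""
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return ym - slope * xm, slope


def lad_line_fit(x, y, iterations=200, eps=1e-6):
    """Approximate L1 (least absolute deviations) fit via reweighting.

    Each pass solves a weighted least-squares problem with weights
    1 / |residual|, floored at eps to avoid division by zero.
    """
    w = [1.0] * len(x)
    intercept, slope = weighted_line_fit(x, y, w)
    for _ in range(iterations):
        w = [1.0 / max(abs(yi - intercept - slope * xi), eps)
             for xi, yi in zip(x, y)]
        intercept, slope = weighted_line_fit(x, y, w)
    return intercept, slope


# Synthetic data (assumption): true line y = 1 + 2x with small deterministic
# noise, and the last five responses shifted upward by 50.
x = [10 * i / 49 for i in range(50)]
y = [1 + 2 * xi + 0.1 * math.sin(i) for i, xi in enumerate(x)]
for i in range(45, 50):
    y[i] += 50

ols_intercept, ols_slope = weighted_line_fit(x, y, [1.0] * len(x))
lad_intercept, lad_slope = lad_line_fit(x, y)
print("least squares slope:", ols_slope)
print("least absolute deviations slope:", lad_slope)
```

On this synthetic sample the least-squares slope is pulled well away from the true value of 2 by the five shifted responses, whereas the L1 fit stays close to it.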

In this paper, we derive elementary M- and optimally robust asymptotic linear (AL)-estimates for the parameters of an Ornstein-Uhlenbeck process. Simulation and estimation of the process are already well-studied, see Iacus (Simulation and inference for stochastic differential equations. Springer, New York, 2008). However, in order to protect against outliers and deviations from the ideal law, the formulation of suitable neighborhood models and a corresponding robustification of the estimators are necessary. As a measure of robustness, we consider the maximum asymptotic mean square error (maxasyMSE), which is determined by the influence curve (IC) of AL estimates. The IC represents the standardized influence of an individual observation on the estimator given the past. In a first step, we extend the method of M-estimation from Huber (Robust statistics. Wiley, New York, 1981). In a second step, we apply the general theory based on local asymptotic normality, AL estimates, and shrinking neighborhoods due to Kohl et al. (Stat Methods Appl 19:333-354, 2010), Rieder (Robust asymptotic statistics. Springer, New York, 1994), Rieder (2003), and Staab (1984). This leads to optimally robust ICs whose graph exhibits surprising behavior. Finally, we discuss the estimator construction, i.e. the problem of constructing an estimator from the family of optimal ICs. To this end, we carry out in our context the One-Step construction dating back to LeCam (Asymptotic methods in statistical decision theory. Springer, New York, 1969) and compare it, by means of simulations, with the MLE and the M-estimator.

Powerful mathematical tools have been developed for trading in stocks and bonds, but other markets that are equally important for the globalized world have to some extent been neglected. We decided to study the shipping market as a new area of development in mathematical finance. The market in shipping derivatives (FFA and FOSVA) developed only after 2000 and now exhibits impressive growth. Financial actors have entered the field, but it is still largely undiscovered by institutional investors. The first part of the work was to identify the characteristics of the shipping market, i.e. the segmentation and the volatility. Because the shipping business is old-fashioned, even the leading actors on the world stage (ship owners and banks) use macro-economic models to forecast the rates. Although the macro-economic models are logical and make sense, they fail to predict. For example, the factor port congestion has been much cited during the last few years, but it is clearly very difficult to control and is simply an indicator of traffic. From our own experience, it appears that most ship owners are in fact market driven and rather bad at anticipating trends. Because of their ability to capture large moves, we chose to consider Lévy processes for the underlying price process. Compared with the macro-economic approach, the main advantage is the uniform and systematic structure this imposes on the models. We get in each case a favorable result for our technology and a gain in forecasting accuracy of around 10%, depending on the maturity. The global distribution is more effectively modelled and the tails of the distribution are particularly well represented. This model can be used to forecast the market but also to evaluate the risk, for example by computing the VaR. An important limitation is the non-robustness in the estimation of the Lévy processes. The use of robust estimators reinforces the information obtained from the observed data.
Because maximum likelihood estimation is not easy to compute with complex processes, we only consider some very general robust score functions to manage the technical problems. Two new classes of robust estimators are suggested. These are based on the work of F. Hampel ([29]) and P. Huber ([30]) using influence functions. The main idea is to bound the maximum likelihood score function. Doing so creates a bias in the parameter estimation, which can be corrected by using a modification of the following type, as proposed by F. Hampel. The procedure for finding a robust estimating equation is thus decomposed into two consecutive steps: subtract the bias correction, and then bound the score function. In the case of complex Lévy processes, the bias correction is difficult to compute and generally unknown. We have developed a pragmatic solution by inverting Hampel's procedure: bound the score function, and then correct for the bias. The price is a loss of the theoretical properties of our estimators; moreover, the procedure converges to the maximum likelihood estimate. A second solution for achieving robust estimation is presented. It considers the limiting case when the upper and lower bounds tend to zero and leads to B-robust estimators. Because of the complexity of the Lévy distributions, this leads to identification problems.
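The idea of bounding the maximum likelihood score function can be illustrated in the simplest possible setting, a normal location model, where the score is proportional to the standardized residual and bounding it gives Huber's psi function. The Python sketch below is an assumption-laden toy example, not the estimators developed for Lévy processes: the data, the function names `huber_psi` and `huber_location`, the tuning constant 1.345, and the use of a fixed MAD-based scale are all choices made for illustration. Because the bounded score is odd and the model is symmetric, no bias correction step appears here.

```python
import statistics


def huber_psi(r, c=1.345):
    """Bounded score: the identity on [-c, c], clipped to +/- c outside."""
    return max(-c, min(c, r))


def huber_location(values, c=1.345, iterations=50):
    """Location M-estimate solving sum(psi((x - mu) / scale)) = 0.

    The scale is held fixed at 1.4826 * MAD and mu is updated by a simple
    fixed-point iteration started from the median.
    """
    mu = statistics.median(values)
    scale = 1.4826 * statistics.median([abs(v - mu) for v in values])
    if scale == 0:
        return mu  # degenerate sample: at least half the values coincide
    for _ in range(iterations):
        scores = [huber_psi((v - mu) / scale, c) for v in values]
        mu += scale * sum(scores) / len(scores)
    return mu


# Hypothetical sample: values near 10 and one gross outlier.
sample = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 100.0]
mle_estimate = statistics.mean(sample)    # unbounded score: sample mean
robust_estimate = huber_location(sample)  # bounded score: Huber estimate
print("mean:", mle_estimate)
print("Huber estimate:", robust_estimate)
```

With an unbounded score the outlier contributes in proportion to its distance from the center and drags the estimate to about 20; with the bounded score its contribution is capped at c, so the estimate stays close to 10.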

Related lectures (49)