# Model-fitting in the presence of outliers

Abstract

We study the problem of parametric model-fitting in a finite alphabet setting. We characterize the weak convergence of the goodness-of-fit statistic with respect to an exponential family when the observations are drawn from an alternative distribution. We then study the effects of outliers on the model-fitting procedure by specializing our results to $\epsilon$-contaminated versions of distributions from the exponential family. We characterize the sensitivity of various distributions from the exponential family to outliers, and provide guidelines for choosing thresholds for a goodness-of-fit test that is robust to outliers in the data.
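As a rough illustration of the setting (not the paper's construction), the sketch below draws samples from an $\epsilon$-contaminated finite-alphabet distribution $(1-\epsilon)P + \epsilon Q$ and evaluates Pearson's goodness-of-fit statistic against the null model $P$. The four-letter alphabet, the point-mass outlier distribution $Q$, and the sample size are all illustrative assumptions; the point is only that the statistic inflates as the contamination level grows, which is what motivates choosing robust test thresholds.

```python
import random
from collections import Counter

def pearson_stat(counts, probs, n):
    # Pearson's goodness-of-fit statistic against the null model `probs`:
    # sum over symbols of (observed - expected)^2 / expected.
    return sum((counts.get(a, 0) - n * p) ** 2 / (n * p) for a, p in probs.items())

def contaminate(p, q, eps):
    # epsilon-contaminated model: (1 - eps) * P + eps * Q
    return {a: (1 - eps) * p[a] + eps * q.get(a, 0.0) for a in p}

random.seed(0)
alphabet = ["a", "b", "c", "d"]
p = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}  # null model (illustrative)
q = {"d": 1.0}                                # outliers concentrated on one symbol
n = 5000

for eps in (0.0, 0.05, 0.2):
    mix = contaminate(p, q, eps)
    sample = random.choices(alphabet, weights=[mix[a] for a in alphabet], k=n)
    # The statistic stays near its chi-squared(3) null behaviour at eps = 0
    # and grows roughly linearly in n once contamination is present.
    print(eps, round(pearson_stat(Counter(sample), p, n), 1))
```

A threshold calibrated to the uncontaminated chi-squared limit would reject almost surely at moderate $\epsilon$, which is the sensitivity the abstract refers to.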


Related concepts (25)

Test statistic

A test statistic is a statistic (a quantity derived from the sample) used in statistical hypothesis testing. A hypothesis test is typically specified in terms of a test statistic: a numerical summary that reduces the data set to a single value on which the test is based. In general, a test statistic is selected or defined so as to quantify, within the observed data, behaviours that would distinguish the null hypothesis from the alternative, where such an alternative is prescribed, or that would characterize the null hypothesis when no alternative is explicitly stated.
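A minimal sketch of this reduction, assuming the familiar coin-fairness test (not taken from the publication): the whole sample collapses to one standardized number, which is then compared against the reference distribution of the null hypothesis.

```python
import math

def z_statistic(heads, n, p0=0.5):
    # Reduce the sample to a single number: the standardized deviation
    # of the observed head count from its expectation under H0: p = p0.
    return (heads - n * p0) / math.sqrt(n * p0 * (1 - p0))

# 5600 heads in 10000 tosses of a supposedly fair coin
z = z_statistic(5600, 10000)
print(round(z, 2))  # -> 12.0, far beyond the ~1.96 two-sided 5% cutoff
```

Under the null, this statistic is approximately standard normal for large n, so the single value 12.0 already carries the whole decision.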

Goodness of fit

The goodness of fit of a statistical model describes how well it fits a set of observations. Measures of goodness of fit typically summarize the discrepancy between observed values and the values expected under the model in question. Such measures can be used in statistical hypothesis testing, e.g. to test for normality of residuals, to test whether two samples are drawn from identical distributions (see Kolmogorov–Smirnov test), or whether outcome frequencies follow a specified distribution (see Pearson's chi-square test).
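Of the measures mentioned above, the two-sample Kolmogorov–Smirnov statistic is simple enough to sketch directly: it is the largest gap between the two empirical distribution functions. The implementation and the toy samples below are illustrative, not from the publication.

```python
import bisect

def ks_statistic(xs, ys):
    # Two-sample Kolmogorov-Smirnov statistic: the maximum absolute
    # difference between the two empirical CDFs, taken over the data points.
    xs, ys = sorted(xs), sorted(ys)

    def ecdf(sorted_data, t):
        # fraction of sample points <= t
        return bisect.bisect_right(sorted_data, t) / len(sorted_data)

    grid = sorted(set(xs) | set(ys))
    return max(abs(ecdf(xs, t) - ecdf(ys, t)) for t in grid)

# Two overlapping samples: the empirical CDFs disagree by at most 1/2.
print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # -> 0.5
```

A large value of the statistic is evidence against the hypothesis that the two samples were drawn from identical distributions.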

Pearson's chi-squared test

Pearson's chi-squared test (χ²) is a statistical test applied to sets of categorical data to evaluate how likely it is that any observed difference between the sets arose by chance. It is the most widely used of many chi-squared tests (e.g., Yates, likelihood ratio, portmanteau test in time series, etc.) – statistical procedures whose results are evaluated by reference to the chi-squared distribution. Its properties were first investigated by Karl Pearson in 1900.
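A minimal worked instance, with made-up die-roll counts for illustration: the statistic sums $(O - E)^2 / E$ over the categories and is compared to a chi-squared quantile with (categories − 1) degrees of freedom.

```python
def chi2_stat(observed, expected):
    # Pearson's chi-squared statistic over the categories
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [30, 35, 28, 40, 32, 35]   # 200 rolls of a die (illustrative data)
expected = [200 / 6] * 6              # fair-die expectation per face
stat = chi2_stat(observed, expected)

# df = 6 - 1 = 5; the 95% quantile of chi-squared(5) is about 11.07,
# so this sample gives no evidence against fairness at the 5% level.
print(round(stat, 2), stat < 11.07)   # -> 2.74 True
```

The chi-squared reference distribution is the large-sample limit under the null; it is this limit whose behaviour under contaminated alternatives the publication above analyzes.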

Related publications (23)

Anastasios Vassilopoulos, Guangjian Xiang

A probabilistic model for estimating the fatigue life of composite laminates based on the mean value and standard deviation of the fatigue life is introduced here for predicting the distribution of fatigue life at any stress level for a constant stress rat ...

Victor Panaretos, Laya Ghodrati

We consider the problem of defining and fitting models of autoregressive time series of probability distributions on a compact interval of ℝ. An order-1 autoregressive model in this context is to be understood as a Markov chain, where ...

How can we discern whether the covariance operator of a stochastic process is of reduced rank, and if so, what its precise rank is? And how can we do so at a given level of confidence? This question is central to a great many methods for functional dat ...