Outlier

Figure: This graph shows the distribution of doyens (the oldest living persons) according to their age at death and the mean age at death of the doyens of their era. Jeanne Calment's longevity record is a statistical anomaly that continues to intrigue gerontologists.

In statistics, an outlier (French: donnée aberrante) is a value or observation that is "distant" from the other observations made of the same phenomenon, that is, it contrasts sharply with the values "normally" measured. An outlier may be due to the inherent variability of the observed phenomenon, or it may indicate an experimental error. In the latter case, it is sometimes discarded. Outliers can occur by chance in any distribution, but they often indicate either a measurement error or a population that follows a heavy-tailed probability distribution. In the first case, one should either discard these values or …
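One common way to make "distant from the other observations" operational is Tukey's fences, which flag any value lying more than k times the interquartile range beyond the quartiles. The sketch below is only an illustration of that rule of thumb: the function name `tukey_outliers`, the choice k = 1.5, and the sample ages (invented for the example, not real longevity records) are all assumptions.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return the values lying outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical ages at death, made up for illustration only.
ages = [110, 111, 111, 112, 112, 113, 113, 114, 115, 122]
print(tukey_outliers(ages))  # -> [122]
```

Because the fences are built from quartiles rather than from the mean and standard deviation, a single extreme value barely moves them, which is why the rule remains usable on contaminated samples.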
Digital imaging brings a new set of possibilities to photography. For example, small pictures can be assembled to form a large panorama, and digital cameras are trying to mimic the human visual system to produce better pictures. This manuscript aims at developing the algorithms required to stitch a set of pictures together to obtain a bigger and better image. This thesis explores three important topics of panoramic photography: the alignment of images, the matching of the colours, and the rendering of the resulting panorama. In addition, one chapter is devoted to 3D and constrained estimation. Aligning pictures can be difficult when the scene changes while the photographs are being taken. A method is proposed to model these changes (or outliers) that appear in image pairs, by computing the outlier distribution from the image histograms and handling the image-to-image correspondence problem as a mixture of inliers versus outliers. Compared to the standard methods, this approach makes better use of the information contained in the image and leads to a more reliable result. Digital cameras aim at reproducing the adaptation capabilities of the human eye in capturing the colours of a scene. As a consequence, there is often a large colour mismatch between two pictures. This work presents a novel way of correcting for colour mismatches by modelling the transformation introduced by the camera, and reversing it to get consistent colours. Finally, this manuscript proposes a method to render high dynamic range images that contain very bright as well as very dark regions. To reproduce this kind of picture, the contrast has to be reduced in order to match the maximum contrast displayable on a screen or on paper. This last method, which is based on a complex model of the human visual system, reduces the contrast of the image while keeping the fine details of the scene visible.
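The inlier-versus-outlier mixture idea can be sketched in a few lines. The example below is a generic two-component mixture, not the thesis's method: it assumes a zero-mean Gaussian for inlier residuals and a uniform density for outliers, whereas the abstract says the outlier distribution is computed from the image histograms. The function name and every parameter value (`sigma`, `inlier_prior`, `value_range`) are hypothetical.

```python
import math


def inlier_posterior(residual, sigma=5.0, inlier_prior=0.8, value_range=255.0):
    """Posterior probability that a pixel residual comes from the inlier component.

    Assumptions (illustrative only): inliers ~ N(0, sigma^2),
    outliers ~ Uniform(-value_range, value_range).
    """
    gauss = math.exp(-0.5 * (residual / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
    uniform = 1.0 / (2 * value_range)
    weighted_inlier = inlier_prior * gauss
    return weighted_inlier / (weighted_inlier + (1 - inlier_prior) * uniform)


# Small residuals are almost certainly inliers; large ones are treated as outliers.
for r in (0.0, 15.0, 100.0):
    print(r, inlier_posterior(r))
```

In an alignment loop, such posteriors would typically serve as per-pixel weights, so that regions where the scene changed between shots contribute little to the estimated transformation.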

In geostatistics, the presence of outlying data is more the rule than the exception. Moreover, the statistical analysis of data contaminated by outliers requires caution, particularly when a spatial dependence exists. In order to take these possible outliers into account during the adjustment of the spatial process, a new modeling tool, called the substitutive errors model, is proposed. The optimal prediction in the least squares sense is derived and its properties are studied. Because of its complexity, this estimator needs in practice to be numerically approximated. An automated algorithm is proposed in this thesis. This method is based on an ordering of the observations with respect to the specified spatial process of interest, with the values least in agreement being included towards the end of the ordering. It proves to be useful in the case of masked multiple outliers or nonstationary clusters. Simulations are carried out to illustrate its performance and to compare it to other forecasts, robust or not. An application to real data is provided as an illustration of its practical usefulness. The second part of this work also deals with the presence of spatial heterogeneity. One could say that the proposed model offers a characterization of this heterogeneity rather than estimating the locations and sizes of outliers. It is based on the theory of bidimensional α-stable motion, which generalizes unidimensional Brownian motion. In particular, the stability parameter α can be seen as a measure of the distance between the observations and the hypothesis of a Gaussian distribution. A method of estimation for the parameters of such a process is presented, based on a numerical constrained optimization of the likelihood. Its performance is illustrated by means of simulations. An application ends this second part.
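To make the role of α concrete, it may help to recall the one-dimensional symmetric case. This is a simplification: the thesis works with bidimensional α-stable motion, and the scale symbol σ is introduced here only for illustration. A symmetric α-stable random variable X with scale σ > 0 has the characteristic function

```latex
\varphi_X(t) = \mathbb{E}\left[e^{itX}\right]
             = \exp\left(-\sigma^{\alpha}\,|t|^{\alpha}\right),
\qquad 0 < \alpha \le 2 .
```

For α = 2 this reduces to exp(−σ²t²), the characteristic function of a centred Gaussian with variance 2σ². For α < 2 the tails decay like a power law and the variance is infinite, so the further α falls below 2, the heavier the tails and the further the data lie from the Gaussian hypothesis.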

We introduce an online outlier detection algorithm to detect outliers in a sequentially observed data stream. For this purpose, we use a two-stage filtering and hedging approach. In the first stage, we construct a multimodal probability density function to model the normal samples. In the second stage, given a new observation, we label it as an anomaly if the value of the aforementioned density function is below a specified threshold at the newly observed point. In order to construct our multimodal density function, we use an incremental decision tree to construct a set of subspaces of the observation space. We train a single component density function of the exponential family using the observations that fall inside each subspace represented on the tree. These single component density functions are then adaptively combined to produce our multimodal density function, which is shown to achieve the performance of the best convex combination of the density functions defined on the subspaces. As we observe more samples, our tree grows and produces more subspaces. As a result, our modeling power increases in time, while mitigating overfitting issues. In order to choose our threshold level to label the observations, we use an adaptive thresholding scheme. We show that our adaptive threshold level achieves the performance of the optimal prefixed threshold level, which knows the observation labels in hindsight. Our algorithm provides significant performance improvements over the state of the art in our wide set of experiments involving both synthetic and real data.
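The second stage described above (flag a point when the fitted density falls below a threshold) can be illustrated with a deliberately reduced sketch. It is not the paper's algorithm: it assumes a single one-dimensional Gaussian updated online with Welford's method instead of a tree-based mixture, and a fixed threshold instead of the adaptive one. The class name, `threshold`, and `warmup` are hypothetical.

```python
import math


class OnlineGaussianDetector:
    """Flags a sample as anomalous when its estimated density is below a threshold."""

    def __init__(self, threshold=1e-3, warmup=5):
        self.threshold = threshold  # fixed here; the paper adapts it over time
        self.warmup = warmup        # samples to see before labelling anything
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0               # running sum of squared deviations

    def density(self, x):
        var = self.m2 / (self.n - 1)
        return math.exp(-0.5 * (x - self.mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

    def observe(self, x):
        is_anomaly = False
        if self.n >= self.warmup and self.m2 > 0:
            is_anomaly = self.density(x) < self.threshold
        if not is_anomaly:
            # Welford update: only samples judged normal refine the model.
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)
        return is_anomaly


stream = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 25.0, 10.1]  # synthetic stream
detector = OnlineGaussianDetector()
flags = [detector.observe(x) for x in stream]
print(flags)  # only the 25.0 sample is flagged
```

Excluding flagged samples from the update keeps a single extreme value from inflating the variance estimate, at the cost of adapting more slowly if the stream genuinely shifts.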
Robustness (statistics)
In statistics, the robustness of an estimator is its ability to remain unaffected by a change in a small part of the data or in the parameters of the model chosen for the estimati…
Median (statistics)
In probability theory and statistics, the median is a value that separates the lower half from the upper half of the terms of a quantitative statistical series or of a random variable…
Statistics is the discipline that studies phenomena through the collection of data, their processing, their analysis, the interpretation of the results and their presentation in order to make these data…
MATH-341: Linear models
Regression modelling is a fundamental tool of statistics, because it describes how the law of a random variable of interest may depend on other variables. This course aims to familiarize students with linear models and some of their extensions, which lie at the basis of more general regression model
MATH-408: Regression methods
General graduate course on regression methods
ME-213: Programming for engineers
Put into practice the programming basics covered in the previous semester. Develop structured software. Methods for debugging software. Introduction to scientific programming. Introduction to virtual instrumentation.
