Correlation (statistics)

Summary

In probability theory and statistics, correlation between several random or statistical variables is a notion of association that contradicts their independence.
This correlation is very often reduced to the linear correlation between quantitative variables, that is, the fit of one variable against the other through an affine relationship obtained by linear regression. To that end, one computes a linear correlation coefficient, the quotient of their covariance by the product of their standard deviations. Its sign indicates whether higher values of one variable correspond "on average" to higher or lower values of the other. The absolute value of the coefficient, always between 0 and 1, does not measure the strength of the association but the preponderance of the affine relationship over the variables' internal variations. A zero coefficient does not imply independence, since other types of correlation are possible.
Other indicators make it possible to compute a correlation coefficient…
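The linear correlation coefficient described above (covariance divided by the product of standard deviations) can be sketched in a few lines of Python. This is a minimal illustration, not a library implementation; the function name `pearson` and the toy data are invented for the example:

```python
import math

def pearson(xs, ys):
    """Sample linear correlation: covariance over the product of standard deviations."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)

xs = [1, 2, 3, 4, 5]
print(round(pearson(xs, [2 * x + 1 for x in xs]), 3))  # exact affine relation -> 1.0

# A zero coefficient does not imply independence: Y = X^2 on symmetric X
xs2 = [-2, -1, 0, 1, 2]
print(round(pearson(xs2, [x * x for x in xs2]), 3))    # -> 0.0, yet Y is a function of X
```

The second call illustrates the closing remark of the summary: Y = X² is completely determined by X, yet the linear correlation coefficient is zero.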

Official source

This page is generated automatically and may contain information that is not correct, complete, up to date or relevant to your search. The same applies to all other pages on this site. Please verify the information against the official EPFL sources.

Related publications (100)

Related people (66)

Related concepts (76)

Statistics

Statistics is the discipline that studies phenomena through the collection of data, their processing and analysis, the interpretation of the results and their presentation, in order to make these data…

Pearson correlation coefficient

In statistics, the Pearson correlation coefficient (PCC) is a correlation coefficient that measures linear correlation between two sets of data. It is the ratio between the covariance of two variables…

Normal distribution

In probability theory and statistics, normal distributions are among the probability distributions most widely used to model natural phenomena arising from several random events.

Related units (51)

Time series modeling and analysis is central to most financial and econometric data modeling. With increased globalization in trade, commerce and finance, national variables like gross domestic product (GDP) and unemployment rate, market variables like indices and stock prices, and global variables like commodity prices are more tightly coupled than ever before. This translates to the use of multivariate or vector time series models and algorithms in analyzing and understanding the relationships that these variables share with each other.

Autocorrelation is one of the fundamental aspects of time series modeling. However, traditional linear models, which arise from a strong observed autocorrelation in many financial and econometric time series data, are at times unable to capture the rather nonlinear relationships that characterize many time series data. This necessitates the study of nonlinear models in analyzing such time series. The class of bilinear models is one of the simplest nonlinear models. These models are able to capture temporary erratic fluctuations that are common in many financial returns series and are thus of tremendous interest in financial time series analysis.

Another aspect of time series analysis is homoscedasticity versus heteroscedasticity. Many time series data, even after differencing, exhibit heteroscedasticity, so it becomes important to incorporate this feature in the associated models. The class of autoregressive conditional heteroscedastic (ARCH) models and its variants forms the primary backbone of conditional heteroscedastic time series models.

Robustness is a highly underrated feature of most time series applications and models presently in use in the industry. With an ever increasing amount of information available for modeling, it is not uncommon for the data to contain aberrations such as level shifts and occasional large fluctuations.
Conventional methods like maximum likelihood and least squares are well known to be highly sensitive to such contaminations. Hence, it becomes important to use robust methods, especially in this age with high amounts of computing power readily available, to take such aberrations into account. While robustness and time series modeling have each been vastly researched in the past, the application of robust methods to the estimation of time series models is still quite open.

The central goal of this thesis is the study of robust parameter estimation for some simple vector and nonlinear time series models. More precisely, we briefly study some prominent linear and nonlinear models in the time series literature and apply the robust S-estimator to estimate the parameters of some simple models such as the vector autoregressive (VAR) model, the (0, 0, 1, 1) bilinear model and a simple conditional heteroscedastic bilinear model. In each case, we look at the important aspect of stationarity of the model and analyze the asymptotic behavior of the S-estimator.
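The sample autocorrelation this abstract builds on can be sketched in plain Python. The AR(1) simulation and the function name `autocorr` are assumptions made for this illustration, not part of the thesis:

```python
import random

def autocorr(series, lag):
    """Sample autocorrelation at a given lag, normalized by the lag-0 variance."""
    n = len(series)
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series)
    cov = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return cov / var

# An AR(1) series x_t = 0.8 * x_{t-1} + noise shows strong, slowly decaying autocorrelation
random.seed(0)
x = [0.0]
for _ in range(999):
    x.append(0.8 * x[-1] + random.gauss(0, 1))

print(autocorr(x, 1))  # close to the AR coefficient 0.8
```

A strongly nonzero lag-1 value like this is exactly the situation where linear autoregressive models are first tried, before moving to bilinear or ARCH-type models.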

Switzerland has about 6,000 shooting ranges, of which 2,000 are still in use. These activities generate significant lead pollution: the FOEN estimates that 200 tons of lead are fired each year. This Master's project was carried out with the consultancy office CSD Ingénieurs Conseils SA in Lausanne and is divided into three parts related to the investigation and remediation of shooting ranges. Eleven army shooting ranges and two communal shooting ranges formed the field study of this project. On these sites, 20 samples were taken for laboratory analysis, and nearly 350 field tests with a portable X-ray fluorescence (XRF) analyzer were also performed.

The first part of the study focused on sample preparation methods. Based on several assumptions, these often give conflicting results. It has been shown that ignoring the size fraction between 1 cm and 2 mm leads to underestimating the lead concentration by 10% on average. Taking the bullets into account, and how, is even more important. The original lead content of standard military bullets is 74% by weight, and a common practice is to assume this content for the bullets found in soil samples. The lead content actually measured in bullets ranges from 64% for those in good shape down to 17% for fragments; lead seems to spread more easily than the steel jacket of the bullets. These results are of major importance for the calculation of the lead content of samples. No strong correlation between lead and bullet amount in soils was found. The condition of the bullets seemed to be a significant factor, but the representativeness of the samples was not sufficient due to too few mass samples. Taking these results into account, a preparation method was proposed.

The second part of the study concerns leachate analysis. In particular, the OSites leachate and the OTD1 (acid) leachate were studied; these leachates are used respectively for polluted-site risk assessment and for choosing the treatment of excavated materials. Correlations were sought between concentrations in the leachates and soil parameters: CEC, clay content, organic carbon content and aqueous pH were used as soil parameters. The OTD1 (neutral) leachate was preferred to the OSites leachate for economic reasons, since the two tests provide similar results for lead. Good correlations were obtained with non-linear models taking all the soil parameters into account. Models based only on the lead content and aqueous pH (the best correlation) give interesting results, but with a greater error.

The third part focused on the geostatistical study of shooting ranges with fixed targets. Variograms were computed from field measurements, using a metric sampling grid. The variogram showed a range of 3 meters. Although the site shows a clear anisotropy, an isotropic spherical variogram with a nugget effect was fitted; the data were insufficient to fit an anisotropic model. Calculations showed that kriging interpolation provides better results than conventional interpolation, and that the uncertainty of the kriging interpolation decreases as the sampling mesh becomes finer. An optimum between accuracy and number of tests could not be found, because the variogram used has too much variability. The results are very promising, and further work will aim to produce zoning and estimation tools for the remediation of shooting ranges.
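The empirical variogram underlying the geostatistical part can be sketched as follows. This is the generic textbook estimator with invented names (`empirical_semivariogram`) and made-up grid data, not the study's actual procedure:

```python
import math

def empirical_semivariogram(points, values, bins):
    """gamma(h): average of 0.5 * (z_i - z_j)**2 over all point pairs
    whose separation distance falls in each (lo, hi) bin."""
    sums = [0.0] * len(bins)
    counts = [0] * len(bins)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            for k, (lo, hi) in enumerate(bins):
                if lo <= d < hi:
                    sums[k] += 0.5 * (values[i] - values[j]) ** 2
                    counts[k] += 1
                    break
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]

# Toy 4x4 metric grid with made-up concentration values following a smooth trend
points = [(i, j) for i in range(4) for j in range(4)]
values = [i + j for i, j in points]
bins = [(0.5, 1.5), (1.5, 2.5), (2.5, 3.5)]
print(empirical_semivariogram(points, values, bins))  # increases with distance
```

For a trending field like this toy example, the semivariance grows with the lag distance; on real data, the lag at which it levels off is the range to which a spherical model (as in the study) would be fitted.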

2009

Over the past few decades we have been experiencing an explosion of information generated by large networks of sensors and other data sources. Much of this data is intrinsically structured, such as traffic evolution in a transportation network, temperature values in different geographical locations, information diffusion in social networks, functional activities in the brain, or 3D meshes in computer graphics. The representation, analysis, and compression of such data is a challenging task and requires the development of new tools that can identify and properly exploit the data structure.

In this thesis, we formulate the processing and analysis of structured data using the emerging framework of graph signal processing. Graphs are generic data representation forms, suitable for modeling the geometric structure of signals that live on topologically complicated domains. The vertices of the graph represent the discrete data domain, and the edge weights capture the pairwise relationships between the vertices. A graph signal is then defined as a function that assigns a real value to each vertex. Graph signal processing is a useful framework for handling such data efficiently, as it takes into consideration both the signal and the graph structure.

In this work, we develop new methods and study several important problems related to the representation and structure-aware processing of graph signals in both centralized and distributed settings. We focus in particular on the theory of sparse graph signal representation and its applications, and we bring some insights towards better understanding the interplay between graphs and signals on graphs.

First, we study a novel yet natural application of the graph signal processing framework for the representation of 3D point cloud sequences. We exploit graph-based transform signal representations for addressing the challenging problem of compression of data that is characterized by dynamic 3D positions and color attributes.
Next, we depart from graph-based transform signal representations to design new overcomplete representations, or dictionaries, which are adapted to specific classes of graph signals. In particular, we address the problem of sparse representation of graph signals residing on weighted graphs by learning graph structured dictionaries that incorporate the intrinsic geometric structure of the irregular data domain and are adapted to the characteristics of the signals.

Then, we move to the efficient processing of graph signals in distributed scenarios, such as sensor or camera networks, which bring important constraints in terms of communication and computation in realistic settings. In particular, we study the effect of quantization in the distributed processing of graph signals that are represented by graph spectral dictionaries, and we show that the impact of the quantization depends on the graph geometry and on the structure of the spectral dictionaries.

Finally, we focus on a widely used graph process, the problem of distributed average consensus in a sensor network where sensors exchange quantized information with their neighbors. We propose a novel quantization scheme that depends on the graph topology and exploits the increasing correlation between the values exchanged by the sensors throughout the iterations of the consensus algorithm.
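The distributed average consensus process mentioned at the end can be sketched with a standard linear iteration using Metropolis weights. This is a generic unquantized illustration, not the quantization scheme the thesis proposes; the toy path graph and all names are invented:

```python
# Distributed average consensus on a path graph of 4 sensors.
# Each node repeatedly mixes its value with its neighbours' values using
# Metropolis weights w_ij = 1 / (1 + max(deg_i, deg_j)); because the
# resulting weight matrix is doubly stochastic, the sum (and hence the
# average) is preserved, and every node converges to the global mean.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
x = [1.0, 2.0, 3.0, 4.0]  # initial sensor measurements, mean 2.5

def metropolis_step(x):
    new = list(x)
    for i, nbrs in neighbors.items():
        for j in nbrs:
            w = 1.0 / (1 + max(len(neighbors[i]), len(neighbors[j])))
            new[i] += w * (x[j] - x[i])
    return new

for _ in range(200):
    x = metropolis_step(x)

print([round(v, 4) for v in x])  # every entry close to the mean 2.5
```

As the iterations proceed, neighbouring values become increasingly correlated; this is the correlation that the thesis's quantization scheme exploits to spend fewer bits per exchange.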

Related courses (94)

COM-500: Statistical signal and data processing through applications

Building up on the basic concepts of sampling, filtering and Fourier transforms, we address stochastic modeling, spectral analysis, estimation and prediction, classification, and adaptive filtering, with an application oriented approach and hands-on numerical exercises.

ENV-400: Air pollution and climate change

A survey course describing the origins of air pollution and climate change

MICRO-455: Applied machine learning

Real-world engineering applications must cope with large datasets of dynamic variables, which cannot be well approximated by classical or deterministic models. This course gives an overview of methods from Machine Learning for the analysis of non-linear, highly noisy and multidimensional data.

Related lectures (183)