
# Pearson correlation coefficient

Summary

In statistics, the Pearson correlation coefficient (PCC) is a correlation coefficient that measures linear correlation between two sets of data. It is the ratio between the covariance of two variables and the product of their standard deviations; thus, it is essentially a normalized measurement of the covariance, such that the result always has a value between −1 and 1. As with covariance itself, the measure can only reflect a linear correlation of variables, and ignores many other types of relationships or correlations. As a simple example, one would expect the age and height of a sample of teenagers from a high school to have a Pearson correlation coefficient significantly greater than 0, but less than 1 (as 1 would represent an unrealistically perfect correlation).
It was developed by Karl Pearson from a related idea introduced by Francis Galton in the 1880s, and for which the mathematical formula was derived and published by Auguste Bravais in 1844. The naming of the coefficient is thus an example of Stigler's Law.
Pearson's correlation coefficient is the covariance of the two variables divided by the product of their standard deviations. The form of the definition involves a "product moment", that is, the mean (the first moment about the origin) of the product of the mean-adjusted random variables; hence the modifier product-moment in the name.
Pearson's correlation coefficient, when applied to a population, is commonly represented by the Greek letter ρ (rho) and may be referred to as the population correlation coefficient or the population Pearson correlation coefficient. Given a pair of random variables $(X, Y)$ (for example, Height and Weight), the formula for ρ is

$$\rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y}$$

where

- $\operatorname{cov}(X,Y)$ is the covariance,
- $\sigma_X$ is the standard deviation of $X$,
- $\sigma_Y$ is the standard deviation of $Y$.

The formula for $\operatorname{cov}(X,Y)$ can be expressed in terms of mean and expectation. Since

$$\operatorname{cov}(X,Y) = \mathbb{E}\big[(X - \mu_X)(Y - \mu_Y)\big],$$

the formula for $\rho_{X,Y}$ can also be written as

$$\rho_{X,Y} = \frac{\mathbb{E}\big[(X - \mu_X)(Y - \mu_Y)\big]}{\sigma_X \sigma_Y}$$

where

- $\sigma_X$ and $\sigma_Y$ are defined as above,
- $\mu_X$ is the mean of $X$,
- $\mu_Y$ is the mean of $Y$,
- $\mathbb{E}$ is the expectation.

The formula for $\rho_{X,Y}$ can also be expressed in terms of uncentered moments:

$$\rho_{X,Y} = \frac{\mathbb{E}[XY] - \mathbb{E}[X]\,\mathbb{E}[Y]}{\sqrt{\mathbb{E}[X^2] - \mathbb{E}[X]^2}\,\sqrt{\mathbb{E}[Y^2] - \mathbb{E}[Y]^2}}.$$
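As a concrete illustration, the uncentered-moment form of the coefficient can be computed directly on sample data. Below is a minimal pure-Python sketch (the function name `pearson_r` is ours for illustration, not from any library):

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample Pearson correlation via uncentered moments:
    r = (E[XY] - E[X]E[Y]) / (sqrt(E[X^2]-E[X]^2) * sqrt(E[Y^2]-E[Y]^2))."""
    n = len(xs)
    ex = sum(xs) / n                                  # E[X]
    ey = sum(ys) / n                                  # E[Y]
    exy = sum(x * y for x, y in zip(xs, ys)) / n      # E[XY]
    ex2 = sum(x * x for x in xs) / n                  # E[X^2]
    ey2 = sum(y * y for y in ys) / n                  # E[Y^2]
    return (exy - ex * ey) / (sqrt(ex2 - ex * ex) * sqrt(ey2 - ey * ey))

# A perfectly linear relationship yields r = 1 (up to floating-point rounding).
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # approximately 1.0
```

In practice one would use a library routine such as `scipy.stats.pearsonr` or `numpy.corrcoef`, but the sketch makes the moment formula explicit.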



Related publications (10)

Related courses (86)


Related concepts (81)


Cavity Quantum Optomechanics

Fundamentals of optomechanics. Basic principles, recent developments and applications.

PHYS-739: Conformal Field theory and Gravity

This course is an introduction to the non-perturbative bootstrap approach to Conformal Field Theory and to the Gauge/Gravity duality, emphasizing the fruitful interplay between these two ideas.

PHYS-316: Statistical physics II

An introduction to the theory of phase transitions.

COM-500: Statistical signal and data processing through applications

Building on the basic concepts of sampling, filtering, and Fourier transforms, we address stochastic modeling, spectral analysis, estimation and prediction, classification, and adaptive filtering.

Karl Pearson

Karl Pearson (ˈpɪərsən; born Carl Pearson; 27 March 1857 – 27 April 1936) was an English mathematician and biostatistician. He has been credited with establishing the discipline of mathematical statistics. He founded the world's first university statistics department at University College London in 1911, and contributed significantly to the fields of biometrics and meteorology. Pearson was also a proponent of social Darwinism and eugenics, and his thought is an example of what is today described as scientific racism.

Linear regression

In statistics, linear regression is a linear approach for modelling the relationship between a scalar response and one or more explanatory variables (also known as dependent and independent variables). The case of one explanatory variable is called simple linear regression; for more than one, the process is called multiple linear regression. This term is distinct from multivariate linear regression, where multiple correlated dependent variables are predicted, rather than a single scalar variable.
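The connection to the Pearson coefficient is direct: in simple linear regression the least-squares slope equals $r \cdot (s_y / s_x)$. A minimal sketch, with illustrative function names not taken from any library:

```python
def simple_linear_regression(xs, ys):
    """Least-squares fit y = a + b*x.
    The slope b = S_xy / S_xx, which equals r * (s_y / s_x)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)                      # sum of squares of x
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))    # cross products
    b = sxy / sxx          # slope
    a = my - b * mx        # intercept: the fitted line passes through (mx, my)
    return a, b

a, b = simple_linear_regression([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # 1.0 2.0
```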

Uncorrelatedness (probability theory)

In probability theory and statistics, two real-valued random variables, $X$ and $Y$, are said to be uncorrelated if their covariance, $\operatorname{cov}(X,Y) = \mathbb{E}[XY] - \mathbb{E}[X]\,\mathbb{E}[Y]$, is zero. If two variables are uncorrelated, there is no linear relationship between them. Uncorrelated random variables have a Pearson correlation coefficient, when it exists, of zero, except in the trivial case when either variable has zero variance (is a constant). In this case the correlation is undefined.
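Note that uncorrelated does not imply independent: a symmetrically distributed variable and its square have zero covariance, and hence zero Pearson correlation, despite being fully dependent. A small sketch (the helper name `covariance` is ours for illustration):

```python
def covariance(xs, ys):
    """Sample covariance: mean of the products of the mean-adjusted values."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

xs = [-2, -1, 0, 1, 2]          # symmetric about 0
ys = [x * x for x in xs]        # fully determined by xs
print(covariance(xs, ys))       # 0.0 -> uncorrelated despite total dependence
```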

Henrik Moodysson Rønnow, Thorbjørn Skovhus

We present first-principles calculations of the dynamic susceptibility in strained and doped ferromagnetic MnBi using time-dependent density functional theory. In spite of being a metal, MnBi exhibits signatures of strong correlation and a proper description in the framework of density functional theory requires Hubbard corrections to the Mn d orbitals. To permit calculations of the dynamic susceptibility with Hubbard corrections applied to the ground-state electronic structure, we use a consistent rescaling of the exchange-correlation kernel maintaining the delicate balance between the magnon dispersion and the Stoner continuum. We find excellent agreement with the experimentally observed magnon dispersion for pristine MnBi and show that the material undergoes a phase transition to helical order under application of either doping or strain. The presented methodology paves the way for future linear response time-dependent density functional theory studies of magnetic phase transitions, also for the wide range of materials with pronounced static correlation effects that are not accounted for at the local density approximation level.

The modeling of the probability of joint default, or of the total number of defaults among firms, is one of the crucial problems in mitigating credit risk, since default correlations significantly affect the portfolio loss distribution and hence play a significant role in allocating capital for solvency purposes. In this article, we derive a closed-form expression for the default probability of a single firm and the probability of the total number of defaults by time $t$ in a homogeneous portfolio. We use a contagion process to model the arrival of credit events causing the default and develop a framework that allows firms to have resistance against default, unlike the standard intensity-based models. We assume the point process driving the credit events is composed of a systematic and an idiosyncratic component, whose intensities are independently specified by a mean-reverting affine jump-diffusion process with self-exciting jumps. The proposed framework is capable of capturing the feedback effect. We further demonstrate how the proposed framework can be used to price synthetic collateralized debt obligations (CDOs). Finally, we present a sensitivity analysis to demonstrate the effect of different parameters governing the contagion effect on the spread of tranches and the expected loss of the CDO.

Related lectures (770)

Arash Amini, Hatef Otroshi Shahreza

With the emergence of social networks and improvements in internet speed, video data has become an ever-increasing portion of global internet traffic. Besides the content, the quality of a video sequence is an important issue at the user end, which is often affected by various factors such as compression. Therefore, monitoring the quality is crucial for video content and service providers. A simple monitoring approach is to compare the raw (uncompressed) video content with the received data at the receiver. In most practical scenarios, however, the reference video sequence is not available. Consequently, it is desirable to have a general reference-less method for assessing the perceived quality of any given video sequence. In this paper, a no-reference video quality assessment technique based on video features is proposed. In particular, a long list of video features (21 sets of features, each consisting of 1 to 216 features) is considered and all possible combinations (2^21 - 1) for training an Extra Trees regressor are examined. This regressor was chosen deliberately, as it is observed to perform better than other common regressors. The results reveal that the top 20 performing feature subsets all outperform the existing feature-based assessment methods in terms of the Pearson linear correlation coefficient (PLCC) or the Spearman rank order correlation coefficient (SROCC). In particular, the best performing regressor achieves PLCC = 0.786 on the test data over the KonVid-1k dataset. It is believed that the results of the comprehensive comparison could be potentially useful for other feature-based video-related problems. The source codes of the implementations are publicly available.

Determinantal Point Processes and Extrapolation

Covers determinantal point processes, sine-process, and their extrapolation in different spaces.

Critical Behavior in General Relativity (PHYS-316: Statistical physics II)

Explores critical behavior in general relativity, including scaling factor and coupling constant flow.