
# Quantifying Statistical Dependence: Covariance and Correlation

Description

This lecture covers the concepts of covariance and correlation, focusing on quantifying statistical dependence between random variables. It explains how covariance measures the linear relationship between two variables, while correlation standardizes this measure to a range between -1 and 1. The lecture also delves into the calculation of covariance and correlation coefficients, emphasizing their significance in understanding the relationship between data points. Additionally, it explores the notion of mutual information as a measure of the amount of information obtained about one random variable through another. The discussion extends to identifying coevolving sites in interacting proteins using sequence data, showcasing practical applications of these statistical concepts.
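The three quantities in the description can be sketched numerically. Below is a minimal Python illustration; the synthetic data, random seed, and histogram bin count are arbitrary choices for demonstration, not taken from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linearly related data: y = 2x + noise (illustrative only).
x = rng.normal(size=1000)
y = 2 * x + rng.normal(scale=0.5, size=1000)

# Covariance estimates E[(X - E[X])(Y - E[Y])] from the sample.
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))

# Pearson correlation standardizes covariance to the range [-1, 1].
corr_xy = cov_xy / (x.std() * y.std())

# A crude plug-in estimate of mutual information (in nats) from a
# 2-D histogram; it is positive when the variables carry information
# about each other.
pxy, _, _ = np.histogram2d(x, y, bins=20)
pxy /= pxy.sum()
px = pxy.sum(axis=1, keepdims=True)
py = pxy.sum(axis=0, keepdims=True)
nz = pxy > 0
mi = np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz]))

print(corr_xy, mi)  # strong positive correlation, positive mutual information
```

For strongly linearly related data like this, the correlation is close to 1 and the estimated mutual information is clearly positive.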



This page is automatically generated and may contain information that is not correct, complete, up to date, or relevant to your search query. Please verify the information against EPFL's official sources.

In course

BIO-369: Randomness and information in biological data

Biology is becoming more and more of a data science, as illustrated by the explosion of available genome sequences. This course aims to show how we can make sense of such data and harness it in order to

Related concepts (117)

Related lectures (145)

Eigenstate Thermalization Hypothesis

Explores the Eigenstate Thermalization Hypothesis in quantum systems, emphasizing random matrix theory and the behavior of observables in thermal equilibrium.

Information Measures: Entropy and Information Theory

Explains how entropy measures uncertainty in a system based on possible outcomes.

Principal Component Analysis: Properties and Applications

Explores Principal Component Analysis theory, properties, applications, and hypothesis testing in multivariate statistics.

Modes of Convergence of Random Variables

Covers the modes of convergence of random variables and the Central Limit Theorem, discussing implications and approximations.

Dependence and Correlation

Explores dependence, correlation, and conditional expectations in probability and statistics, highlighting their significance and limitations.

Random variable

A random variable (also called random quantity, aleatory variable, or stochastic variable) is a mathematical formalization of a quantity or object which depends on random events. The term 'random variable' can be misleading, as it is not actually random nor a variable, but rather a function from possible outcomes (e.g., the possible upper sides of a flipped coin, such as heads and tails) in a sample space (e.g., the set {heads, tails}) to a measurable space (e.g., {1, −1}, in which 1 corresponds to heads and −1 corresponds to tails), often to the real numbers.
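The coin-flip mapping described here can be written directly as a function from outcomes to numbers. A tiny sketch (the names and the seed are illustrative):

```python
import random

# A random variable is just a function: outcome -> real number.
# Here, heads -> 1 and tails -> -1, as in the coin-flip example.
X = {"heads": 1, "tails": -1}

random.seed(0)
outcome = random.choice(["heads", "tails"])  # a random point in the sample space
value = X[outcome]                           # X evaluated at that outcome

print(outcome, value)
```

The randomness lives in which outcome occurs; the mapping X itself is fixed and deterministic.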

Correlation

In statistics, correlation or dependence is any statistical relationship, whether causal or not, between two random variables or bivariate data. Although in the broadest sense "correlation" may indicate any type of association, in statistics it usually refers to the degree to which a pair of variables are linearly related. Familiar examples of dependent phenomena include the correlation between the height of parents and that of their offspring, and the correlation between the price of a good and the quantity consumers are willing to purchase, as depicted in the so-called demand curve.
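Because correlation usually measures *linear* association, variables can be strongly dependent yet have near-zero correlation. A quick sketch with synthetic data (not from the entry):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=100_000)
y = x ** 2  # y is completely determined by x...

# ...yet the Pearson correlation is close to zero, because the
# relationship is symmetric about x = 0 rather than linear.
corr = np.corrcoef(x, y)[0, 1]
print(corr)
```

This is the standard caveat: zero correlation does not imply independence, only the absence of a linear trend.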

Exchangeable random variables

In statistics, an exchangeable sequence of random variables (also sometimes interchangeable) is a sequence X1, X2, X3, ... (which may be finitely or infinitely long) whose joint probability distribution does not change when the positions in the sequence in which finitely many of them appear are altered. Thus, for example, the sequences X1, X2, X3 and X3, X1, X2 have the same joint probability distribution. The notion is closely related to the use of independent and identically distributed random variables in statistical models.
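Sampling without replacement is a standard example of exchangeability without independence. A small enumeration (the urn contents are an illustrative assumption):

```python
from collections import Counter
from itertools import permutations

# Urn with two red (R) balls and one blue (B), drawn without replacement.
urn = ["R", "R", "B"]

# Enumerate all 3! equally likely orderings of the (labelled) balls and
# count how often each colour sequence appears.
counts = Counter(permutations(urn))

# Every distinct colour sequence occurs equally often (here, twice each),
# so permuting positions leaves the joint distribution unchanged.
print(counts[("R", "R", "B")], counts[("R", "B", "R")], counts[("B", "R", "R")])
```

The draws are exchangeable but not independent: once the single blue ball has been drawn, the probability of drawing blue again is zero.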

Independent and identically distributed random variables

In probability theory and statistics, a collection of random variables is independent and identically distributed if each random variable has the same probability distribution as the others and all are mutually independent. This property is usually abbreviated as i.i.d., iid, or IID. IID was first defined in statistics and finds application in different fields such as data mining and signal processing. Statistics commonly deals with random samples. A random sample can be thought of as a set of objects that are chosen randomly.
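A quick empirical sketch of the i.i.d. property; the sample size and distribution are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(2)
# Two columns of 10 000 i.i.d. standard-normal draws.
sample = rng.normal(size=(10_000, 2))

# Identically distributed: both columns have nearly the same mean and spread.
means = sample.mean(axis=0)
stds = sample.std(axis=0)

# Independence is consistent with (though not proven by) the near-zero
# correlation between the columns.
corr = np.corrcoef(sample[:, 0], sample[:, 1])[0, 1]
print(means, stds, corr)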

Convergence of random variables

In probability theory, there exist several different notions of convergence of random variables. The convergence of sequences of random variables to some limit random variable is an important concept in probability theory, and its applications to statistics and stochastic processes. The same concepts are known in more general mathematics as stochastic convergence and they formalize the idea that a sequence of essentially random or unpredictable events can sometimes be expected to settle down into a behavior that is essentially unchanging when items far enough into the sequence are studied.