# Canonical correlation

Summary

In statistics, canonical-correlation analysis (CCA), also called canonical variates analysis, is a way of inferring information from cross-covariance matrices. If we have two vectors X = (X1, ..., Xn) and Y = (Y1, ..., Ym) of random variables, and there are correlations among the variables, then canonical-correlation analysis will find linear combinations of X and Y which have maximum correlation with each other. T. R. Knapp notes that "virtually all of the commonly encountered parametric tests of significance can be treated as special cases of canonical-correlation analysis, which is the general procedure for investigating the relationships between two sets of variables." The method was first introduced by Harold Hotelling in 1936, although in the context of angles between flats the mathematical concept was published by Jordan in 1875.
Given two column vectors $X = (X_1, \dots, X_n)^T$ and $Y = (Y_1, \dots, Y_m)^T$ of random variables with finite second moments, one may define the cross-covariance $\Sigma_{XY} = \operatorname{cov}(X, Y)$ to be the $n \times m$ matrix whose $(i, j)$ entry is the covariance $\operatorname{cov}(X_i, Y_j)$. In practice, we would estimate the covariance matrix based on sampled data from $X$ and $Y$ (i.e. from a pair of data matrices).
Canonical-correlation analysis seeks vectors $a$ ($a \in \mathbb{R}^n$) and $b$ ($b \in \mathbb{R}^m$) such that the random variables $a^T X$ and $b^T Y$ maximize the correlation $\rho = \operatorname{corr}(a^T X, b^T Y)$. The (scalar) random variables $U = a^T X$ and $V = b^T Y$ are the first pair of canonical variables. Then one seeks vectors maximizing the same correlation subject to the constraint that they are to be uncorrelated with the first pair of canonical variables; this gives the second pair of canonical variables. This procedure may be continued up to $\min\{m, n\}$ times.
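As a rough illustration of the two ideas above, the sketch below estimates a cross-covariance matrix from a pair of data matrices and evaluates the correlation between two linear combinations. The function names, the synthetic data, and the chosen weight vectors are assumptions made for this example only; they are not part of the original text.

```python
import numpy as np

def cross_covariance(X, Y):
    """Sample cross-covariance of X (N x n) and Y (N x m); returns an n x m matrix."""
    Xc = X - X.mean(axis=0)          # center each column
    Yc = Y - Y.mean(axis=0)
    return Xc.T @ Yc / (X.shape[0] - 1)

def projected_correlation(X, Y, a, b):
    """Sample correlation between the scalar variables a^T X and b^T Y."""
    u = X @ a
    v = Y @ b
    return np.corrcoef(u, v)[0, 1]

# Synthetic data (illustrative): one shared latent signal z links X and Y.
rng = np.random.default_rng(0)
N = 500
z = rng.normal(size=N)
X = np.column_stack([z + 0.5 * rng.normal(size=N), rng.normal(size=N)])
Y = np.column_stack([rng.normal(size=N),
                     z + 0.5 * rng.normal(size=N),
                     rng.normal(size=N)])

S_xy = cross_covariance(X, Y)        # shape (2, 3)
# Hand-picked weights; CCA would instead search for the maximizing pair.
rho = projected_correlation(X, Y, np.array([1.0, 0.0]), np.array([0.0, 1.0, 0.0]))
```

Here the weights are fixed by hand; canonical-correlation analysis searches over all such weight vectors for the pair that maximizes this correlation.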
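Let $\Sigma_{XY}$ be the cross-covariance matrix for any pair of (vector-shaped) random variables $X$ and $Y$. The target function to maximize is
$$\rho = \frac{a^T \Sigma_{XY} b}{\sqrt{a^T \Sigma_{XX} a}\,\sqrt{b^T \Sigma_{YY} b}}.$$
The first step is to define a change of basis and define
$$c = \Sigma_{XX}^{1/2} a, \qquad d = \Sigma_{YY}^{1/2} b,$$
where $\Sigma_{XX}^{1/2}$ and $\Sigma_{YY}^{1/2}$ can be obtained from the eigen-decomposition (or by diagonalization):
$$\Sigma_{XX}^{1/2} = V_X D_X^{1/2} V_X^T, \qquad V_X D_X V_X^T = \Sigma_{XX},$$
and
$$\Sigma_{YY}^{1/2} = V_Y D_Y^{1/2} V_Y^T, \qquad V_Y D_Y V_Y^T = \Sigma_{YY}.$$
And thus we have
$$\rho = \frac{c^T \Sigma_{XX}^{-1/2} \Sigma_{XY} \Sigma_{YY}^{-1/2} d}{\sqrt{c^T c}\,\sqrt{d^T d}}.$$
By the Cauchy–Schwarz inequality, we have
$$\left(c^T \Sigma_{XX}^{-1/2} \Sigma_{XY} \Sigma_{YY}^{-1/2}\right) d \le \left(c^T \Sigma_{XX}^{-1/2} \Sigma_{XY} \Sigma_{YY}^{-1/2} \Sigma_{YY}^{-1/2} \Sigma_{YX} \Sigma_{XX}^{-1/2} c\right)^{1/2} \left(d^T d\right)^{1/2},$$
$$\rho \le \frac{\left(c^T \Sigma_{XX}^{-1/2} \Sigma_{XY} \Sigma_{YY}^{-1} \Sigma_{YX} \Sigma_{XX}^{-1/2} c\right)^{1/2}}{\left(c^T c\right)^{1/2}}.$$
There is equality if the vectors $d$ and $\Sigma_{YY}^{-1/2} \Sigma_{YX} \Sigma_{XX}^{-1/2} c$ are collinear. In addition, the maximum of correlation is attained if $c$ is the eigenvector with the maximum eigenvalue for the matrix $\Sigma_{XX}^{-1/2} \Sigma_{XY} \Sigma_{YY}^{-1} \Sigma_{YX} \Sigma_{XX}^{-1/2}$ (see Rayleigh quotient).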
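The derivation can be sketched numerically as follows: whiten with the inverse square root of the covariance of X, take the top eigenvector of the resulting symmetric matrix, and map back to the original coordinates. This is a minimal sketch assuming sample covariances that are positive definite; the helper names `inv_sqrt` and `first_canonical_pair` and the synthetic data are hypothetical and chosen only for illustration.

```python
import numpy as np

def inv_sqrt(S):
    """Inverse square root of a symmetric positive-definite matrix via eigen-decomposition."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(w ** -0.5) @ V.T

def first_canonical_pair(X, Y):
    """Return (rho, a, b) for the first pair of canonical variables of data matrices X, Y."""
    n = X.shape[1]
    S = np.cov(X.T, Y.T)                       # joint sample covariance of (X, Y)
    S_xx, S_xy, S_yy = S[:n, :n], S[:n, n:], S[n:, n:]
    S_xx_is = inv_sqrt(S_xx)
    # M = S_xx^{-1/2} S_xy S_yy^{-1} S_yx S_xx^{-1/2}; its largest eigenvalue is rho^2.
    M = S_xx_is @ S_xy @ np.linalg.solve(S_yy, S_xy.T) @ S_xx_is
    M = (M + M.T) / 2                          # remove round-off asymmetry
    w, V = np.linalg.eigh(M)                   # eigenvalues in ascending order
    c = V[:, -1]                               # eigenvector of the largest eigenvalue
    rho = np.sqrt(max(w[-1], 0.0))
    a = S_xx_is @ c                            # undo the change of basis
    b = np.linalg.solve(S_yy, S_xy.T @ a)      # proportional to S_yy^{-1} S_yx a
    return rho, a, b

# Synthetic data (illustrative): one shared latent signal z links X and Y.
rng = np.random.default_rng(0)
N = 500
z = rng.normal(size=N)
X = np.column_stack([z + 0.5 * rng.normal(size=N), rng.normal(size=N)])
Y = np.column_stack([rng.normal(size=N),
                     z + 0.5 * rng.normal(size=N),
                     rng.normal(size=N)])

rho, a, b = first_canonical_pair(X, Y)
```

The vector `b` is recovered only up to scale, which does not affect the correlation. For real data with many variables or few samples, a regularized or library implementation is likely a safer choice than this direct computation.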

Related publications (1)

Related people (2)

Related concepts (7)

Related courses (8)

Dimensionality reduction

Dimensionality reduction, or dimension reduction, is the transformation of data from a high-dimensional space into a low-dimensional space so that the low-dimensional representation retains some meaningful properties of the original data, ideally close to its intrinsic dimension. Working in high-dimensional spaces can be undesirable for many reasons; raw data are often sparse as a consequence of the curse of dimensionality, and analyzing the data is usually computationally intractable (hard to control or deal with).

Eigenvalues and eigenvectors

In linear algebra, an eigenvector (ˈaɪgənˌvɛktər) or characteristic vector of a linear transformation is a nonzero vector that changes at most by a constant factor when that linear transformation is applied to it. The corresponding eigenvalue, often represented by λ, is the multiplying factor. Geometrically, a transformation matrix rotates, stretches, or shears the vectors it acts upon. The eigenvectors for a linear transformation matrix are the set of vectors that are only stretched, with no rotation or shear.
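The defining property can be checked numerically with a small example. The matrix below is an arbitrary symmetric matrix chosen for illustration, not one taken from the text.

```python
import numpy as np

# Arbitrary symmetric example matrix (illustrative).
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh handles symmetric matrices; columns of `eigenvectors` are the eigenvectors.
eigenvalues, eigenvectors = np.linalg.eigh(A)

for lam, v in zip(eigenvalues, eigenvectors.T):
    # Applying A only rescales v by its eigenvalue: A v = lam * v.
    assert np.allclose(A @ v, lam * v)
```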

MICRO-570: Advanced machine learning

This course will present some of the core advanced methods in the field for structure discovery, classification and non-linear regression. This is an advanced class in Machine Learning; hence, student

MICRO-455: Applied machine learning

Real-world engineering applications must cope with a large dataset of dynamic variables, which cannot be well approximated by classical or deterministic models. This course gives an overview of method

BIO-369: Randomness and information in biological data

Biology is becoming more and more a data science, as illustrated by the explosion of available genome sequences. This course aims to show how we can make sense of such data and harness it in order to

Related lectures (37)

Canonical Correlation Analysis: Exercises Solutions

Presents solutions to exercises on Canonical Correlation Analysis, exploring correlation, Gram matrices, kernel matrices, and vector properties.

Dimensionality Reduction (COM-308: Internet analytics)

Explores Singular Value Decomposition and Principal Component Analysis for dimensionality reduction, with applications in visualization and efficiency.

PCA: Key Concepts

Covers the key concepts of PCA, including reducing data dimensionality and extracting features, with practical exercises.

Nicolas Henri Bernard Flammarion, Xiang Cheng

In this paper, we study the problems of principal Generalized Eigenvector computation and Canonical Correlation Analysis in the stochastic setting. We propose a simple and efficient algorithm, Gen-Oja, for these problems. We prove the global convergence of our algorithm, borrowing ideas from the theory of fast-mixing Markov chains and two-time-scale stochastic approximation, showing that it achieves the optimal rate of convergence. In the process, we develop tools for understanding stochastic processes with Markovian noise which might be of independent interest.

2019