# Covariance matrix

Summary

In probability theory and statistics, a covariance matrix (also known as auto-covariance matrix, dispersion matrix, variance matrix, or variance–covariance matrix) is a square matrix giving the covariance between each pair of elements of a given random vector. Any covariance matrix is symmetric and positive semi-definite and its main diagonal contains variances (i.e., the covariance of each element with itself).
Intuitively, the covariance matrix generalizes the notion of variance to multiple dimensions. As an example, the variation in a collection of random points in two-dimensional space cannot be characterized fully by a single number, nor would the variances in the x and y directions contain all of the necessary information; a 2 \times 2 matrix would be necessary to fully characterize the two-dimensional variation.
The covariance matrix of a random vector \mathbf{X} is typically denoted by \operatorname{K}_{\mathbf{X}\mathbf{X}} or \Sigma.
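As a concrete illustration of these properties, the following sketch (assuming NumPy is available; the variable names are illustrative) estimates the covariance matrix of a cloud of correlated two-dimensional points and checks that it is symmetric, positive semi-definite, and carries the variances on its diagonal.

```python
import numpy as np

# Sample a cloud of correlated two-dimensional points.
rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(size=n)
y = 0.8 * x + 0.6 * rng.normal(size=n)  # y is correlated with x

# np.cov treats each row as one variable, so this is the 2 x 2
# sample covariance matrix of (x, y).
K = np.cov(np.stack([x, y]))

print(K.shape)                              # (2, 2)
print(np.allclose(K, K.T))                  # True: symmetric
print(np.all(np.linalg.eigvalsh(K) >= 0))   # True: positive semi-definite
print(np.allclose(K[0, 0], x.var(ddof=1)))  # True: diagonal holds variances
```

The off-diagonal entry `K[0, 1]` estimates the covariance between x and y (about 0.8 here), which is exactly the information that the two one-dimensional variances alone cannot convey.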

Related concepts (61)

Statistics

Statistics (from German: Statistik, "description of a state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data.

Multivariate normal distribution

In probability theory and statistics, the multivariate normal distribution, multivariate Gaussian distribution, or joint normal distribution is a generalization of the one-dimensional (univariate) normal distribution to higher dimensions.

Variance

In probability theory and statistics, variance is the expected value of the squared deviation of a random variable from its mean; equivalently, it is the square of the standard deviation. Variance is a measure of dispersion.

Related courses (71)

CS-233(a): Introduction to machine learning (BA3)

Machine learning and data analysis are becoming increasingly central in many sciences and applications. In this course, fundamental principles and methods of machine learning will be introduced, analyzed and practically implemented.

FIN-403: Econometrics

The course covers basic econometric models and methods that are routinely applied to obtain inference results in economic and financial applications.

FIN-407: Financial econometrics

This course aims to give an introduction to the application of machine learning to finance. These techniques gained popularity due to the limitations of traditional financial econometric methods in tackling big data. We will review and compare traditional methods and machine learning algorithms.

Related lectures (218)

Deep neural networks have been empirically successful in a variety of tasks, yet their theoretical understanding is still poor. In particular, modern deep neural networks have many more parameters than training data, so in principle they should overfit the training samples and exhibit poor generalization to the complete data distribution. Counterintuitively, however, they manage to achieve both high training accuracy and high testing accuracy. One can prove generalization using a validation set; however, this is difficult when training samples are limited, and it yields no information about why deep neural networks generalize well. Another approach is to estimate the complexity of the deep neural network. The hypothesis is that a network with high training accuracy and high complexity has memorized the data, while one with low complexity has learned generalizable patterns. In the first part of this thesis we explore spectral complexity, a measure of complexity that depends on combinations of norms of the weight matrices of the deep neural network. For a dataset that is difficult to classify, with no underlying model and/or no recurring pattern (for example, one where the labels have been chosen randomly), spectral complexity takes a large value, reflecting that the network needs to memorize the labels and will not generalize well. Restoring the real labels lowers the spectral complexity, reflecting that some structure is present and the network has learned patterns that might generalize to unseen data. Spectral complexity results in vacuous estimates of the generalization error (the difference between the training and testing accuracy), and we show that it can lead to counterintuitive results when comparing the generalization error of different architectures. In the second part of the thesis we explore non-vacuous estimates of the generalization error.
In Chapter 2 we analyze the case of PAC-Bayes, where a posterior distribution over the weights of a deep neural network is learned using stochastic variational inference and the generalization error bound involves the KL divergence between this posterior and a prior distribution. We find that a common approximation in which the posterior is constrained to be Gaussian with diagonal covariance, known as the mean-field approximation, significantly limits any gains in bound tightness. If instead we choose the prior mean to be the random network initialization, the generalization error estimate tightens significantly. In Chapter 3 we explore an existing approach to learning the PAC-Bayes prior mean from the training set. Specifically, we explore differential privacy, which ensures that the training samples contribute only a limited amount of information to the prior, making it dependent on the distribution rather than on the training set. In this way the prior should generalize well to unseen data (as it has not memorized individual samples), and at the same time any posterior distribution that is close to it in KL divergence will also exhibit good generalization.
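The mean-field (diagonal-covariance Gaussian) KL term discussed above has a simple closed form that factorizes over coordinates, which is what makes it cheap to evaluate but also restrictive. A minimal sketch of that computation, assuming NumPy; the function name is illustrative, not taken from the thesis:

```python
import numpy as np

def kl_diag_gaussians(mu_q, var_q, mu_p, var_p):
    """KL( N(mu_q, diag(var_q)) || N(mu_p, diag(var_p)) ).

    With diagonal covariances the divergence is a sum of
    per-coordinate terms, so it scales linearly in the number
    of network weights.
    """
    return 0.5 * np.sum(
        np.log(var_p / var_q)
        + (var_q + (mu_q - mu_p) ** 2) / var_p
        - 1.0
    )

mu = np.zeros(3)
var = np.ones(3)

# Identical posterior and prior: the KL term vanishes.
print(kl_diag_gaussians(mu, var, mu, var))  # 0.0

# A prior mean close to the posterior mean (e.g. the random
# initialization) shrinks the KL term compared to a distant one.
far = kl_diag_gaussians(mu + 2.0, var, mu, var)
near = kl_diag_gaussians(mu + 0.1, var, mu, var)
print(near < far)  # True
```

This makes concrete why choosing the prior mean near the weights actually reached by training (here mimicked by a nearby posterior mean) tightens the bound: the squared mean difference dominates the KL term.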

Traditional approaches to analysing functional data typically follow a two-step procedure, consisting of first smoothing and then carrying out a functional principal component analysis. The idea underlying this procedure is that functional data are well approximated by smooth functions and that rough variations are due to noise. However, it may very well happen that localised features are rough at a global scale but still smooth at some finer scale. In this thesis we put forward a new statistical approach for functional data arising as the sum of two uncorrelated components: one smooth plus one rough. We give non-parametric conditions under which the covariance operators of the smooth and of the rough components are jointly identifiable on the basis of discretely observed data: the covariance operator corresponding to the smooth component must be of finite rank and have real analytic eigenfunctions, while the one corresponding to the rough component must have a banded covariance function. We construct consistent estimators of both covariance operators without assuming knowledge of the true rank or bandwidth. We then use them to estimate the best linear predictors of the smooth and the rough components of each functional datum. In both the identifiability and the inference parts, we do not follow the usual strategy in functional data analysis, which is to first employ smoothing and work with a continuous estimate of the covariance operator. Instead, we work directly with the covariance matrix of the discretely observed data, which allows us to use results and tools from linear algebra. In fact, we show that the whole problem of uniquely recovering the covariance operator of the smooth component from that of the raw data can be seen as a low-rank matrix completion problem, and we make extensive use of a classical relation between the rank and the minors of a matrix to solve it.
The finite-sample performance of our approach is studied by means of a simulation study.
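The matrix-completion viewpoint above can be made concrete on a toy discretized covariance: outside the band, the raw covariance coincides with the low-rank smooth part, so those entries are "observed" and only the banded entries of the smooth component need to be completed. A small numerical sketch (assuming NumPy; the specific rank, bandwidth, and eigenfunctions are illustrative, not those of the thesis):

```python
import numpy as np

p = 20          # number of observation points on the grid
bandwidth = 1   # the rough component is tridiagonal here

# Smooth component: rank-2 covariance built from smooth "eigenfunctions".
t = np.linspace(0.0, 1.0, p)
phi = np.stack([np.sin(np.pi * t), np.cos(np.pi * t)])  # shape (2, p)
C_smooth = phi.T @ phi

# Rough component: banded (tridiagonal) covariance.
C_rough = (0.5 * np.eye(p)
           + 0.2 * np.eye(p, k=1)
           + 0.2 * np.eye(p, k=-1))

# Covariance of the discretely observed sum of the two components.
C_raw = C_smooth + C_rough

# Entries outside the band are unaffected by the rough component:
# they are the "observed" entries of the matrix completion problem.
i, j = np.triu_indices(p, k=bandwidth + 1)
print(np.allclose(C_raw[i, j], C_smooth[i, j]))  # True
print(np.linalg.matrix_rank(C_smooth))           # 2
```

The low-rank constraint is what ties the missing banded entries of `C_smooth` to its observed off-band entries, which is why the completion can be unique.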

During the last twenty years, random matrix theory (RMT) has produced numerous results that allow a better understanding of large random matrices. These advances have enabled interesting applications in the domain of communications. Although this theory could contribute to many other domains, such as brain imaging or genetic research, it has rarely been applied there. The main barrier to the adoption of RMT may be the lack of concrete statistical results from probabilistic random matrix theory. Indeed, directly generalising classical multivariate theory to the high-dimensional setting is often difficult, and the proposed procedures often place strong hypotheses on the data matrix, such as normality or overly restrictive independence conditions on the data.
This thesis proposes a statistical procedure for testing the equality of two independent estimated covariance matrices when the number of potentially dependent data vectors is large and proportional to their dimension, that is, to the number of observed variables. Although the existing theory builds a very good intuition of the behaviour of these matrices, it does not provide enough results to build a test that is satisfactory in terms of both power and robustness. Hence, inspired by spike models, we define the residual spikes and prove many theorems describing the behaviour of statistics based on eigenvectors and eigenvalues in very general cases, most notably the two central theorems of this thesis: the Invariant Angle Theorem and the Invariant Dot Product Theorem.
Using numerous generalisations of the theory, this thesis finally describes the behaviour of a statistic under a null hypothesis. This statistic allows the user to test the equality of two populations, as well as other null hypotheses such as the independence of two sets of variables. Finally, the robustness of the procedure is demonstrated for different classes of models, and criteria for evaluating robustness are proposed.
The major contribution of this thesis is therefore a methodology that is both easy to apply and has good properties. In addition, a large number of theoretical results are proved that could easily be reused to build other applications.
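The high-dimensional difficulty motivating this work is easy to reproduce numerically: when the dimension is proportional to the sample size, a naive comparison of two estimated covariance matrices is badly miscalibrated even under the null hypothesis. The sketch below (assuming NumPy; this is only the naive statistic, not the thesis's residual-spike procedure) draws two samples from the same population and shows how far the eigenvalues of S1^{-1} S2 stray from 1.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 400, 200  # dimension proportional to sample size: p / n = 0.5

# Two independent samples from the SAME population (identity covariance).
X1 = rng.normal(size=(n, p))
X2 = rng.normal(size=(n, p))
S1 = X1.T @ X1 / n
S2 = X2.T @ X2 / n

# If the two covariances are equal, a classical (fixed-p) intuition says
# the eigenvalues of S1^{-1} S2 should all be close to 1. In the
# proportional regime they spread far from 1 even under the null,
# which is why an RMT-based calibration is needed.
eigs = np.linalg.eigvals(np.linalg.solve(S1, S2)).real
print(eigs.max() > 2.0)  # True: large spread despite equal populations
print(eigs.min() < 0.5)  # True
```

Any test that rejects when some eigenvalue is "far from 1" by a fixed-dimension yardstick would therefore reject here, even though the two populations are identical.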