**Are you an EPFL student looking for a semester project?**

Work with us on data science and visualisation projects, and deploy your project as an app on top of GraphSearch.

Concept# Sample mean and covariance

Summary

The sample mean (sample average) or empirical mean (empirical average), and the sample covariance or empirical covariance are statistics computed from a sample of data on one or more random variables.
The sample mean is the average value (or mean value) of a sample of numbers taken from a larger population of numbers, where "population" indicates not number of people but the entirety of relevant data, whether collected or not. A sample of 40 companies' sales from the Fortune 500 might be used for convenience instead of looking at the population, all 500 companies' sales. The sample mean is used as an estimator for the population mean, the average value in the entire population, where the estimate is more likely to be close to the population mean if the sample is large and representative. The reliability of the sample mean is estimated using the standard error, which in turn is calculated using the variance of the sample. If the sample is random, the standard error falls with the size of the sample and the sample mean's distribution approaches the normal distribution as the sample size increases.
The term "sample mean" can also be used to refer to a vector of average values when the statistician is looking at the values of several variables in the sample, e.g. the sales, profits, and employees of a sample of Fortune 500 companies. In this case, there is not just a sample variance for each variable but a sample variance-covariance matrix (or simply covariance matrix) showing also the relationship between each pair of variables. This would be a 3×3 matrix when 3 variables are being considered. The sample covariance is useful in judging the reliability of the sample means as estimators and is also useful as an estimate of the population covariance matrix.
Due to their ease of calculation and other desirable characteristics, the sample mean and sample covariance are widely used in statistics to represent the location and dispersion of the distribution of values in the sample, and to estimate the values for the population.

Official source

This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.

Related courses (71)

DH-406: Machine learning for DH

This course aims to introduce the basic principles of machine learning in the context of the digital humanities. We will cover both supervised and unsupervised learning techniques, and study and imple

MICRO-110: Design of experiments

This course provides an introduction to experimental statistics, including use of population statistics to characterize experimental results, use of comparison statistics and hypothesis testing to eva

PHYS-467: Machine learning for physicists

Machine learning and data analysis are becoming increasingly central in sciences including physics. In this course, fundamental principles and methods of machine learning will be introduced and practi

Related concepts (60)

Standard score

In statistics, the standard score is the number of standard deviations by which the value of a raw score (i.e., an observed value or data point) is above or below the mean value of what is being observed or measured. Raw scores above the mean have positive standard scores, while those below the mean have negative standard scores. It is calculated by subtracting the population mean from an individual raw score and then dividing the difference by the population standard deviation.

Standard error

The standard error (SE) of a statistic (usually an estimate of a parameter) is the standard deviation of its sampling distribution or an estimate of that standard deviation. If the statistic is the sample mean, it is called the standard error of the mean (SEM). The sampling distribution of a mean is generated by repeated sampling from the same population and recording of the sample means obtained. This forms a distribution of different means, and this distribution has its own mean and variance.

Truncated mean

A truncated mean or trimmed mean is a statistical measure of central tendency, much like the mean and median. It involves the calculation of the mean after discarding given parts of a probability distribution or sample at the high and low end, and typically discarding an equal amount of both. This number of points to be discarded is usually given as a percentage of the total number of points, but may also be given as a fixed number of points. For most statistical applications, 5 to 25 percent of the ends are discarded.

Related lectures (709)

Power Method

Covers the Power Method for approximating the largest eigenvalue of a matrix.

Numerical Methods: Euler and Crank-Nicolson

Covers Euler and Crank-Nicolson methods for solving differential equations.

System of ODEs: High Order ODEs

Covers high order ODEs, numerical methods, and stability criteria.