**Are you an EPFL student looking for a semester project?**

Work with us on data science and visualisation projects, and deploy your project as an app on top of GraphSearch.

Category# Statistical classification

Summary

In statistics, classification is the problem of identifying which of a set of categories (sub-populations) an observation (or observations) belongs to. Examples are assigning a given email to the "spam" or "non-spam" class, and assigning a diagnosis to a given patient based on observed characteristics of the patient (sex, blood pressure, presence or absence of certain symptoms, etc.).
Often, the individual observations are analyzed into a set of quantifiable properties, known variously as explanatory variables or features. These properties may variously be categorical (e.g. "A", "B", "AB" or "O", for blood type), ordinal (e.g. "large", "medium" or "small"), integer-valued (e.g. the number of occurrences of a particular word in an email) or real-valued (e.g. a measurement of blood pressure). Other classifiers work by comparing observations to previous observations by means of a similarity or distance function.
An algorithm that implements classification, especially in a concrete implementation, is known as a classifier. The term "classifier" sometimes also refers to the mathematical function, implemented by a classification algorithm, that maps input data to a category.
Terminology across fields is quite varied. In statistics, where classification is often done with logistic regression or a similar procedure, the properties of observations are termed explanatory variables (or independent variables, regressors, etc.), and the categories to be predicted are known as outcomes, which are considered to be possible values of the dependent variable. In machine learning, the observations are often known as instances, the explanatory variables are termed features (grouped into a feature vector), and the possible categories to be predicted are classes. Other fields may use different terminology: e.g. in community ecology, the term "classification" normally refers to cluster analysis.
Classification and clustering are examples of the more general problem of pattern recognition, which is the assignment of some sort of output value to a given input value.

Official source

This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.

Related courses (2)

Related publications (6)

Related concepts (20)

CS-233(a): Introduction to machine learning (BA3)

Machine learning and data analysis are becoming increasingly central in many sciences and applications. In this course, fundamental principles and methods of machine learning will be introduced, analy

DH-406: Machine learning for DH

This course aims to introduce the basic principles of machine learning in the context of the digital humanities. We will cover both supervised and unsupervised learning techniques, and study and imple

Prevalence

In epidemiology, prevalence is the proportion of a particular population found to be affected by a medical condition (typically a disease or a risk factor such as smoking or seatbelt use) at a specific time. It is derived by comparing the number of people found to have the condition with the total number of people studied and is usually expressed as a fraction, a percentage, or the number of cases per 10,000 or 100,000 people. Prevalence is most often used in questionnaire studies. Incidence (epidemiology)#Incidence vs.

Precision and recall

In pattern recognition, information retrieval, object detection and classification (machine learning), precision and recall are performance metrics that apply to data retrieved from a collection, corpus or sample space. Precision (also called positive predictive value) is the fraction of relevant instances among the retrieved instances. Written as a formula:. Recall (also known as sensitivity) is the fraction of relevant instances that were retrieved. Written as a formula: . Both precision and recall are therefore based on relevance.

Cohen's kappa

Cohen's kappa coefficient (κ, lowercase Greek kappa) is a statistic that is used to measure inter-rater reliability (and also intra-rater reliability) for qualitative (categorical) items. It is generally thought to be a more robust measure than simple percent agreement calculation, as κ takes into account the possibility of the agreement occurring by chance. There is controversy surrounding Cohen's kappa due to the difficulty in interpreting indices of agreement.

Related categories (58)

Related lectures (27)

Topics in pathology

Pathology is the study of the causes and effects of disease or injury. The word pathology also refers to the study of disease in general, incorporating a wide range of biology research fields and medical practices. However, when used in the context of modern medical treatment, the term is often used in a narrower fashion to refer to processes and tests that fall within the contemporary medical field of "general pathology", an area which includes a number of distinct but inter-related medical specialties that diagnose disease, mostly through analysis of tissue and human cell samples.

Statistical hypothesis testing

A statistical hypothesis test is a method of statistical inference used to decide whether the data at hand sufficiently support a particular hypothesis. Hypothesis testing allows us to make probabilistic statements about population parameters. While hypothesis testing was popularized early in the 20th century, early forms were used in the 1700s. The first use is credited to John Arbuthnot (1710), followed by Pierre-Simon Laplace (1770s), in analyzing the human sex ratio at birth; see .

Mathematical statistics

Mathematical statistics is the application of probability theory, a branch of mathematics, to statistics, as opposed to techniques for collecting statistical data. Specific mathematical techniques which are used for this include mathematical analysis, linear algebra, stochastic analysis, differential equations, and measure theory. Statistical data collection is concerned with the planning of studies, especially with the design of randomized experiments and with the planning of surveys using random sampling.

Generalized Linear Regression: ClassificationBIO-322: Introduction to machine learning for bioengineers

Explores Generalized Linear Regression, Classification, confusion matrices, ROC curves, and noise in data.

Supervised Learning Essentials

Introduces the basics of supervised learning, focusing on logistic regression, linear classification, and likelihood maximization.

Evaluation ProtocolsEE-311: Fundamentals of machine learning

Explores evaluation protocols in machine learning, including recall, precision, accuracy, and specificity, with real-world examples like COVID-19 testing.

The performance of myoelectric control highly depends on the features extracted from surface electromyographic (sEMG) signals. We propose three new sEMG features based on the kernel density estimation

, ,

Hardware accelerators based on two-terminal non-volatile memories (NVMs) can potentially provide competitive speed and accuracy for the training of fully connected deep neural networks (FC-DNNs), with

2019We numerically study the resistive method for the numerical approximation of elliptic PDEs. In particular we focus on the resistive method for weakly setting solution values in specific subdomains or

2016