**Are you an EPFL student looking for a semester project?**

Work with us on data science and visualisation projects, and deploy your project as an app on top of Graph Search.

Concept# Phi coefficient

Summary

In statistics, the phi coefficient (or mean square contingency coefficient and denoted by φ or rφ) is a measure of association for two binary variables. In machine learning, it is known as the Matthews correlation coefficient (MCC) and used as a measure of the quality of binary (two-class) classifications, introduced by biochemist Brian W. Matthews in 1975. Introduced by Karl Pearson, and also known as the Yule phi coefficient from its introduction by Udny Yule in 1912 this measure is similar to the Pearson correlation coefficient in its interpretation. In fact, a Pearson correlation coefficient estimated for two binary variables will return the phi coefficient. Two binary variables are considered positively associated if most of the data falls along the diagonal cells. In contrast, two binary variables are considered negatively associated if most of the data falls off the diagonal. If we have a 2×2 table for two random variables x and y
where n11, n10, n01, n00, are non-negative counts of numbers of observations that sum to n, the total number of observations. The phi coefficient that describes the association of x and y is
Phi is related to the point-biserial correlation coefficient and Cohen's d and estimates the extent of the relationship between two variables (2×2).
The phi coefficient can also be expressed using only , , , and , as
Although computationally the Pearson correlation coefficient reduces to the phi coefficient in the 2×2 case, they are not in general the same. The Pearson correlation coefficient ranges from −1 to +1, where ±1 indicates perfect agreement or disagreement, and 0 indicates no relationship. The phi coefficient has a maximum value that is determined by the distribution of the two variables if one or both variables can take on more than two values. See Davenport and El-Sanhury (1991) for a thorough discussion.
The MCC is defined identically to phi coefficient, introduced by Karl Pearson, also known as the Yule phi coefficient from its introduction by Udny Yule in 1912.

Official source

This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.

Related people (1)

Related concepts (6)

Related publications (33)

Related courses (15)

Related lectures (33)

Precision and recall

In pattern recognition, information retrieval, object detection and classification (machine learning), precision and recall are performance metrics that apply to data retrieved from a collection, corpus or sample space. Precision (also called positive predictive value) is the fraction of relevant instances among the retrieved instances. Written as a formula:. Recall (also known as sensitivity) is the fraction of relevant instances that were retrieved. Written as a formula: . Both precision and recall are therefore based on relevance.

F-score

In statistical analysis of binary classification, the F-score or F-measure is a measure of a test's accuracy. It is calculated from the precision and recall of the test, where the precision is the number of true positive results divided by the number of all positive results, including those not identified correctly, and the recall is the number of true positive results divided by the number of all samples that should have been identified as positive.

Binary classification

Binary classification is the task of classifying the elements of a set into two groups (each called class) on the basis of a classification rule. Typical binary classification problems include: Medical testing to determine if a patient has certain disease or not; Quality control in industry, deciding whether a specification has been met; In information retrieval, deciding whether a page should be in the result set of a search or not. Binary classification is dichotomization applied to a practical situation.

DH-406: Machine learning for DH

This course aims to introduce the basic principles of machine learning in the context of the digital humanities. We will cover both supervised and unsupervised learning techniques, and study and imple

ENG-209: Data science for engineers with Python

Ce cours est divisé en deux partie. La première partie présente le langage Python et les différences notables entre Python et C++ (utilisé dans le cours précédent ICC). La seconde partie est une intro

EE-311: Fundamentals of machine learning

Ce cours présente une vue générale des techniques d'apprentissage automatique, passant en revue les algorithmes, le formalisme théorique et les protocoles expérimentaux.

Generalized Linear Regression: Classification

Explores Generalized Linear Regression, Classification, confusion matrices, ROC curves, and noise in data.

Supervised Learning Essentials

Introduces the basics of supervised learning, focusing on logistic regression, linear classification, and likelihood maximization.

Evaluation Protocols

Explores evaluation protocols in machine learning, including recall, precision, accuracy, and specificity, with real-world examples like COVID-19 testing.

Jean-Paul Richard Kneib, Emma Elizabeth Tolley, Tianyue Chen, Michele Bianco

The upcoming Square Kilometre Array Observatory will produce images of neutral hydrogen distribution during the epoch of reionization by observing the corresponding 21-cm signal. However, the 21-cm signal will be subject to instrumental limitations such as ...

Marilyne Andersen, Sabine Süsstrunk, Caroline Karmann, Bahar Aydemir, Kynthia Chamilothori, Seungryong Kim

Saliency models are image-based prediction models that estimate human visual attention. Such models, when applied to architectural spaces, could pave the way for design decisions where visual attention is taken into account. In this study, we tested the pe ...

2023Volkan Cevher, Grigorios Chrysos, Thomas Michaelsen Pethick

Despite progress in adversarial training (AT), there is a substantial gap between the topperforming and worst-performing classes in many datasets. For example, on CIFAR10, the accuracies for the best and worst classes are 74% and 23%, respectively. We argu ...

2023