In statistics, the phi coefficient (also called the mean square contingency coefficient, denoted $\phi$ or $r_\phi$) is a measure of association for two binary variables. In machine learning, it is known as the Matthews correlation coefficient (MCC) and is used as a measure of the quality of binary (two-class) classifications; it was introduced in that setting by biochemist Brian W. Matthews in 1975. The phi coefficient itself was introduced by Karl Pearson and is also known as the Yule phi coefficient from its introduction by Udny Yule in 1912. It is similar to the Pearson correlation coefficient in its interpretation. In fact, a Pearson correlation coefficient estimated for two binary variables will return the phi coefficient.

Two binary variables are considered positively associated if most of the data falls along the diagonal cells of their 2×2 table. In contrast, they are considered negatively associated if most of the data falls off the diagonal.

Suppose we have a 2×2 table for two random variables x and y, where $n_{11}$ counts the observations with x = 1 and y = 1, $n_{10}$ those with x = 1 and y = 0, $n_{01}$ those with x = 0 and y = 1, and $n_{00}$ those with x = 0 and y = 0. These are non-negative counts that sum to $n$, the total number of observations. The row totals are $n_{1\bullet} = n_{11} + n_{10}$ and $n_{0\bullet} = n_{01} + n_{00}$, and the column totals are $n_{\bullet 1} = n_{11} + n_{01}$ and $n_{\bullet 0} = n_{10} + n_{00}$. The phi coefficient that describes the association of x and y is

$$\phi = \frac{n_{11} n_{00} - n_{10} n_{01}}{\sqrt{n_{1\bullet}\, n_{0\bullet}\, n_{\bullet 0}\, n_{\bullet 1}}}$$

Phi is related to the point-biserial correlation coefficient and Cohen's d, and it estimates the extent of the relationship between two variables (2×2).

The phi coefficient can also be expressed using only $n$, $n_{11}$, $n_{1\bullet}$, and $n_{\bullet 1}$, as

$$\phi = \frac{n\, n_{11} - n_{1\bullet}\, n_{\bullet 1}}{\sqrt{n_{1\bullet}\, n_{\bullet 1}\, (n - n_{1\bullet})(n - n_{\bullet 1})}}$$

Although computationally the Pearson correlation coefficient reduces to the phi coefficient in the 2×2 case, they are not in general the same. The Pearson correlation coefficient ranges from −1 to +1, where ±1 indicates perfect agreement or disagreement, and 0 indicates no relationship. The phi coefficient has a maximum value that is determined by the distribution of the two variables if one or both variables can take on more than two values. See Davenport and El-Sanhury (1991) for a thorough discussion.
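As a rough illustration of the claim that a Pearson correlation computed on two binary variables returns the phi coefficient, the following Python sketch computes phi from the four cell counts, computes it again from the marginal totals, and compares both with a hand-written Pearson correlation on the expanded 0/1 data. The function names and the example counts (20, 5, 10, 15) are hypothetical and chosen only for illustration; the cell layout assumes $n_{11}$ means x = 1 and y = 1, $n_{10}$ means x = 1 and y = 0, and so on.

```python
import math


def phi_coefficient(n11, n10, n01, n00):
    """Phi from the four cell counts of a 2x2 table."""
    n1_dot, n0_dot = n11 + n10, n01 + n00   # row totals (x = 1, x = 0)
    n_dot1, n_dot0 = n11 + n01, n10 + n00   # column totals (y = 1, y = 0)
    denom = math.sqrt(n1_dot * n0_dot * n_dot0 * n_dot1)
    if denom == 0:
        raise ValueError("phi is undefined when a marginal total is zero")
    return (n11 * n00 - n10 * n01) / denom


def phi_from_margins(n, n11, n1_dot, n_dot1):
    """Phi using only n, n11 and the two '1' marginal totals."""
    denom = math.sqrt(n1_dot * n_dot1 * (n - n1_dot) * (n - n_dot1))
    if denom == 0:
        raise ValueError("phi is undefined when a marginal total is zero")
    return (n * n11 - n1_dot * n_dot1) / denom


def expand_table(n11, n10, n01, n00):
    """Turn cell counts back into paired 0/1 observations."""
    xs = [1] * n11 + [1] * n10 + [0] * n01 + [0] * n00
    ys = [1] * n11 + [0] * n10 + [1] * n01 + [0] * n00
    return xs, ys


def pearson_r(xs, ys):
    """Plain Pearson correlation, written out to stay dependency-free."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / math.sqrt(vx * vy)


# Hypothetical example table: n11=20, n10=5, n01=10, n00=15 (n = 50).
cells = (20, 5, 10, 15)
xs, ys = expand_table(*cells)
print(phi_coefficient(*cells))
print(phi_from_margins(50, 20, 25, 30))
print(pearson_r(xs, ys))
```

Up to floating-point rounding, the three printed values agree, which is the 2×2 equivalence described above.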
The MCC is defined identically to the phi coefficient, which was introduced by Karl Pearson and is also known as the Yule phi coefficient from its introduction by Udny Yule in 1912.
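In classification terms, the same quantity can be read off a confusion matrix. If x is the actual class and y the predicted class, the four cells become true positives (TP), false negatives (FN), false positives (FP), and true negatives (TN), and the phi formula becomes

$$\text{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$

The sketch below is a minimal, dependency-free illustration under that mapping. The helper names and the toy labels are hypothetical, and returning 0.0 when the denominator is zero is a convention assumed here, not something the text above specifies.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, FN, FP, TN for 0/1 labels (1 = positive class)."""
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1 and p == 0:
            fn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def mcc(tp, fn, fp, tn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention for a degenerate table
    return (tp * tn - fp * fn) / denom


# Toy labels, made up for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
tp, fn, fp, tn = confusion_counts(y_true, y_pred)
print(tp, fn, fp, tn)        # 3 1 1 3
print(mcc(tp, fn, fp, tn))   # 0.5
```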

Related courses (15)
DH-406: Machine learning for DH
This course aims to introduce the basic principles of machine learning in the context of the digital humanities. We will cover both supervised and unsupervised learning techniques, and study and imple
ENG-209: Data science for engineers with Python
This course is divided into two parts. The first part presents the Python language and the notable differences between Python and C++ (used in the preceding course, ICC). The second part is an intro
EE-311: Fundamentals of machine learning
This course gives a general overview of machine learning techniques, reviewing the algorithms, the theoretical formalism, and the experimental protocols.
Related lectures (32)
Introduction to Human Rights
Introduces key human rights concepts, explores discrimination, social norms, and the impact of algorithms on rights.
Introduction to Data Science
Introduces the basics of data science, covering decision trees, machine learning advancements, and deep reinforcement learning.
Quantifying Performance: Misclassification and F-Measure
Covers quantifying performance through true positives, false negatives, and false positives in machine learning.
Related concepts (6)
Precision and recall
In pattern recognition, information retrieval, object detection and classification (machine learning), precision and recall are performance metrics that apply to data retrieved from a collection, corpus or sample space. Precision (also called positive predictive value) is the fraction of relevant instances among the retrieved instances. Written as a formula: $\text{precision} = \frac{\text{relevant retrieved instances}}{\text{all retrieved instances}}$. Recall (also known as sensitivity) is the fraction of relevant instances that were retrieved. Written as a formula: $\text{recall} = \frac{\text{relevant retrieved instances}}{\text{all relevant instances}}$. Both precision and recall are therefore based on relevance.
F-score
In statistical analysis of binary classification, the F-score or F-measure is a measure of a test's accuracy. It is calculated from the precision and recall of the test, where the precision is the number of true positive results divided by the number of all positive results, including those not identified correctly, and the recall is the number of true positive results divided by the number of all samples that should have been identified as positive.
Binary classification
Binary classification is the task of classifying the elements of a set into two groups (each called class) on the basis of a classification rule. Typical binary classification problems include: Medical testing to determine if a patient has certain disease or not; Quality control in industry, deciding whether a specification has been met; In information retrieval, deciding whether a page should be in the result set of a search or not. Binary classification is dichotomization applied to a practical situation.
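The precision, recall, and F-score descriptions above can be sketched from confusion-matrix counts. In the snippet below, the function names and the example counts are hypothetical. The F-score shown is the common F1 variant, the harmonic mean of precision and recall, which is one standard way of combining the two and is an assumption here since the text does not give the formula. Returning 0.0 for an empty denominator is likewise an assumed convention.

```python
def precision(tp, fp):
    """True positives divided by everything predicted positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0  # assumed convention for 0/0


def recall(tp, fn):
    """True positives divided by everything that is actually positive."""
    return tp / (tp + fn) if (tp + fn) else 0.0  # assumed convention for 0/0


def f1_score(tp, fp, fn):
    """F1: harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Made-up counts: 8 true positives, 2 false positives, 8 false negatives.
print(precision(8, 2))    # 0.8
print(recall(8, 8))       # 0.5
print(f1_score(8, 2, 8))  # about 0.615, the same as 2*TP / (2*TP + FP + FN)
```

Unlike the MCC, none of these three quantities uses the true-negative count, which is one reason the two families of metrics can disagree on imbalanced data.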