Clustering is the problem of partitioning data points into groups based on their similarity. Correlation clustering provides a method for clustering a set of objects into the optimum number of clusters without specifying that number in advance.

In machine learning, correlation clustering, or cluster editing, operates in a scenario where the relationships between the objects are known rather than the actual representations of the objects. For example, given a weighted graph where the edge weight indicates whether two nodes are similar (positive edge weight) or different (negative edge weight), the task is to find a clustering that either maximizes agreements (the sum of positive edge weights within clusters plus the absolute value of the sum of negative edge weights between clusters) or minimizes disagreements (the absolute value of the sum of negative edge weights within clusters plus the sum of positive edge weights between clusters). Unlike many other clustering methods, this does not require choosing the number of clusters in advance, because the objective, the total weight of the disagreements, is defined for any partition and is independent of the number of clusters.

A perfect clustering, in which all similar items share a cluster and all dissimilar items lie in different clusters, may not exist. If the graph does admit a perfect clustering, then deleting all the negative edges and finding the connected components of the remaining graph returns the required clusters. In general, however, no perfect clustering exists. For example, given nodes a, b, c such that a,b and a,c are similar while b,c are dissimilar, a perfect clustering is not possible: the two similar pairs force all three nodes into one cluster, which then contains the dissimilar pair b,c. In such cases, the task on an unweighted graph is to find a clustering that maximizes the number of agreements (the number of + edges inside clusters plus the number of − edges between clusters) or minimizes the number of disagreements (the number of − edges inside clusters plus the number of + edges between clusters).
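The disagreement objective and the connected-components test for a perfect clustering can be sketched in a few lines of Python. This is only an illustrative sketch, not a reference implementation: the function names `disagreements` and `positive_components`, the dictionary-based edge representation, and the two toy graphs are assumptions made for this example.

```python
def disagreements(edges, labels):
    """Total weight of disagreements for a clustering.

    edges  : dict mapping (u, v) pairs to signed weights (assumed format)
    labels : dict mapping each node to a cluster label
    """
    cost = 0
    for (u, v), w in edges.items():
        same_cluster = labels[u] == labels[v]
        if w > 0 and not same_cluster:
            cost += w          # similar pair split across clusters
        elif w < 0 and same_cluster:
            cost += -w         # dissimilar pair placed together
    return cost


def positive_components(nodes, edges):
    """Delete negative edges and label the connected components."""
    adjacency = {n: [] for n in nodes}
    for (u, v), w in edges.items():
        if w > 0:
            adjacency[u].append(v)
            adjacency[v].append(u)

    labels = {}
    for start in nodes:
        if start in labels:
            continue
        labels[start] = start          # use the first node as the label
        stack = [start]
        while stack:                   # iterative depth-first search
            node = stack.pop()
            for neighbour in adjacency[node]:
                if neighbour not in labels:
                    labels[neighbour] = start
                    stack.append(neighbour)
    return labels


# Toy graph that admits a perfect clustering: {a, b} and {c, d}.
perfect_nodes = ["a", "b", "c", "d"]
perfect_edges = {("a", "b"): 1, ("c", "d"): 1, ("a", "c"): -1, ("b", "d"): -1}
perfect_labels = positive_components(perfect_nodes, perfect_edges)
perfect_cost = disagreements(perfect_edges, perfect_labels)

# The triangle from the text: a,b and a,c similar, b,c dissimilar.
triangle_nodes = ["a", "b", "c"]
triangle_edges = {("a", "b"): 1, ("a", "c"): 1, ("b", "c"): -1}
triangle_labels = positive_components(triangle_nodes, triangle_edges)
triangle_cost = disagreements(triangle_edges, triangle_labels)

print(perfect_labels, perfect_cost)
print(triangle_labels, triangle_cost)
```

On the first toy graph the components of the positive edges have zero disagreements, so they form a perfect clustering. On the triangle, the positive edges join all three nodes into one component, the negative edge b,c falls inside that cluster, and the cost is 1, which signals that no perfect clustering exists. Minimizing disagreements in the general case needs an optimization or approximation method, which this sketch does not attempt.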

Related concepts (1)
Cluster analysis
Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data analysis, and a common technique for statistical data analysis, used in many fields, including pattern recognition, information retrieval, bioinformatics, data compression, computer graphics and machine learning.
