Publication

Fairness and Explainability in Clustering Problems

Xinrui Jia
2023
EPFL thesis
Abstract

In this thesis we present and analyze approximation algorithms for three different clustering problems. The formulations of these problems are motivated by fairness and explainability considerations, two issues that have recently received attention in the algorithms and machine learning communities. In general, we are given a metric space and the task is to find a specified number of clusters to minimize some objective involving the centers of the clusters.Our first problem is the colorful k-center problem. In the classic k-center with outliers problem, the objective is to minimize the maximum distance from any point to its nearest center, while a given number of points can be omitted from contributing to the maximum. That is, these points are not served by any center. Colorful k-center is a generalization of this problem. Instead of only serving enough points, each point belongs to some color class and a certain number of points of each color are required to be served in a solution. When this problem was first introduced by [Ban+19], a pseudo-approximation using k + omega - 1 centers was given for general metric spaces where omega is the number of color classes. We give the first true approximation algorithm with k centers that gives a 3-approximation for colorful k-center.Next, we present our progress on the non-uniform k-center problem (NUkC or t-NUkC). In NUkC, we are given pairs (ki, ri), 1

About this result
This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.
Related concepts (39)
K-means clustering
k-means clustering is a method of vector quantization, originally from signal processing, that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean (cluster centers or cluster centroid), serving as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells. k-means clustering minimizes within-cluster variances (squared Euclidean distances), but not regular Euclidean distances, which would be the more difficult Weber problem: the mean optimizes squared errors, whereas only the geometric median minimizes Euclidean distances.
Cluster analysis
Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data analysis, and a common technique for statistical data analysis, used in many fields, including pattern recognition, , information retrieval, bioinformatics, data compression, computer graphics and machine learning.
Correlation clustering
Clustering is the problem of partitioning data points into groups based on their similarity. Correlation clustering provides a method for clustering a set of objects into the optimum number of clusters without specifying that number in advance. Cluster analysis In machine learning, correlation clustering or cluster editing operates in a scenario where the relationships between the objects are known instead of the actual representations of the objects.
Show more
Related publications (85)

Interpret3C: Interpretable Student Clustering Through Individualized Feature Selection

Vinitra Swamy, Paola Mejia Domenzain, Julian Thomas Blackwell, Isadora Alves de Salles

Clustering in education, particularly in large-scale online environments like MOOCs, is essential for understanding and adapting to diverse student needs. However, the effectiveness of clustering depends on its interpretability, which becomes challenging w ...
2024

Augmented Lagrangian Methods for Provable and Scalable Machine Learning

Mehmet Fatih Sahin

Non-convex constrained optimization problems have become a powerful framework for modeling a wide range of machine learning problems, with applications in k-means clustering, large- scale semidefinite programs (SDPs), and various other tasks. As the perfor ...
EPFL2023

Unsupervised Graph Representation Learning with Cluster-aware Self-training and Refining

Yichen Xu, Qiang Liu, Feng Yu

Unsupervised graph representation learning aims to learn low-dimensional node embeddings without supervision while preserving graph topological structures and node attributive features. Previous Graph Neural Networks (GNN) require a large number of labeled ...
New York2023
Show more

Graph Chatbot

Chat with Graph Search

Ask any question about EPFL courses, lectures, exercises, research, news, etc. or try the example questions below.

DISCLAIMER: The Graph Chatbot is not programmed to provide explicit or categorical answers to your questions. Rather, it transforms your questions into API requests that are distributed across the various IT services officially administered by EPFL. Its purpose is solely to collect and recommend relevant references to content that you can explore to help you answer your questions.