Publication

Testing Graph Clusterability: Algorithms and Lower Bounds

Abstract

We consider the problem of testing graph cluster structure: given access to a graph G = (V, E), can we quickly determine whether the graph can be partitioned into a few clusters with good inner conductance, or is far from any such graph? This is a generalization of the well-studied problem of testing graph expansion, where one wants to distinguish between the graph having good expansion (i.e. being a good single cluster) and the graph having a sparse cut (i.e. being a union of at least two clusters). A recent work of Czumaj, Peng, and Sohler (STOC'15) gave an ingenious sublinear time algorithm for testing k-clusterability in time (O) over tilde (n(1/2)poly(k)). Their algorithm implicitly embeds a random sample of vertices of the graph into Euclidean space, and then clusters the samples based on estimates of Euclidean distances between the points. This yields a very efficient testing algorithm, but only works if the cluster structure is very strong: it is necessary to assume that the gap between conductances of accepted and rejected graphs is at least logarithmic in the size of the graph G. In this paper we show how one can leverage more refined geometric information, namely angles as opposed to distances, to obtain a sublinear time tester that works even when the gap is a sufficiently large constant. Our tester is based on the singular value decomposition of a natural matrix derived from random walk transition probabilities from a small sample of seed nodes. We complement our algorithm with a matching lower bound on the query complexity of testing clusterability. Our lower bound is based on a novel property testing problem, which we analyze using Fourier analytic tools. As a byproduct of our techniques, we also achieve new lower bounds for the problem of approximating MAX-CUT value in sublinear time.

About this result
This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.
Related concepts (35)
Graph theory
In mathematics, graph theory is the study of graphs, which are mathematical structures used to model pairwise relations between objects. A graph in this context is made up of vertices (also called nodes or points) which are connected by edges (also called links or lines). A distinction is made between undirected graphs, where edges link two vertices symmetrically, and directed graphs, where edges link two vertices asymmetrically. Graphs are one of the principal objects of study in discrete mathematics.
Cluster analysis
Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data analysis, and a common technique for statistical data analysis, used in many fields, including pattern recognition, , information retrieval, bioinformatics, data compression, computer graphics and machine learning.
Simple random sample
In statistics, a simple random sample (or SRS) is a subset of individuals (a sample) chosen from a larger set (a population) in which a subset of individuals are chosen randomly, all with the same probability. It is a process of selecting a sample in a random way. In SRS, each subset of k individuals has the same probability of being chosen for the sample as any other subset of k individuals. A simple random sample is an unbiased sampling technique. Simple random sampling is a basic type of sampling and can be a component of other more complex sampling methods.
Show more
Related publications (74)

Results on Sparse Integer Programming and Geometric Independent Sets

Jana Tabea Cslovjecsek

An integer linear program is a problem of the form max{c^T x : Ax=b, x >= 0, x integer}, where A is in Z^(n x m), b in Z^m, and c in Z^n.Solving an integer linear program is NP-hard in general, but there are several assumptions for which it becomes fixed p ...
EPFL2023

Maximum Independent Set: Self-Training through Dynamic Programming

Volkan Cevher, Grigorios Chrysos, Efstratios Panteleimon Skoulakis

This work presents a graph neural network (GNN) framework for solving the maximum independent set (MIS) problem, inspired by dynamic programming (DP). Specifically, given a graph, we propose a DP-like recursive algorithm based on GNNs that firstly construc ...
2023

Epidemic Learning: Boosting Decentralized Learning with Randomized Communication

Rachid Guerraoui, Anne-Marie Kermarrec, Sadegh Farhadkhani, Rafael Pereira Pires, Rishi Sharma, Marinus Abraham de Vos

We present Epidemic Learning ( EL ), a simple yet powerful decentralized learning (DL) algorithm that leverages changing communication topologies to achieve faster model convergence compared to conventional DL approaches. At each round of EL, each node sen ...
2023
Show more
Related MOOCs (32)
Analyse I
Le contenu de ce cours correspond à celui du cours d'Analyse I, comme il est enseigné pour les étudiantes et les étudiants de l'EPFL pendant leur premier semestre. Chaque chapitre du cours correspond
Analyse I (partie 1) : Prélude, notions de base, les nombres réels
Concepts de base de l'analyse réelle et introduction aux nombres réels.
Show more

Graph Chatbot

Chat with Graph Search

Ask any question about EPFL courses, lectures, exercises, research, news, etc. or try the example questions below.

DISCLAIMER: The Graph Chatbot is not programmed to provide explicit or categorical answers to your questions. Rather, it transforms your questions into API requests that are distributed across the various IT services officially administered by EPFL. Its purpose is solely to collect and recommend relevant references to content that you can explore to help you answer your questions.