**Are you an EPFL student looking for a semester project?**

Work with us on data science and visualisation projects, and deploy your project as an app on top of GraphSearch.

Publication# Sublinear Algorithms for Spectral Graph Clustering

Abstract

This thesis focuses on designing spectral tools for graph clustering in sublinear time. With the emergence of big data, many traditional polynomial time, and even linear time algorithms have become prohibitively expensive. Processing modern datasets requires a new set of algorithms for computing with extremely constrained resources, i.e., \emph{sublinear algorithms}. Clustering is one of the well-known techniques for solving large-scale optimization problems in a wide variety of domains, including machine learning, data science and graph analysis~\cite{aydin2016distributed, rolnick2016geocuts, gargi2011large}. Efficient sublinear solutions for fundamental graph clustering problems require going well beyond classic techniques. In this thesis, we present an \emph{optimal} sublinear-time algorithm for \textit{testing $k$-clusterability problem}, i.e., quickly determining whether the graph can be partitioned into at most $k$ expanders, or is far from any such graph. This is a generalization of a well-studied problem of testing graph expansion. The classic results on testing $k$-clusterability either consider the testing expansion problem (i.e, $k=1$ vs $k\geq 2$) \cite{KaleS_SIAMJC11,NachmiasS10}, or address the problem for larger values of $k$ under the assumption that the gap between conductances of accepted and rejected graphs is at least logarithmic in the size of the graph \cite{CzumajPS_STOC15}. We overcome these barriers by developing novel spectral techniques based on analyzing the spectrum of the Gram matrix ofrandom walk transition probabilities. We complement our algorithm with a matching lower bound on the query complexity of testing $k$-clusterability, which improves upon the long-standing previous lower bound for testing graph expansion.Furthermore, we extend our previous result from the \textit{property testing} framework to an efficient clustering algorithm in the \textit{local computation algorithm} (LCA) model. We focus on a popular variant of graph clustering where the input graph can be partitioned into $k$ expanders with outer conductance bounded by $\epsilon$. We construct a small space data structure that allows quickly classifying vertices of $G$ according to the cluster they belong to in sublinear time. Our spectral clustering oracle provides $O(\epsilon \log k)$ error per cluster for any $\epsilon \ll 1/\log k$. Our main contribution is a sublinear time oracle that provides dot product access to the spectral embedding of the graph. We estimate dot products with high precision using an appropriate linear transformation of the Gram matrix of random walk transition probabilities. Finally, using dot product access to the spectral embedding we design a spectral clustering oracle. At a high level, our approach amounts to hyperplane partitioning in the spectral embedding of the graph but crucially operates on a nested sequence of carefully defined subspaces in the spectral embedding to achieve per cluster recovery guarantees.

Official source

This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.

Related concepts

Loading

Related publications

Loading

Related MOOCs

Loading

Related concepts (15)

Related publications (7)

Related MOOCs (7)

Algorithm

In mathematics and computer science, an algorithm (ˈælɡərɪðəm) is a finite sequence of rigorous instructions, typically used to solve a class of specific problems or to perform a computation. Algorithms are used as specifications for performing calculations and data processing. More advanced algorithms can use conditionals to divert the code execution through various routes (referred to as automated decision-making) and deduce valid inferences (referred to as automated reasoning), achieving automation eventually.

Time complexity

In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by the algorithm, supposing that each elementary operation takes a fixed amount of time to perform. Thus, the amount of time taken and the number of elementary operations performed by the algorithm are taken to be related by a constant factor.

Sequence

In mathematics, a sequence is an enumerated collection of objects in which repetitions are allowed and order matters. Like a set, it contains members (also called elements, or terms). The number of elements (possibly infinite) is called the length of the sequence. Unlike a set, the same elements can appear multiple times at different positions in a sequence, and unlike a set, the order does matter. Formally, a sequence can be defined as a function from natural numbers (the positions of elements in the sequence) to the elements at each position.

IoT Systems and Industrial Applications with Design Thinking

The first MOOC to provide a comprehensive introduction to Internet of Things (IoT) including the fundamental business aspects needed to define IoT related products.

Selected Topics on Discrete Choice

Discrete choice models are used extensively in many disciplines where it is important to predict human behavior at a disaggregate level. This course is a follow up of the online course “Introduction t

Selected Topics on Discrete Choice

Discrete choice models are used extensively in many disciplines where it is important to predict human behavior at a disaggregate level. This course is a follow up of the online course “Introduction t

Loading

Loading

Loading

Mikhail Kapralov, Jakab Tardos, Navid Nouri, Aidasadat Mousavifar

In this paper, we resolve the complexity problem of spectral graph sparcification in dynamic streams up to polylogarithmic factors. Using a linear sketch we design a streaming algorithm that uses (O)

This thesis focuses on the maximum matching problem in modern computational settings where the algorithms have to make decisions with partial information.First, we consider two stochastic models calle

Mikhail Kapralov, Aidasadat Mousavifar

We consider the problem of testing graph cluster structure: given access to a graph G = (V, E), can we quickly determine whether the graph can be partitioned into a few clusters with good inner conduc