**Êtes-vous un étudiant de l'EPFL à la recherche d'un projet de semestre?**

Travaillez avec nous sur des projets en science des données et en visualisation, et déployez votre projet sous forme d'application sur GraphSearch.

Publication# Sublinear Algorithms for Spectral Graph Clustering

Résumé

This thesis focuses on designing spectral tools for graph clustering in sublinear time. With the emergence of big data, many traditional polynomial time, and even linear time algorithms have become prohibitively expensive. Processing modern datasets requires a new set of algorithms for computing with extremely constrained resources, i.e., \emph{sublinear algorithms}. Clustering is one of the well-known techniques for solving large-scale optimization problems in a wide variety of domains, including machine learning, data science and graph analysis~\cite{aydin2016distributed, rolnick2016geocuts, gargi2011large}. Efficient sublinear solutions for fundamental graph clustering problems require going well beyond classic techniques. In this thesis, we present an \emph{optimal} sublinear-time algorithm for \textit{testing $k$-clusterability problem}, i.e., quickly determining whether the graph can be partitioned into at most $k$ expanders, or is far from any such graph. This is a generalization of a well-studied problem of testing graph expansion. The classic results on testing $k$-clusterability either consider the testing expansion problem (i.e, $k=1$ vs $k\geq 2$) \cite{KaleS_SIAMJC11,NachmiasS10}, or address the problem for larger values of $k$ under the assumption that the gap between conductances of accepted and rejected graphs is at least logarithmic in the size of the graph \cite{CzumajPS_STOC15}. We overcome these barriers by developing novel spectral techniques based on analyzing the spectrum of the Gram matrix ofrandom walk transition probabilities. We complement our algorithm with a matching lower bound on the query complexity of testing $k$-clusterability, which improves upon the long-standing previous lower bound for testing graph expansion.Furthermore, we extend our previous result from the \textit{property testing} framework to an efficient clustering algorithm in the \textit{local computation algorithm} (LCA) model. We focus on a popular variant of graph clustering where the input graph can be partitioned into $k$ expanders with outer conductance bounded by $\epsilon$. We construct a small space data structure that allows quickly classifying vertices of $G$ according to the cluster they belong to in sublinear time. Our spectral clustering oracle provides $O(\epsilon \log k)$ error per cluster for any $\epsilon \ll 1/\log k$. Our main contribution is a sublinear time oracle that provides dot product access to the spectral embedding of the graph. We estimate dot products with high precision using an appropriate linear transformation of the Gram matrix of random walk transition probabilities. Finally, using dot product access to the spectral embedding we design a spectral clustering oracle. At a high level, our approach amounts to hyperplane partitioning in the spectral embedding of the graph but crucially operates on a nested sequence of carefully defined subspaces in the spectral embedding to achieve per cluster recovery guarantees.

Official source

Cette page est générée automatiquement et peut contenir des informations qui ne sont pas correctes, complètes, à jour ou pertinentes par rapport à votre recherche. Il en va de même pour toutes les autres pages de ce site. Veillez à vérifier les informations auprès des sources officielles de l'EPFL.

Concepts associés

Chargement

Publications associées

Chargement

Concepts associés (15)

Algorithme

thumb|Algorithme de découpe d'un polygone quelconque en triangles (triangulation).
Un algorithme est une suite finie et non ambiguë d'instructions et d’opérations permettant de résoudre une classe de

Complexité en temps

En algorithmique, la complexité en temps est une mesure du temps utilisé par un algorithme, exprimé comme fonction de la taille de l'entrée. Le temps compte le nombre d'étapes de calcul avant d'arrive

Suite (mathématiques)

vignette|Exemple de suite : les points bleus représentent ses termes.
En mathématiques, une suite est une famille d'éléments — appelés ses « termes » — indexée par les entiers naturels. Une suite fini

Publications associées (37)

Chargement

Chargement

Chargement

Ashish Hari Chiplunkar, Mikhail Kapralov, Aidasadat Mousavifar

We consider the problem of testing graph cluster structure: given access to a graph G = (V, E), can we quickly determine whether the graph can be partitioned into a few clusters with good inner conductance, or is far from any such graph? This is a generalization of the well-studied problem of testing graph expansion, where one wants to distinguish between the graph having good expansion (i.e. being a good single cluster) and the graph having a sparse cut (i.e. being a union of at least two clusters). A recent work of Czumaj, Peng, and Sohler (STOC'15) gave an ingenious sublinear time algorithm for testing k-clusterability in time (O) over tilde (n(1/2)poly(k)). Their algorithm implicitly embeds a random sample of vertices of the graph into Euclidean space, and then clusters the samples based on estimates of Euclidean distances between the points. This yields a very efficient testing algorithm, but only works if the cluster structure is very strong: it is necessary to assume that the gap between conductances of accepted and rejected graphs is at least logarithmic in the size of the graph G. In this paper we show how one can leverage more refined geometric information, namely angles as opposed to distances, to obtain a sublinear time tester that works even when the gap is a sufficiently large constant. Our tester is based on the singular value decomposition of a natural matrix derived from random walk transition probabilities from a small sample of seed nodes. We complement our algorithm with a matching lower bound on the query complexity of testing clusterability. Our lower bound is based on a novel property testing problem, which we analyze using Fourier analytic tools. As a byproduct of our techniques, we also achieve new lower bounds for the problem of approximating MAX-CUT value in sublinear time.

In this thesis we give new algorithms for two fundamental graph problems. We develop novel ways of using linear programming formulations, even exponential-sized ones, to extract structure from problem instances and to guide algorithms in making progress. Somewhat surprisingly, similar polyhedral techniques can be harnessed in the two seemingly disparate settings.
In the first part of the thesis we address a benchmark problem in combinatorial optimization: the asymmetric traveling salesman problem (ATSP). It consists in finding the shortest tour that visits all vertices of a given directed graph with weights on edges. Due to its NP-hardness, the theoretical study of algorithms for ATSP has focused on approximation algorithms: ones that are provably both efficient and give solutions competitive with the optimum. Specifically, a rho-approximation algorithm for ATSP is one that runs in polynomial time and always outputs a tour that is at most rho times longer than the shortest tour. Finding such an approximation algorithm with rho bounded (i.e., a constant factor) had been a long-standing open problem.
In this thesis, we give such an algorithm. Our approximation guarantee is analyzed with respect to the standard linear programming relaxation, and thus our result also confirms the conjectured constant integrality gap of that relaxation. Our techniques build upon the constant-factor approximation algorithm for the special case of node-weighted metrics due to Svensson. In particular, we give a generic reduction to structured instances that resemble but are more general than those arising from node-weighted metrics. This reduction takes advantage of a laminar family of vertex sets that arises from the linear programming relaxation.
In the second part of the thesis we address the perfect matching problem. The first polynomial-time algorithm for it, given by Edmonds in 1965, is historically associated with the introduction of the class P and our notion that

`polynomial-time'' means `

efficient''. That algorithm is sequential and deterministic. We have also known since the 1980s that the matching problem has efficient parallel algorithms if the use of randomness is allowed. Formally, it is in the class RNC, i.e., it has randomized algorithms that use polynomially many processors and run in polylogarithmic time. However, we do not know if randomness is necessary - that is, whether the matching problem is in the class NC.
In this thesis we show that the matching problem is in quasi-NC. That is, we give a deterministic parallel algorithm that runs in O(log^3 n) time on n^{O(log^2 n)} processors. The result is obtained by a derandomization of the Isolation Lemma for perfect matchings, which was introduced in the classic paper by Mulmuley, Vazirani and Vazirani to obtain an RNC algorithm. Our proof extends the framework of Fenner, Gurjar and Thierauf, who proved the analogous result in the special case of bipartite graphs. Compared to that setting, several new ingredients are needed due to the significantly more complex structure of perfect matchings in general graphs. In particular, our proof heavily relies on the laminar structure of the faces of the perfect matching polytope.Graph theory experienced a remarkable increase of interest among the scientific community during the last decades. The vertex coloring problem (Min Coloring) deserves a particular attention rince it has been able to capture a wide variety of applications. For mathematicians, it is interesting for an additional reason: it is extremely hard to solve it in an efficient way. In this thesis, we introduce several problems generalizing the usual vertex coloring problem, and hence, extending its application domain. We say that a graph is (p, k)-colorable if its vertex set can be partitioned into p cliques and k stable sets. Then, for a given p (respectively k), one may ask the following questions: how to choose p cliques (respectively k stable sets) to be removed from the graph such that the number of stable sets (respectively cliques) partitioning the remaining vertices is minimized? These are called (p, k)-coloring problems. We also introduce Min Split-coloring which is, given a graph G, the problem of minimizing k such that G is (k, k)-colorable. Along the saine line, given a graph G, the objective of the problem Min Cocoloring is to minimize p + k such that G is (p, k)-colorable. All these problems, called together generalized coloring problems, are obviously at least as difficult as Min Coloring. The purpose of this dissertation is to study generalized coloring problems in nome restricted classes of graphs in order to bring a new insight on the relative difficulties of these problems. To this end, we detect in a more precise way the limits between NP-hard and polynomially solvable problems. Chapter 1 introduces generalized coloring problems by emphasizing nome preliminary results which will guide the questions to handle in the following chapters. Chapter 2 exposes the first clans of graphs, namely cacti, where Min Split-coloring is shown to be polynomially solvable. We also observe that generalized coloring problems can be polynomially solved in triangulated graphs. The main result of Chapter 3 is a new characterization of cographs: it is equivalent to say that G is a cograph, and to state that, for every subgraph G' ⊆ G, G' is (p, k)-colorable if and only if G' [V \ K] is (p – 1, k)-colorable, where K induces a maximum clique of G'. This result implies simple combinatorial algorithme to solve all generalized coloring problems; the one for Min Cocoloring improves the best time complexity known so far. In Chapter 4, we handle the recognition of polar graphs which can be seen as a particular (p, k)-coloring, where p cliques are independent (i.e., not linked at all) and k stable sets form a complete k-partite graph. It is known that the recognition of polar graphs is NP-complete. Here, we determine the first clans of graphs, namely cographs, where the polar graphs can be recognized in polynomial time, more precisely in time O(n log n). We also give a characterization by forbidden subgraphs. In the came manner, we characterize monopolar cographs, i.e., cographs admitting a polar partition with at most one clique or at most one stable set. Chapter 5 is devoted to generalized coloring problems in line graphs. Here, we detect the first classes of graphs, namely line graphs of trees, line graphs of bipartite graphs and line graphs of line-perfect graphs, where generalized coloring problems diverge in terms of NP-hardness. In Chapter 6 we study the approximability of generalized coloring problems in line graphs, in comparability graphs and in general graphs. We derive approximation algorithms with a performance guarantee using both the standard approximation ratio and the differential approximation ratio. We show that both Min Split-coloring and Min Cocoloring are at least as hard as Min Coloring to approximate from the standard approximation ratio point of view, whereas, they admit a polynomial time differential approximation scheme and Min Coloring only a constant differential approximation ratio. We also show that Min Cocoloring reduces to Min Split-coloring in all classes of graphs closed under addition of disjoint cliques and under join of a complete k-partite graph. In Chapter 7, we handle two different applications of Min Split-coloring in permutation graphs. They give birth to a new problem, called Min Threshold-coloring, that we study in the came spirit as the other generalized coloring problems. In the last chapter, we present several open questions arising from this thesis.