Publication: Space-Efficient Representations of Graphs

Abstract

With the increasing prevalence of massive datasets, it becomes important to design algorithmic techniques for dealing with scenarios where the input to be processed does not fit in the memory of a single machine. Many highly successful approaches have emerged in recent decades, such as processing the data in a stream, parallel processing, and data compression. The aim of this thesis is to apply these techniques to various important graph-theoretic problems. Our contributions can be broadly classified into two categories: spectral graph theory and maximum matching.

Spectral Graph Theory. Spectral sparsification is a technique for rendering an arbitrary graph sparse while approximately preserving the quadratic form of its Laplacian matrix. In this thesis, we extend the result of (Kapralov et al.) and propose a sketch and a corresponding decoding algorithm for constructing a spectral sparsifier from a dynamic stream of edge insertions and deletions. The size of the resulting sparsifier, the size of the sketch, and the decoding time are all nearly linear in the number of vertices, and consequently nearly optimal.

The concept of spectral sparsification has recently been generalized to hypergraphs (Soma and Yoshida), an analogue of graphs for modeling higher-order relationships. As one of the main contributions of the thesis, we prove for the first time the existence of nearly-linear-sized spectral sparsifiers for arbitrary hypergraphs, and provide a corresponding nearly-linear-time algorithm for constructing them. Through a lower bound construction, we show that our sparsifiers achieve nearly optimal compression of the hypergraph spectral structure.

On the more applied side of spectral graph theory, we present a fully scalable MPC (massively parallel computation) algorithm capable of simulating a large number of independent random walks of length $L$ from an arbitrary starting distribution in $O(\log L)$ rounds.

Maximum Matching. We propose a novel randomized composable coreset for the problem of maximum matching, called the matching skeleton. The coreset achieves a 1/2-approximation while having fewer than $n$ edges.

We also propose a new, highly space-efficient variant of a peeling algorithm for maximum matching. With this, we are able to approximate the maximum matching size of a graph to within a constant factor, using a stream of $m$ uniformly random edges (where $m$ is the total number of edges), in as little as $O(\log^2 n)$ space. Conversely, we show that significantly fewer (that is, $m^{1-\Omega(1)}$) samples do not suffice, even with unlimited space. Finally, we design a Local Computation Algorithm that implicitly constructs a constant-approximate maximum matching in time and space nearly linear in the maximum degree.
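For orientation, the 1/2-approximation figure mentioned above can be compared with the textbook baseline: any maximal matching has at least half the size of a maximum matching. The sketch below is that classic greedy one-pass procedure, not the matching skeleton or the peeling variant from the thesis. The function name `greedy_matching` and the toy edge list are illustrative assumptions. Note that this baseline stores the matched vertices, so it needs space linear in the number of vertices rather than the polylogarithmic space claimed in the abstract.

```python
def greedy_matching(edge_stream):
    """One pass over the edges: keep an edge iff both endpoints are still free.

    Returns a maximal matching, which is a 1/2-approximation of a maximum
    matching. Illustrative baseline only (not the thesis's algorithm).
    """
    matched = set()      # vertices already covered by a kept edge
    matching = []        # edges kept so far
    for u, v in edge_stream:
        if u != v and u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching


# Path a-b-c-d: the maximum matching is {a-b, c-d} (size 2).
# If the middle edge arrives first, greedy keeps only that edge (size 1),
# which shows that the factor 1/2 is tight for this baseline.
bad_order = [("b", "c"), ("a", "b"), ("c", "d")]
good_order = [("a", "b"), ("c", "d"), ("b", "c")]
print(greedy_matching(bad_order))
print(greedy_matching(good_order))
```

The order of arrival matters for this baseline, which is one reason random-order streams (as in the abstract) are studied separately from adversarial ones.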

Related concepts (15)

Hypergraph

In mathematics, a hypergraph is a generalization of a graph in which an edge can join any number of vertices. In contrast, in an ordinary graph, an edge connects exactly two vertices.

Laplacian matrix

In the mathematical field of graph theory, the Laplacian matrix, also called the graph Laplacian, admittance matrix, Kirchhoff matrix or discrete Laplacian, is a matrix representation of a graph.
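As a small illustration of the quadratic form that spectral sparsification aims to preserve, the sketch below builds the Laplacian of a weighted undirected graph and checks the standard identity x^T L x = sum over edges (u, v) of w_uv (x_u - x_v)^2. A spectral sparsifier is a sparser graph whose quadratic form stays within a (1 ± ε) factor of this value for every vector x. The helper names and the toy triangle are assumptions made for the example, and plain Python lists are used instead of a linear-algebra library.

```python
def laplacian(n, weighted_edges):
    """Dense Laplacian L = D - A of an undirected weighted graph on n vertices."""
    L = [[0.0] * n for _ in range(n)]
    for u, v, w in weighted_edges:
        L[u][u] += w      # weighted degree on the diagonal
        L[v][v] += w
        L[u][v] -= w      # negative edge weight off the diagonal
        L[v][u] -= w
    return L


def quadratic_form(L, x):
    """Compute x^T L x directly from the matrix."""
    n = len(x)
    return sum(x[i] * L[i][j] * x[j] for i in range(n) for j in range(n))


def edge_energy(weighted_edges, x):
    """Compute the same quantity edge by edge: sum of w * (x_u - x_v)^2."""
    return sum(w * (x[u] - x[v]) ** 2 for u, v, w in weighted_edges)


# Toy weighted triangle (hypothetical data).
edges = [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 0.5)]
x = [1.0, 0.0, -1.0]
L = laplacian(3, edges)
print(quadratic_form(L, x), edge_energy(edges, x))  # both evaluate to 5.0
```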

Algorithm

In mathematics and computer science, an algorithm (ˈælɡərɪðəm) is a finite sequence of rigorous instructions, typically used to solve a class of specific problems or to perform a computation.

Related publications (22)

Many of the currently best-known approximation algorithms for NP-hard optimization problems are based on Linear Programming (LP) and Semi-definite Programming (SDP) relaxations. Given its power, this class of algorithms seems to contain the most favourable candidates for outperforming the current state-of-the-art approximation guarantees for NP-hard problems, for which there still exists a gap between the inapproximability results and the approximation guarantees that we know how to achieve in polynomial time. In this thesis, we address both the power and the limitations of these relaxations, as well as the connection between the shortcomings of these relaxations and the inapproximability of the underlying problem. In the first part, we study the limitations of LP relaxations of well-known graph problems such as the Vertex Cover problem and the Independent Set problem. We prove that any small LP relaxation for the aforementioned problems cannot have an integrality gap strictly better than $2$ and $\omega(1)$, respectively. Furthermore, our lower bound for the Independent Set problem also holds for any SDP relaxation. Prior to our work, it was only known that such LP relaxations cannot have an integrality gap better than $1.5$ for the Vertex Cover problem, and better than $2$ for the Independent Set problem. In the second part, we study the so-called knapsack cover inequalities that are used in the current best relaxations for numerous combinatorial optimization problems of covering type. In spite of their widespread use, these inequalities yield LP relaxations of exponential size, over which it is not known how to optimize exactly in polynomial time. We address this issue and obtain LP relaxations of quasi-polynomial size that are at least as strong as those given by the knapsack cover inequalities.
In the last part, we show a close connection between structural hardness for k-partite graphs and tight inapproximability results for scheduling problems with precedence constraints. This connection is inspired by a family of integrality gap instances of a certain LP relaxation. Assuming the hardness of an optimization problem on k-partite graphs, we obtain a hardness of $2-\varepsilon$ for the problem of minimizing the makespan for scheduling with preemption on identical parallel machines, and a super-constant inapproximability for the problem of scheduling on related parallel machines. Prior to this result, it was only known that the first problem does not admit a PTAS, and that the second problem is NP-hard to approximate within a factor strictly better than 2, assuming the Unique Games Conjecture.

Optimization is a fundamental tool in modern science. Numerous important tasks in biology, economics, physics and computer science can be cast as optimization problems. Consider the example of machine learning: recent advances have shown that even the most sophisticated tasks involving decision making can be reduced to solving certain optimization problems. These advances, however, bring several new challenges to the field of algorithm design. The first is the ever-growing size of the instances for which these optimization problems need to be solved. In practice, this forces the algorithms for these problems to run in time linear or nearly linear in their input size. The second challenge is the emergence of new, increasingly hard problems that need to be dealt with. These problems are in most cases considered computationally intractable because of complexity barriers such as NP-completeness, or because of non-convexity. Therefore, efficiently computable relaxations for these problems are typically desired.
The material of this thesis is divided into two parts. In the first part we attempt to address the first challenge. The recent tremendous progress in developing fast algorithms for such fundamental problems as maximum flow or linear programming demonstrates the power of continuous techniques and tools such as electrical flows, fast Laplacian solvers and interior point methods. In this thesis we study new algorithms of this type based on continuous dynamical systems inspired by the study of the slime mold Physarum polycephalum. We perform a rigorous mathematical analysis of these dynamical systems and extract from them new, fast algorithms for problems such as minimum cost flow, linear programming and basis pursuit.
In the second part of the thesis we develop new tools to approach the second challenge. To this end, we study a very general form of discrete optimization problems and its extension to sampling and counting, capturing a host of important problems such as counting matchings in graphs, computing permanents of matrices or sampling from constrained determinantal point processes. We present a very general framework, based on polynomials, for dealing with these problems computationally. It is based, roughly, on encoding the problem structure in a multivariate polynomial and then recovering the solution by means of certain continuous relaxations. This leads to several questions on how to reason about such relaxations and how to compute them. We resolve them by relating certain analytic properties of the arising polynomials, such as the location of their roots or convexity, to the combinatorial structure of the underlying problem.
We believe that the ideas and mathematical techniques developed in this thesis are only a beginning and that they will inspire more work on the use of dynamical systems and polynomials in the design of fast algorithms.

We present an $O(m^{10/7}) = O(m^{1.43})$-time algorithm for the maximum s-t flow and the minimum s-t cut problems in directed graphs with unit capacities. This is the first improvement over the sparse-graph case of the long-standing $O(m \min\{m^{1/2}, n^{2/3}\})$ running time bound due to Even and Tarjan [16]. By well-known reductions, this also establishes an $O(m^{10/7})$-time algorithm for the maximum-cardinality bipartite matching problem. That, in turn, gives an improvement over the celebrated $O(mn^{1/2})$ running time bound of Hopcroft and Karp [25] whenever the input graph is sufficiently sparse. At a very high level, our results stem from acquiring a deeper understanding of interior-point methods, a powerful tool in convex optimization, in the context of flow problems, as well as from exploiting a certain interplay between maximum flows and bipartite matchings.
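To make the link between bipartite matching and unit-capacity flow concrete, here is a minimal sketch of the classic augmenting-path method for maximum-cardinality bipartite matching. Each successful augmentation corresponds to pushing one unit of flow along a source-to-sink path in the unit-capacity network (source to left vertices, left to right along graph edges, right vertices to sink). This is the simple baseline with running time proportional to the number of vertices times the number of edges, not the interior-point-based algorithm of the abstract, and not Hopcroft-Karp. The function name, the adjacency-list format and the toy instance are assumptions for illustration.

```python
def max_bipartite_matching(adj, n_right):
    """Maximum matching via repeated augmenting-path search.

    adj[u] lists the right-side neighbours of left vertex u.
    Returns (matching size, match_right), where match_right[v] is the left
    vertex matched to right vertex v, or -1 if v is unmatched.
    """
    match_right = [-1] * n_right

    def try_augment(u, visited):
        # Depth-first search for an augmenting path starting at left vertex u.
        for v in adj[u]:
            if v in visited:
                continue
            visited.add(v)
            # Either v is free, or its current partner can be re-matched elsewhere.
            if match_right[v] == -1 or try_augment(match_right[v], visited):
                match_right[v] = u
                return True
        return False

    size = 0
    for u in range(len(adj)):
        if try_augment(u, set()):
            size += 1
    return size, match_right


# Toy instance (hypothetical): 3 left vertices, 3 right vertices.
adj = [[0, 1], [0], [1, 2]]
size, match_right = max_bipartite_matching(adj, 3)
print(size, match_right)
```

The recursive search is adequate for small inputs; for long augmenting paths an iterative search would avoid Python's recursion limit.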