Are you an EPFL student looking for a semester project?
Work with us on data science and visualisation projects, and deploy your project as an app on top of Graph Search.
With the increasing prevalence of massive datasets, it becomes important to design algorithmic techniques for dealing with scenarios where the input to be processed does not fit in the memory of a single machine. Many highly successful approaches have emerged in recent decades, such as processing the data in a stream, parallel processing, and data compression. The aim of this thesis is to apply these techniques to various important graph theoretical problems. Our contributions can be broadly classified into two categories: spectral graph theory, and maximum matching.Spectral Graph Theory. Spectral sparsification is a technique of rendering an arbitrary graph sparse, while approximately preserving the quadratic form of the Laplacian matrix. In this thesis, we extend the result of (Kapralov et al.), and propose a sketch and corresponding decoding algorithm for constructing a spectral sparsifier from a dynamic stream of edge insertions and deletions. The size of the resulting sparsifier, the size of the sketch, and the decoding time are all nearly linear in the number of vertices, and consequently nearly optimal.The concept of spectral sparsification has recently been generalized to hypergraphs (Soma and Yoshida) -- an analogue of graphs for modeling higher order relationships. As one of the main contributions of the thesis, we prove for the first time the existence of nearly-linear sized spectral sparsifiers for arbitrary hypergraphs, and provide a corresponding nearly-linear time algorithm for constructing them. Through a lower bound construction, we show that our sparsifiers achieve nearly-optimal compression of the hypergraph spectral structure.On the more applied side of spectral graph theory, we present a fully scalable MPC (massively parallel computation) algorithm which is capable of simulating a large number of independent random walks of length L from an arbitrary starting distribution in O(log(L)) rounds.Maximum Matching. We propose a novel randomized composable coreset for the problem of maximum matching, called the matching skeleton. The coreset achieves a 1/2 approximation, while having fewer than n edges.We also propose a new, highly space-efficient variant of a peeling algorithm for maximum matching. With this, we are able to approximate the maximum matching size of a graph to within a constant factor, using a stream of m uniformly random edges (where m is the total number of edges), in as little as O(log^2(n)) space. Conversely, we show that significantly fewer (that is m^(1-Omega(1))) samples do not suffice, even with unlimited space. Finally, we design a Local Computation Algorithm, which implicitly construct a constant-approximate maximum matching in time and space that is nearly linear in the maximum degree.
, ,