Publication

Efficient large-scale graph processing: optimisations for storage, performance and evolving graphs

Jasmina Malicevic
2019
EPFL thesis
Abstract

Graph processing systems are used in a wide variety of fields, ranging from biology to social networks. Algorithms to mine graphs incur many random accesses, and the sparse nature of the graphs of interest, exacerbates this. As DRAM sustains high bandwidth in the presence of random accesses, traditionally, it was the chosen medium to store and process graphs from.

Many in-memory systems were presented at top tier conferences, and every system compared to the previous in algorithm execution time. What is not clearly evaluated are the overheads of pre-processing the input in order to support particular optimisations. More critical, for a systems designer, it is very hard to reason about what are the fundamental techniques that lead to performance gains.

We implement the proposed optimisations of state-of-the-art systems within one system to answer this question. We found that the pre-processing cost often dominates the cost of algorithm execution, calling into question the benefits of proposed algorithmic optimisations that rely on extensive pre-processing. The design of a system has to be carefully guided by the characteristics of the algorithms, graphs and hardware.

Furthermore, in-memory systems are often limited by the amount of available memory on a single machine. When the graph cannot fit in the available DRAM, many systems scale up to secondary storage on one machine, or in the cloud. As storage is cheaper than memory, they trade off performance for more space, at a lower cost. Emerging non-volatile technologies such as 3D XPoint, offer a new opportunity for bridging this performance gap. Designing a system that fully utilises these storage devices is not straightforward. Traditional I/O optimisations, designed for SSDs, underutilise the device, and systems end up being CPU bound on fast storage. By removing the dedicated I/O layers, and combining pre-processing approaches for in-memory and out-of-core systems, we are able to switch graph representations to benefit a particular algorithm. This improves the end-to-end execution time up to 2x.

However, in the presence of updates to the graph structure, the data structures used by static algorithms need to be re-created on every update, and the algorithm executed from scratch. The high latency of this process is not acceptable for critical applications such as fraud detection.

We address this problem by developing two systems to compute on evolving graphs, using an update friendly graph representation. Each system targets a different group of applications: graph analytics and graph pattern mining (GPM). For graph analytics we trade sub-millisecond latencies for consistency. For GPM, we use re-computation and backtracking to handle millions of updates per second, outperforming state of the art by up to 5x.

About this result
This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.
Related concepts (33)
Graph theory
In mathematics, graph theory is the study of graphs, which are mathematical structures used to model pairwise relations between objects. A graph in this context is made up of vertices (also called nodes or points) which are connected by edges (also called links or lines). A distinction is made between undirected graphs, where edges link two vertices symmetrically, and directed graphs, where edges link two vertices asymmetrically. Graphs are one of the principal objects of study in discrete mathematics.
Graph database
A graph database (GDB) is a database that uses graph structures for semantic queries with nodes, edges, and properties to represent and store data. A key concept of the system is the graph (or edge or relationship). The graph relates the data items in the store to a collection of nodes and edges, the edges representing the relationships between the nodes. The relationships allow data in the store to be linked together directly and, in many cases, retrieved with one operation.
Graph rewriting
In computer science, graph transformation, or graph rewriting, concerns the technique of creating a new graph out of an original graph algorithmically. It has numerous applications, ranging from software engineering (software construction and also software verification) to layout algorithms and picture generation. Graph transformations can be used as a computation abstraction. The basic idea is that if the state of a computation can be represented as a graph, further steps in that computation can then be represented as transformation rules on that graph.
Show more
Related publications (150)

The Power of Two Matrices in Spectral Algorithms for Community Recovery

Colin Peter Sandon

Spectral algorithms are some of the main tools in optimization and inference problems on graphs. Typically, the graph is encoded as a matrix and eigenvectors and eigenvalues of the matrix are then used to solve the given graph problem. Spectral algorithms ...
Ieee-Inst Electrical Electronics Engineers Inc2024

The connection of the acyclic disconnection and feedback arc sets - On an open problem of Figueroa et al.

Lukas Fritz Felix Vogl

We examine the connection of two graph parameters, the size of a minimum feedback arcs set and the acyclic disconnection. A feedback arc set of a directed graph is a subset of arcs such that after deletion the graph becomes acyclic. The acyclic disconnecti ...
Elsevier2024

Network-based kinetic models: Emergence of a statistical description of the graph topology

Matteo Raviola

In this paper, we propose a novel approach that employs kinetic equations to describe the collective dynamics emerging from graph-mediated pairwise interactions in multi-agent systems. We formally show that for large graphs and specific classes of interact ...
Cambridge2024
Show more
Related MOOCs (20)
Analyse I
Le contenu de ce cours correspond à celui du cours d'Analyse I, comme il est enseigné pour les étudiantes et les étudiants de l'EPFL pendant leur premier semestre. Chaque chapitre du cours correspond
Analyse I (partie 1) : Prélude, notions de base, les nombres réels
Concepts de base de l'analyse réelle et introduction aux nombres réels.
Show more

Graph Chatbot

Chat with Graph Search

Ask any question about EPFL courses, lectures, exercises, research, news, etc. or try the example questions below.

DISCLAIMER: The Graph Chatbot is not programmed to provide explicit or categorical answers to your questions. Rather, it transforms your questions into API requests that are distributed across the various IT services officially administered by EPFL. Its purpose is solely to collect and recommend relevant references to content that you can explore to help you answer your questions.