A motif is a frequently occurring subgraph of a given directed or undirected graph G (Milo et al.). Motifs capture higher-order organizational structure of G beyond edge relationships and have therefore found wide application in graph clustering, community detection, and the analysis of biological and physical networks, to name a few (Benson et al., Tsourakakis et al.). In these applications, the cut structure of motifs plays a crucial role: vertices are partitioned into clusters by cuts whose conductance is based on the number of instances of a particular motif crossing the cut, as opposed to just the number of edges. In this paper, we introduce the concept of a motif cut sparsifier. We show that one can compute in polynomial time a sparse weighted subgraph G' with only $\tilde{O}(n/\epsilon^2)$ edges such that, for every cut, the weighted number of copies of M crossing the cut in G' is within a $1 + \epsilon$ factor of the number of copies of M crossing the cut in G, for every constant-size motif M. Our work carefully combines the viewpoints of both graph sparsification and hypergraph sparsification. We sample edges, which requires us to extend and strengthen the concept of cut sparsifiers introduced in the seminal works of Karger and Benczur et al. to the motif setting. Adapting the importance sampling framework common to efficient graph sparsification algorithms to the motif setting turns out to be nontrivial because cut sizes in a random subgraph of G depend non-linearly on the sampled edges. To overcome this, we adopt the viewpoint of hypergraph sparsification to define edge sampling probabilities, which are derived from the strong connectivity values of a hypergraph whose hyperedges represent motif instances. Finally, an iterative sparsification primitive inspired by both viewpoints is used to reduce the number of edges in G to nearly linear.
In addition, we present a strong lower bound ruling out a similar result for sparsification with respect to induced occurrences of motifs.
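To make the notion of a motif cut concrete, the following is a minimal Python sketch for the triangle motif in an unweighted, undirected simple graph. It is not the sparsification algorithm described above. It assumes that a motif instance "crosses" a cut when it has at least one vertex on each side, and that "within a $1 + \epsilon$ factor" means the two-sided bound $(1 - \epsilon)\,x \le y \le (1 + \epsilon)\,x$; both readings, along with the helper names `triangles`, `motif_cut_size`, `edge_cut_size`, and `within_factor`, are assumptions made for illustration.

```python
def triangles(edges):
    """Return the set of triangles, each as a sorted vertex triple."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    found = set()
    for u, v in edges:
        # Every common neighbour of u and v closes a triangle on edge (u, v).
        for w in adj[u] & adj[v]:
            found.add(tuple(sorted((u, v, w))))
    return found


def motif_cut_size(edges, side):
    """Count triangles with at least one vertex inside and one outside `side`.

    Assumption: this is the illustrative definition of a motif instance
    crossing the cut (side, V minus side).
    """
    side = set(side)
    count = 0
    for tri in triangles(edges):
        inside = sum(1 for x in tri if x in side)
        if 0 < inside < len(tri):
            count += 1
    return count


def edge_cut_size(edges, side):
    """Count edges with exactly one endpoint in `side` (the ordinary cut)."""
    side = set(side)
    return sum(1 for u, v in edges if (u in side) != (v in side))


def within_factor(approx, exact, eps):
    """Check (1 - eps) * exact <= approx <= (1 + eps) * exact.

    Assumption: one common reading of a (1 + eps)-factor guarantee.
    """
    return (1 - eps) * exact <= approx <= (1 + eps) * exact


# Two triangles sharing vertex 2.
EDGES = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4)]
S = {0, 1, 2}
print(motif_cut_size(EDGES, S), edge_cut_size(EDGES, S))
```

In this toy graph the cut around `S = {0, 1, 2}` separates two edges but only one triangle, which illustrates why a sparsifier that preserves edge cuts does not automatically preserve motif cuts.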