Are you an EPFL student looking for a semester project?
Work with us on data science and visualisation projects, and deploy your project as an app on top of Graph Search.
Various forms of real-world data, such as social, financial, and biological networks, can berepresented using graphs. An efficient method of analysing this type of data is to extractsubgraph patterns, such as cliques, cycles, and motifs, from graphs. For instance, findingcycles in financial graphs enables the detection of financial crimes such as money launderingand circular stock trading. In addition, extracting cliques from social network graphs enablesthe detection of communities and could help predict the spread of epidemics. However,extracting such patterns can be time-consuming, especially in larger graphs, because thenumber of patterns can grow exponentially with the graph size. Therefore, fast and scalableparallel algorithms are required to make the enumeration of these subgraph patterns tractablefor real-world graphs.This thesis presents fast parallel algorithms for the enumeration of maximal cliques and simplecycles. We focus on accelerating the asymptotically-optimal sequential algorithms forenumerating the aforementioned patterns by parallelising them on manycore CPUs. To enablescalable parallelisation of clique and cycle enumeration algorithms, the algorithms presentedin this thesis rely on fine-grained parallelisation, in which recursive calls are executed independentlyof each other using several software threads. However, simply applying the fine-grainedparallelisation method to the aforementioned asymptotically-optimal algorithms leads tosuboptimal solutions. Parallelising maximal clique enumeration using this method results inincreased overhead caused by multithreaded memory management and task scheduling, aswell as increased dynamic memory usage. In addition, the asymptotically-optimal algorithmsfor simple cycle enumeration rely on strict depth-first traversal of their recursion tree, makingthe fine-grained parallelisation of these algorithms challenging. This thesis addresses theseproblems and presents parallel algorithms that lead to an almost linear scaling of performancewith the number of CPU cores utilised. As a result, the parallel algorithms presented in thisthesis are able to achieve an order of magnitude speedup compared to the state-of-the-artparallel algorithms when executed on manycore CPU systems.To demonstrate the applicability of our accelerated algorithms, this thesis presents the GraphFeature Preprocessor library, which can be used to detect financial crime. This library expandsthe feature set of financial transactions by enumerating well-known money laundering andfinancial fraud subgraph patterns, such as simple cycles, in financial transaction graphs. Whenused in combination with gradient-boosting-based machine learning models, the expandedfeature set produced by the library enables significant improvements in prediction accuracyfor the problems of money laundering and phishing detection. Furthermore, the efficiency ofthe subgraph pattern mining algorithms presented in this thesis enables this library to operatein real time.As financial fraud schemes become more complex, fast algorithms that can detect suspiciousbehaviour are required. This thesis demonstrates that the parallel graph pattern mining algorithmsintroduced here can be used to enable fast and accurate detection of such suspiciousbehaviour.
Volkan Cevher, Grigorios Chrysos, Efstratios Panteleimon Skoulakis