Graph Neural Networks (GNNs) have emerged as a powerful tool for learning on graphs, demonstrating exceptional performance across a variety of domains. However, as GNNs become increasingly popular, new challenges arise. One of the most pressing is the need to ensure privacy when learning from graphs, since they often contain personal information such as social interactions and financial transactions. Balancing privacy preservation with the effectiveness of GNNs is a crucial yet challenging task due to the unique relational nature of graphs, which sets them apart from other data types such as images and text. This thesis presents an in-depth exploration of privacy-preserving GNNs, focusing specifically on Differential Privacy (DP) as a rigorous mathematical framework for formalizing the privacy guarantees of machine learning algorithms.

First, we focus on local DP and propose a privacy-preserving Graph Convolutional Network (GCN) model that allows a central server to efficiently communicate with the graph nodes, privately collect their features, and estimate the first-layer graph convolution of the GNN. Building on the same foundations, we then propose an architecture-agnostic approach for settings where all node-specific data, including features and labels, are private and stored locally on the nodes, while the edges remain public to the central server that trains the GNN. Our contributions include a locally private randomizer that perturbs and compresses node features, a denoising mechanism based on graph convolution that alleviates the effect of the injected noise, and a robust learning algorithm that copes with noisy labels.

Next, we shift our focus to global DP, where the entire graph is considered private and the goal is to train, and subsequently release, a GNN model with DP guarantees. Our main contribution, which enables private learning and inference on graph data, is the aggregation perturbation (AP) technique, which adds calibrated noise to the neighborhood aggregation step of GNNs to achieve DP. Building on this technique, we first propose a tailored GNN architecture that decouples the aggregation steps from the model parameters, substantially reducing the privacy cost incurred by AP. We then propose a progressive GNN model that further improves the accuracy-privacy trade-off of AP without weakening its privacy guarantees: the model is built iteratively, with each stage reusing the private node embeddings learned in prior stages, thereby preserving representational power while keeping the privacy cost under control.

Empirical evaluations demonstrate that the proposed approaches provide strong privacy guarantees while effectively maintaining the utility of the learned models. Our methods apply to a wide range of problems, from social network analysis to disease prediction, and are therefore relevant to both academia and industry. By addressing the challenge of privacy in learning on graphs, this thesis contributes to the broader effort of promoting responsible and ethical use of machine learning, and provides a foundation for future research on trustworthy GNNs, paving the way for their adoption in domains where privacy is paramount.
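To make the local-DP pipeline above concrete, here is a minimal sketch of how a node could perturb its features before sending them to the server, and how the server could denoise the estimates through the first-layer graph convolution. This is not the thesis's actual mechanism: for brevity the sketch substitutes the simpler one-bit randomizer for a multi-bit one, the function names (one_bit_perturb, unbias, first_layer_convolution) are illustrative, and it assumes features have been min-max normalized to [0, 1] with the per-feature budget eps already carved out of the total budget.

    import numpy as np

    def one_bit_perturb(x, eps):
        # Each entry of x must lie in [0, 1]. Release one biased bit per
        # entry; the bias encodes the value, and the mechanism is eps-LDP
        # per feature (the budget must be split across feature dimensions).
        p = (x * (np.exp(eps) - 1) + 1) / (np.exp(eps) + 1)
        return (np.random.rand(*x.shape) < p).astype(float)

    def unbias(bits, eps):
        # Server-side unbiased estimator of the original feature values.
        return (bits * (np.exp(eps) + 1) - 1) / (np.exp(eps) - 1)

    def first_layer_convolution(adj, x_hat):
        # Mean aggregation over neighborhoods (row-normalized adjacency).
        deg = adj.sum(axis=1, keepdims=True).clip(min=1)
        return (adj @ x_hat) / deg

Because the noise injected at each node is zero-mean and independent across nodes, averaging the unbiased estimates over a node's neighbors, which is exactly what the first graph convolution computes, shrinks the estimation error roughly in proportion to the node's degree; this is the denoising effect the abstract refers to.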
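The noisy labels that the robust learning algorithm has to cope with can be illustrated with k-ary randomized response, the standard local randomizer for categorical data. This is a hedged sketch under that assumption: randomized_response and transition_matrix are hypothetical names, and the thesis's full training procedure is not reproduced here.

    import numpy as np

    def randomized_response(y, k, eps):
        # y: integer labels in {0, ..., k-1}. Keep the true label with
        # probability e^eps / (e^eps + k - 1); otherwise report one of
        # the other k - 1 labels uniformly. Satisfies eps-LDP for labels.
        p = np.exp(eps) / (np.exp(eps) + k - 1)
        flip = np.random.rand(len(y)) >= p
        other = np.random.randint(0, k - 1, size=len(y))
        other = np.where(other >= y, other + 1, other)  # skip the true label
        return np.where(flip, other, y)

    def transition_matrix(k, eps):
        # T[i, j] = Pr[reported = j | true label = i]. Since eps is public,
        # T is known exactly and can be used for forward loss correction.
        p = np.exp(eps) / (np.exp(eps) + k - 1)
        T = np.full((k, k), (1 - p) / (k - 1))
        np.fill_diagonal(T, p)
        return T

A standard recipe for learning under such class-conditional label noise is forward correction: multiply the model's predicted class probabilities by T before computing the cross-entropy against the noisy labels, which yields a statistically consistent surrogate for the clean-label loss.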
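For the global-DP part, the core of aggregation perturbation can be sketched in a few lines: bound each node's contribution by normalizing its embedding, sum over neighborhoods, and add Gaussian noise. The function name and the dense-matrix interface are illustrative assumptions, and the calibration of sigma to a target (eps, delta) via the Gaussian mechanism or a privacy accountant is left out.

    import numpy as np

    def aggregation_perturbation(adj, h, sigma):
        # Row-normalize embeddings so each node contributes at most unit
        # norm to any neighborhood sum (bounded sensitivity), then add
        # Gaussian noise; sigma is calibrated offline to (eps, delta).
        norms = np.linalg.norm(h, axis=1, keepdims=True).clip(min=1e-12)
        agg = adj @ (h / norms)          # sum aggregation over neighbors
        return agg + np.random.normal(0.0, sigma, size=agg.shape)

In the decoupled architecture, such noisy aggregations can be computed once per hop before training (renormalizing between hops), so the privacy cost is paid a fixed number of times rather than at every gradient step; the progressive variant instead grows the model stage by stage, with each stage reusing the private embeddings of earlier stages.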
Daniel Gatica-Perez, Sina Sajadmanesh