Training accurate and robust machine learning models requires a large amount of data that is usually scattered across data silos. Sharing, transferring, and centralizing the data from these silos, however, is difficult due to privacy regulations (e.g., HIPAA or GDPR) and to business competition (e.g., in the finance field). An existing solution for collaborative machine learning is federated learning, in which several parties collectively train a machine learning model without sharing or transferring their local data. With federated learning, the local data remains on the parties' premises, and the global model is trained through an iterative exchange of cleartext gradients that are computed locally. These gradients have been shown to leak private information about the parties' original training data through inference attacks. Consequently, any solution that does not incorporate additional security and privacy mechanisms to protect these gradients puts the training data and their subjects at risk.

In this thesis, we propose, implement, and optimize several neural network algorithms that preserve the privacy of the model and the data during federated learning. Our solutions mitigate federated learning attacks that target the gradients during training in the cross-silo, horizontal federated learning setting with N parties. We also protect the querier's evaluation data sent to prediction-as-a-service (PaaS) systems. To achieve this, we rely on lattice-based multiparty homomorphic encryption (MHE): all values communicated between the parties remain encrypted, and all computations are carried out under encryption. Our solutions thus ensure both data and model confidentiality during training and prediction under a passive-adversary threat model that allows collusions between up to N-1 parties. We (i) propose and implement privacy-preserving federated neural network operations for different neural network architectures (e.g., multilayer perceptrons or recurrent neural networks), (ii) evaluate their performance in cross-silo federated learning settings in terms of model performance, scalability, and efficiency, and (iii) show the maturity of our solutions for real-life use cases (e.g., medical applications). We experimentally show that our solutions' model performance is similar to that of centralized or decentralized non-private approaches and that the communication overhead scales linearly with the number of parties.
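To make the protocol structure concrete, the sketch below runs one cross-silo training round with encrypted gradient aggregation. Everything in it is illustrative rather than the thesis's actual implementation: MockAdditiveHE is an insecure placeholder for the lattice-based MHE scheme (in real MHE, decryption would require the key shares of all N parties), and the least-squares gradient stands in for any model's local gradient.

```python
# Hypothetical sketch of cross-silo federated averaging with encrypted
# gradient aggregation. MockAdditiveHE is an insecure stand-in for a
# lattice-based multiparty homomorphic encryption scheme: it only
# exists to show where encryption sits in the protocol.
import numpy as np


class MockAdditiveHE:
    """Placeholder for an additively homomorphic scheme: ciphertexts
    are plain arrays here, so sums can be taken 'under encryption'
    without any single party seeing another party's gradient."""

    def encrypt(self, vec):
        return np.asarray(vec, dtype=float).copy()

    def add(self, ct_a, ct_b):
        return ct_a + ct_b  # homomorphic addition of ciphertexts

    def decrypt(self, ct):
        return ct  # in MHE, this step needs all parties' key shares


def local_gradient(w, X, y):
    """Least-squares gradient, as a placeholder for any model's
    locally computed gradient."""
    return 2 * X.T @ (X @ w - y) / len(y)


def federated_round(parties, global_w, he, lr=0.1):
    """One training round: each party computes its gradient locally,
    encrypts it, and only the encrypted aggregate is exchanged."""
    encrypted_sum = None
    for X, y in parties:
        grad = local_gradient(global_w, X, y)  # data stays on premises
        ct = he.encrypt(grad)                  # gradient leaves encrypted
        encrypted_sum = ct if encrypted_sum is None else he.add(encrypted_sum, ct)
    # Collective decryption reveals only the averaged gradient.
    avg_grad = he.decrypt(encrypted_sum) / len(parties)
    return global_w - lr * avg_grad


# Toy usage: three silos jointly fit a linear model.
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0])
parties = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    parties.append((X, X @ true_w))

he, w = MockAdditiveHE(), np.zeros(2)
for _ in range(200):
    w = federated_round(parties, w, he)
print(w)  # approaches true_w without any silo revealing its data
```

The structure mirrors the threat model described above: individual gradients never leave a silo in cleartext, and only the collectively decrypted aggregate is ever revealed.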