
Publication

System and method for privacy-preserving distributed training of neural network models on distributed datasets

Abstract

A computer-implemented method and a distributed computer system (100) for privacy-preserving distributed training of a global neural network model on distributed datasets (DS1 to DSn). The system has a plurality of data providers (DP1 to DPn) that are communicatively coupled. Each data provider has a respective local training dataset (DS1 to DSn) and a vector of output labels (OL1 to OLn) for training the global model. Further, it has a portion of a cryptographic distributed secret key (SK1 to SKn) and a corresponding collective cryptographic public key (CPK) of a multiparty fully homomorphic encryption scheme, with the weights of the global model being encrypted with the collective public key. Each data provider (DP1) computes and aggregates, for each layer of the global model, encrypted local gradients (LG1) using its respective local training dataset (DS1) and output labels (OL1), with a forward pass and backpropagation using stochastic gradient descent. At least one data provider homomorphically combines at least a subset of the current local gradients of at least a subset of the data providers into combined local gradients, and updates the weights of the current global model (GM) based on the combined local gradients.
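The gradient-aggregation flow described in the abstract can be sketched in plain Python. This is a toy illustration only: plain floats stand in for the multiparty-FHE ciphertexts (homomorphic addition on ciphertexts corresponds to the ordinary sums below), and the linear model, learning rate, and data shards are illustrative assumptions, not part of the patent.

```python
# Sketch of the patent's training loop with the cryptography stripped out:
# each data provider DPi computes a local gradient on its own shard, one
# provider combines (sums/averages) the gradients, and the shared global
# weight is updated. In the real scheme every value here would be a
# multiparty-FHE ciphertext and the combination would be homomorphic.

def local_gradient(w, xs, ys):
    """Provider-side MSE gradient for a scalar linear model y = w * x."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def train(shards, w=0.0, lr=0.05, steps=200):
    for _ in range(steps):
        # Providers compute local gradients in parallel; a single provider
        # combines them. Only addition is needed, which FHE supports
        # directly on encrypted values.
        combined = sum(local_gradient(w, xs, ys) for xs, ys in shards) / len(shards)
        w -= lr * combined  # global-model update on the combined gradient
    return w

# Two providers, each holding samples of the ground-truth relation y = 2x.
shards = [([1.0, 2.0], [2.0, 4.0]), ([3.0, 4.0], [6.0, 8.0])]
w = train(shards)  # converges toward 2.0
```

The key point the sketch mirrors is that no provider ever shares raw data, only (in the real system, encrypted) gradients.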

Official source

This page is generated automatically and may contain information that is not correct, complete, up to date, or relevant to your search. The same applies to every other page on this site. Please verify the information against official EPFL sources.


Related concepts (10)

System

A system is a set of elements interacting according to certain principles or rules: for example, a molecule, the solar system, a beehive, a human society, a political party, or an army.

Computer

A computer is a programmable information-processing system, as defined by Alan Turing, that operates by sequentially reading a set of instructions organized into programs.

Encryption key

A key is a parameter used as input to a cryptographic operation (encryption, decryption, sealing, digital signing, signature verification). An encryption key may be symmetric or asymmetric.

Related publications (3)


Synopsis: Implement a new way of interacting with your computer via voice control instead of the mouse and keyboard.

Level: BS, MS

Description: Google Home and Amazon Alexa are quickly revolutionizing how we interact with smart devices. Both use "wake words" ("OK Google" and "Alexa", respectively) to detect the user's intention to interact. While wake-word detection is typically done on the device to ensure minimum latency, the user's commands that follow are usually processed remotely. The goal of this project is to program a microcontroller to process acoustic data locally and in real time. The microcontroller should run a speech recognition model to extract specific commands from the user's spoken words. The chip should then emulate a USB device, such as a mouse or keyboard, and send the derived commands to trigger actions on the host computer. An important aspect of the project will be to understand the limits of what can be processed on the microcontroller in terms of memory and computation time. The student has the option to work either on implementing machine learning models such as CNNs on the microchip, or on emulating the USB peripheral. Ideally, two students will work on the two components of the project so that we have a fully working system at the end of the semester.

Deliverables: A report and a working system with clear documentation.

References: For useful links, see the list of URLs below.

Prerequisites: First part: knowledge of, or strong interest in, machine learning, in particular neural networks; basics of embedded-systems programming. Second part: basics of C programming and embedded systems, preferably with knowledge of USB devices.

Type of work: 50% algorithm design/analysis, 50% programming
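The first stage of the pipeline above (deciding, on-device, when speech is present before running a heavier recognition model) can be illustrated with a minimal energy-based voice-activity detector. The frame length, sample rate, and threshold below are illustrative assumptions, not values from the project specification.

```python
# Minimal energy-based voice-activity detection of the kind a
# microcontroller could run as a cheap front end to keyword spotting.
import math

def frame_energies(samples, frame_len=160):
    """Split audio into non-overlapping frames; return mean squared energy."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def detect_activity(samples, threshold=0.01):
    """Return the indices of frames whose energy exceeds the threshold."""
    return [i for i, e in enumerate(frame_energies(samples)) if e > threshold]

# Synthetic signal: three silent frames followed by two frames of a
# 440 Hz tone sampled at 16 kHz.
silence = [0.0] * 480
tone = [0.5 * math.sin(2 * math.pi * 440 * t / 16000) for t in range(320)]
active = detect_activity(silence + tone)  # -> [3, 4]
```

On real hardware the same structure applies, but the energy test would gate a neural keyword-spotting model rather than trigger actions directly.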

2018
Jean-Philippe Léonard Bossuat, David Jules Froelicher, Joao André Gomes de Sá E Sousa, Jean-Pierre Hubaux, Apostolos Pyrgelis, Sinem Sav, Juan Ramón Troncoso-Pastoriza

A computer-implemented method and a distributed computer system (100) for privacy-preserving distributed training of a global model on distributed datasets (DS1 to DSn). The system has a plurality of data providers (DP1 to DPn) that are communicatively coupled. Each data provider has a respective local model (LM1 to LMn) and a respective local training dataset (DS1 to DSn) for training the local model using an iterative training algorithm (IA). Further, it has a portion of a cryptographic distributed secret key (SK1 to SKn) and a corresponding collective cryptographic public key (CPK) of a multiparty fully homomorphic encryption scheme, with the local and global models being encrypted with the collective public key. Each data provider (DP1) trains its local model (LM1) on its respective local training dataset (DS1) by executing gradient descent updates of the local model (LM1), and combines (1340) the updated local model (LM1') with the current global model (GM) into a current local model (LM1c). At least one data provider homomorphically combines at least a subset of the current local models of at least a subset of the data providers into a combined model (CM1), and updates the current global model (GM) based on the combined model. The updated global model is provided to at least a subset of the other data providers.
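This variant combines local *models* rather than gradients. The privacy property of the combination step, where an aggregator learns only the combined model and never an individual one, can be illustrated with zero-sum masking, a classic secure-aggregation trick used here purely as a stand-in for the multiparty FHE of the patent; the weight values are made up for the example.

```python
# Toy stand-in for homomorphic model combination: each provider adds a
# random mask to its model weight, and the masks are constructed to sum
# to zero, so the aggregated sum is exact while individual contributions
# stay hidden. (The patent uses multiparty FHE, not masking.)
import random

def mask_weights(local_models, rng=None):
    rng = rng or random.Random(0)
    n = len(local_models)
    masks = [rng.uniform(-1.0, 1.0) for _ in range(n - 1)]
    masks.append(-sum(masks))  # force the masks to sum to zero
    return [w + m for w, m in zip(local_models, masks)]

local_models = [0.9, 1.1, 1.3]              # each provider's updated weight
masked = mask_weights(local_models)          # what the aggregator sees
global_model = sum(masked) / len(masked)     # equals the true average
```

Because the masks cancel, the aggregator recovers the exact average (1.1 here) without ever observing any provider's unmasked weight.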

2021
State-of-the-art acoustic models for Automatic Speech Recognition (ASR) are based on Hidden Markov Models (HMM) and Deep Neural Networks (DNN) and often require thousands of hours of transcribed speech data for training. Building multilingual ASR systems, or systems for a language with few resources, is therefore a challenging task. Multilingual training and cross-lingual adaptation are potential solutions. However, context-dependent state modeling creates difficulties for multilingual and cross-lingual ASR because of the large increase in context-dependent labels arising from the phone-set mismatch.
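The HMM component mentioned above scores how well a state sequence explains an observation sequence via the forward algorithm. A minimal sketch on a two-state discrete HMM follows; the probabilities are illustrative, not taken from any trained acoustic model.

```python
# Forward algorithm for a discrete HMM: computes P(observation sequence)
# by summing over all state paths in O(T * N^2) time.

def forward(obs, init, trans, emit):
    """obs: symbol indices; init[s], trans[r][s], emit[s][o] are probabilities."""
    alpha = [init[s] * emit[s][obs[0]] for s in range(len(init))]
    for o in obs[1:]:
        alpha = [
            emit[s][o] * sum(alpha[r] * trans[r][s] for r in range(len(alpha)))
            for s in range(len(alpha))
        ]
    return sum(alpha)

init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]  # P(symbol | state) over symbols {0, 1}
p = forward([0, 1, 0], init, trans, emit)
```

In a hybrid HMM/DNN system the table lookup `emit[s][o]` is replaced by scaled DNN posteriors, but the recursion is the same.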
The goal of this thesis is to improve current state-of-the-art acoustic modeling techniques in general for ASR, with a particular focus on multilingual ASR and cross-lingual adaptation. We systematically exploited new training frameworks, from Maximum Likelihood Estimation, Connectionist Temporal Classification to Maximum Mutual Information, in the context of phoneme-based multilingual training. In order to minimize the negative effects of data impurity arising from language mismatch, we investigated language adaptive training approaches which help further improve the multilingual ASR performance. Through comprehensive experimental comparison we demonstrated that phoneme-based multilingual models are easily extensible to unseen phonemes of new languages, from which the cross-lingual adaptation yields significant improvement over traditional approaches on limited data. Finally, we proposed a semi-supervised training approach based on dropout to boost the performance in low-resourced languages using untranscribed data.
In the other part of the thesis, we conducted a more theoretical analysis of techniques found to be useful in sequential multilingual training. More specifically, we revisited the recurrent architecture from the standpoint of Bayes's theorem. This leads to a Bayesian recurrent unit that is dictated by the probabilistic formulation and naturally supports a backward recursion. Experiments show that the proposed architecture exceeds the performance of conventional recurrent networks.
Together, this thesis constitutes a thorough analysis of the current field. Through theoretical and experimental comparisons, the proposed approaches are shown to yield significant improvements over conventional hybrid systems in multilingual speech recognition.