Publication

Privacy-preserving and Personalized Federated Machine Learning for Medical Data

Felix Hans Michel Grimberg
2020
Student project
Abstract

Federated learning often has to cope with data that are not identically distributed across participating agents. This motivates model personalization, in which agents collaborate to train several different machine learning models instead of a single global model. The aim of model personalization is to minimize the sum of two errors: the generalization error incurred from training on small data sets, and the transfer error incurred from applying a globally trained model to a specific local distribution. In this report, two novel approaches to personalized cross-silo federated learning are introduced and discussed from a theoretical perspective: the adapted Ndoye factor and the Weight Erosion aggregation scheme. The latter is implemented and compared to two baseline aggregation schemes in two case studies: training a diagnostic model on real-world medical data, and predicting passenger survival on the publicly available Titanic data set. The models trained with the Weight Erosion aggregation scheme are compared to those trained with the baseline schemes in two ways: by their classification accuracy on the local test set and by their learned parameters. We demonstrate that the novel Weight Erosion scheme can outperform both baseline aggregation schemes on some specific tasks.
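The trade-off between generalization error and transfer error can be illustrated with a hypothetical toy model. This is not the Ndoye factor or the Weight Erosion scheme from the report. In the model, one agent estimates the mean of its own local distribution. It mixes its own sample mean (weight `alpha`) with the pooled sample mean of the other agents (weight `1 - alpha`).

Several assumptions are made for illustration only:
- the data are Gaussian with shared variance `sigma2`;
- the other agents' pooled distribution is shifted by `delta`;
- the two estimates are independent.

The function names `mixed_mse` and `optimal_alpha` are likewise illustrative.

```python
def mixed_mse(alpha, sigma2, n_local, n_other, delta):
    """Expected squared error of alpha*local_mean + (1 - alpha)*other_mean
    as an estimate of the local mean mu_i.

    Hypothetical toy model: local samples ~ N(mu_i, sigma2),
    other agents' pooled samples ~ N(mu_i + delta, sigma2), independent.
    """
    generalization = alpha**2 * sigma2 / n_local        # variance from few local samples
    other_variance = (1 - alpha)**2 * sigma2 / n_other  # variance of the pooled estimate
    transfer = (1 - alpha)**2 * delta**2                # bias from the distribution shift
    return generalization + other_variance + transfer


def optimal_alpha(sigma2, n_local, n_other, delta):
    """Weight on the local estimate that minimizes mixed_mse.

    With a = sigma2/n_local and b = sigma2/n_other + delta**2,
    the error is alpha**2 * a + (1 - alpha)**2 * b, minimized at b / (a + b).
    """
    a = sigma2 / n_local
    b = sigma2 / n_other + delta**2
    return b / (a + b)


if __name__ == "__main__":
    # Small local data set (10 samples) versus a large pool (1000 samples).
    for delta in (0.0, 0.5, 1.0):
        alpha = optimal_alpha(1.0, 10, 1000, delta)
        print(f"delta={delta}: alpha={alpha:.4f}, "
              f"mse={mixed_mse(alpha, 1.0, 10, 1000, delta):.5f}")
```

When there is no distribution shift, the optimal weight relies almost entirely on the pooled estimate, because the generalization error dominates. As `delta` grows, the transfer error dominates and the optimal weight shifts toward the local data. The same tension is what personalized aggregation schemes aim to balance.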

Related concepts (32)
Big data
Big data primarily refers to data sets that are too large or complex to be dealt with by traditional data-processing application software. Data with many entries (rows) offer greater statistical power, while data with higher complexity (more attributes or columns) may lead to a higher false discovery rate. The term is sometimes used loosely, partly because it lacks a formal definition. The interpretation that seems to describe big data best is a large body of information that could not be comprehended if used only in smaller amounts.
Machine learning
Machine learning (ML) is an umbrella term for solving problems for which having human programmers develop algorithms would be cost-prohibitive. Instead, machines are helped to 'discover' their 'own' algorithms, without being explicitly told what to do by human-developed algorithms. Recently, generative artificial neural networks have surpassed the results of many previous approaches.
Data Preprocessing
Data preprocessing can refer to manipulation or dropping of data before it is used in order to ensure or enhance performance, and is an important step in the data mining process. The phrase "garbage in, garbage out" is particularly applicable to data mining and machine learning projects. Data collection methods are often loosely controlled, resulting in out-of-range values, impossible data combinations, and missing values, amongst other issues. Analyzing data that has not been carefully screened for such problems can produce misleading results.
Related publications (32)

Robust machine learning for neuroscientific inference

Steffen Schneider

Modern neuroscience research is generating increasingly large datasets, from recording thousands of neurons over long timescales to behavioral recordings of animals spanning weeks, months, or even years. Despite a great variety in recording setups and expe ...
EPFL, 2024

Generalization and Personalization of Machine Learning for Multimodal Mobile Sensing in Everyday Life

Lakmal Buddika Meegahapola

A range of behavioral and contextual factors, including eating and drinking behavior, mood, social context, and other daily activities, can significantly impact an individual's quality of life and overall well-being. Therefore, inferring everyday life aspe ...
EPFL, 2024

Data and scripts for the RaFSIP scheme

Athanasios Nenes, Paraskevi Georgakaki

This repository contains microphysics routines, scripts, and processed data from the Weather Research and Forecasting (WRF) model simulations presented in the paper "RaFSIP: Parameterizing ice multiplication in models using a machine learning approach", by ...
Zenodo, 2024