One major challenge in distributed learning is to learn efficiently for each client when the data across clients is heterogeneous, or non-IID (not independent and identically distributed). This is difficult because the data of the other clients may not be helpful to any individual client. Thus the following question arises: can each individual client's performance be improved with access to the data of other clients in this heterogeneous setting? A further challenge is to obtain a good personalized model while still maintaining the privacy of local data samples. We consider a model in which the client data distributions are not identical and can be dependent. In this heterogeneous data setting we study the problem of distributed learning of data distributions. We propose a personalized linear estimator for each client and show that this estimator is never worse than the sample mean estimator and can be substantially better (by up to a factor equal to the number of clients), while still concentrating around the true distribution. This estimator can be implemented by privacy-preserving schemes in both the cryptographic and differentially private settings.
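The abstract does not spell out the form of the personalized linear estimator, but a minimal sketch helps make the idea concrete. The sketch below, assuming discrete distributions and a simple convex combination of each client's own empirical distribution with the pooled mean across clients, is purely illustrative; the names `empirical_dist`, `personalized_estimate`, and the fixed weight `lam` are assumptions for illustration, not the paper's specific construction (which presumably chooses the combination weights to guarantee the "never worse than the sample mean" property).

```python
import numpy as np

def empirical_dist(samples, k):
    """Empirical distribution of integer samples over an alphabet of size k."""
    counts = np.bincount(samples, minlength=k)
    return counts / counts.sum()

def personalized_estimate(samples_per_client, k, lam=0.5):
    """Hedged sketch of a personalized linear estimator.

    Each client's estimate is a convex combination of its own empirical
    distribution and the mean of all clients' empirical distributions.
    The weight `lam` is a hypothetical fixed parameter here; the paper's
    estimator would set the weights to be never worse than the sample mean.
    """
    local = [empirical_dist(s, k) for s in samples_per_client]
    pooled = np.mean(local, axis=0)  # sample-mean (pooled) estimator
    return [lam * p + (1.0 - lam) * pooled for p in local]

# Example: three clients drawing from related but non-identical distributions.
rng = np.random.default_rng(0)
data = [rng.choice(4, size=50, p=p) for p in
        ([0.4, 0.3, 0.2, 0.1], [0.35, 0.35, 0.2, 0.1], [0.1, 0.2, 0.3, 0.4])]
print(personalized_estimate(data, k=4))
```

Because the personalized estimate interpolates between the purely local estimator (`lam = 1`) and the pooled sample mean (`lam = 0`), it can exploit other clients' data when distributions are similar while falling back toward the local estimate when they are not.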