**Are you an EPFL student looking for a semester project?**

Work with us on data science and visualisation projects, and deploy your project as an app on top of Graph Search.

Publication# Locally Differentially-Private Randomized Response for Discrete Distribution Learning

Abstract

We consider a setup in which confidential i.i.d. samples X1, . . . , Xn from an unknown finite-support distribution p are passed through n copies of a discrete privatization chan- nel (a.k.a. mechanism) producing outputs Y1, . . . , Yn. The channel law guarantees a local differential privacy of ε. Subject to a prescribed privacy level ε, the optimal channel should be designed such that an estimate of the source distribution based on the channel out- puts Y1, . . . , Yn converges as fast as possible to the exact value p. For this purpose we study the convergence to zero of three distribution distance metrics: f-divergence, mean- squared error and total variation. We derive the respective normalized first-order terms of convergence (as n → ∞), which for a given target privacy ε represent a rule-of-thumb factor by which the sample size must be augmented so as to achieve the same estimation accuracy as that of a non-randomizing channel. We formulate the privacy–fidelity trade-off problem as being that of minimizing said first-order term under a privacy constraint ε. We further identify a scalar quantity that captures the essence of this trade-off, and prove bounds and data-processing inequalities on this quantity. For some specific instances of the privacy–fidelity trade-off problem, we derive inner and outer bounds on the optimal trade-off curve.

Official source

This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.

Related MOOCs (6)

Related publications (54)

Related concepts (34)

Synchrotrons and X-Ray Free Electron Lasers (part 1)

Synchrotrons and X-Ray Free Electron Lasers (part 1)

Synchrotrons and X-Ray Free Electron Lasers (part 2)

The first MOOC to provide an extensive introduction to synchrotron and XFEL facilities and associated techniques and applications.

Digital Signal Processing I

Basic signal processing concepts, Fourier analysis and filters. This module can
be used as a starting point or a basic refresher in elementary DSP

Sample size determination

Sample size determination is the act of choosing the number of observations or replicates to include in a statistical sample. The sample size is an important feature of any empirical study in which the goal is to make inferences about a population from a sample. In practice, the sample size used in a study is usually determined based on the cost, time, or convenience of collecting the data, and the need for it to offer sufficient statistical power.

Sample mean and covariance

The sample mean (sample average) or empirical mean (empirical average), and the sample covariance or empirical covariance are statistics computed from a sample of data on one or more random variables. The sample mean is the average value (or mean value) of a sample of numbers taken from a larger population of numbers, where "population" indicates not number of people but the entirety of relevant data, whether collected or not. A sample of 40 companies' sales from the Fortune 500 might be used for convenience instead of looking at the population, all 500 companies' sales.

Stratified sampling

In statistics, stratified sampling is a method of sampling from a population which can be partitioned into subpopulations. In statistical surveys, when subpopulations within an overall population vary, it could be advantageous to sample each subpopulation (stratum) independently. Stratification is the process of dividing members of the population into homogeneous subgroups before sampling. The strata should define a partition of the population.

Jürg Alexander Schiffmann, Cyril Picard, Faez Ahmed

Exploiting the recent advancements in artificial intelligence, showcased by ChatGPT and DALL-E, in real-world applications necessitates vast, domain-specific, and publicly accessible datasets. Unfortunately, the scarcity of such datasets poses a significan ...

Estimation of causal effects using machine learning methods has become an active research field in econometrics. In this paper, we study the finite sample performance of meta-learners for estimation of heterogeneous treatment effects under the usage of sam ...

2022, ,

We consider the problem of parameter estimation in a Bayesian setting and propose a general lower-bound that includes part of the family of f-Divergences. The results are then applied to specific settings of interest and compared to other notable results i ...

2022