Publication: Locally differentially-private distribution estimation

Abstract

We consider a setup in which confidential i.i.d. samples X_1, ..., X_n from an unknown discrete distribution P_X are passed through a discrete memoryless privatization channel (a.k.a. mechanism) which guarantees an epsilon-level of local differential privacy. For a given epsilon, the channel should be designed such that an estimate of the source distribution based on the channel outputs converges as fast as possible to the exact value P_X. For this purpose we consider two metrics of estimation accuracy: the expected mean-square error and the expected Kullback-Leibler divergence. We derive their respective normalized first-order terms (as n tends to infinity), which for a given target privacy epsilon represent the factor by which the sample size must be augmented so as to achieve the same estimation accuracy as that of an identity (non-privatizing) channel. We formulate the privacy-utility tradeoff problem as being that of minimizing said first-order term under a privacy constraint epsilon. A converse bound is stated which bounds the optimal tradeoff away from the origin. Inspired by recent work on the optimality of staircase mechanisms (albeit for objectives different from ours), we derive an achievable tradeoff based on circulant step mechanisms. Within this finite class, we determine the optimal step pattern.
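As a rough illustration of the setup in the abstract, the sketch below passes samples through k-ary randomized response, a standard channel that satisfies epsilon-local differential privacy, and then estimates the source distribution by inverting the channel. This is not the circulant step mechanism studied in the publication. All function names here (`krr_channel`, `ldp_level`, `privatize`, `estimate_source`, `mse_inflation`) are hypothetical. The `mse_inflation` helper gives, for this particular channel and estimator only, the ratio of n times the expected squared error to the same quantity under an identity channel; the publication's normalized first-order terms may be defined differently.

```python
import math
import random
from collections import Counter


def krr_channel(k, eps):
    """Channel matrix Q[x][y] = P(Y = y | X = x) of k-ary randomized response."""
    denom = math.exp(eps) + k - 1
    keep, flip = math.exp(eps) / denom, 1.0 / denom
    return [[keep if x == y else flip for y in range(k)] for x in range(k)]


def ldp_level(Q):
    """Smallest eps with Q[x][y] <= exp(eps) * Q[x2][y] for all x, x2, y."""
    return max(
        math.log(row[y] / row2[y])
        for row in Q
        for row2 in Q
        for y in range(len(row))
    )


def privatize(samples, Q, rng):
    """Pass each confidential sample independently through the channel."""
    symbols = range(len(Q[0]))
    return [rng.choices(symbols, weights=Q[x])[0] for x in samples]


def estimate_source(outputs, k, eps):
    """Invert the channel, using P(Y = y) = flip + (keep - flip) * P(X = y)."""
    denom = math.exp(eps) + k - 1
    keep, flip = math.exp(eps) / denom, 1.0 / denom
    counts = Counter(outputs)
    n = len(outputs)
    return [(counts[y] / n - flip) / (keep - flip) for y in range(k)]


def mse_inflation(p, eps):
    """Ratio of n * E||p_hat - p||^2 under k-RR to that of the identity channel."""
    k = len(p)
    denom = math.exp(eps) + k - 1
    keep, flip = math.exp(eps) / denom, 1.0 / denom
    q = [flip + (keep - flip) * px for px in p]  # output distribution
    private = sum(qy * (1 - qy) for qy in q) / (keep - flip) ** 2
    identity = sum(px * (1 - px) for px in p)
    return private / identity


if __name__ == "__main__":
    rng = random.Random(0)
    p_true = [0.5, 0.3, 0.15, 0.05]  # hypothetical source distribution
    eps = 1.0
    Q = krr_channel(len(p_true), eps)
    xs = rng.choices(range(len(p_true)), weights=p_true, k=20000)
    ys = privatize(xs, Q, rng)
    print("estimate:", estimate_source(ys, len(p_true), eps))
    print("sample-size inflation factor:", mse_inflation(p_true, eps))
```

The inflation factor tends to 1 as epsilon grows, since the channel then approaches the identity, and it grows as the privacy constraint tightens.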

Related concepts (33)

Related publications (44)

Related MOOCs (2)

Sample size determination

Sample size determination is the act of choosing the number of observations or replicates to include in a statistical sample. The sample size is an important feature of any empirical study in which the goal is to make inferences about a population from a sample. In practice, the sample size used in a study is usually determined based on the cost, time, or convenience of collecting the data, and the need for it to offer sufficient statistical power.

Sample mean and covariance

The sample mean (sample average) or empirical mean (empirical average), and the sample covariance or empirical covariance are statistics computed from a sample of data on one or more random variables. The sample mean is the average value (or mean value) of a sample of numbers taken from a larger population of numbers, where "population" indicates not number of people but the entirety of relevant data, whether collected or not. A sample of 40 companies' sales from the Fortune 500 might be used for convenience instead of looking at the population, all 500 companies' sales.
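A minimal sketch of both statistics in plain Python follows. The function names and the example figures are hypothetical, and the covariance uses the n - 1 denominator (the unbiased convention), which is an assumption since the text above does not fix a convention.

```python
def sample_mean(xs):
    """Average of the observed values."""
    if not xs:
        raise ValueError("need at least one observation")
    return sum(xs) / len(xs)


def sample_covariance(xs, ys):
    """Sample covariance of paired observations, dividing by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with n >= 2")
    mx, my = sample_mean(xs), sample_mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


if __name__ == "__main__":
    # Made-up paired measurements for four companies (not real data).
    sales = [2.0, 4.0, 6.0, 8.0]
    staff = [1.0, 3.0, 5.0, 7.0]
    print(sample_mean(sales))
    print(sample_covariance(sales, staff))
```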

Stratified sampling

In statistics, stratified sampling is a method of sampling from a population which can be partitioned into subpopulations. In statistical surveys, when subpopulations within an overall population vary, it could be advantageous to sample each subpopulation (stratum) independently. Stratification is the process of dividing members of the population into homogeneous subgroups before sampling. The strata should define a partition of the population.
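The idea can be sketched in a few lines of Python. This version assumes proportional allocation (the same sampling fraction in every stratum), which is only one of several possible allocation rules; `stratified_sample` and its parameters are hypothetical names.

```python
import random


def stratified_sample(population, stratum_of, fraction, rng):
    """Partition the population into strata, then sample each independently."""
    strata = {}
    for item in population:
        strata.setdefault(stratum_of(item), []).append(item)

    sample = []
    for key in sorted(strata):
        members = strata[key]
        # Proportional allocation, with at least one draw per stratum.
        size = min(len(members), max(1, round(fraction * len(members))))
        sample.extend(rng.sample(members, size))
    return sample


if __name__ == "__main__":
    rng = random.Random(1)
    # Made-up population: 80 members of stratum "a", 20 of stratum "b".
    population = [("a", i) for i in range(80)] + [("b", i) for i in range(20)]
    picked = stratified_sample(population, lambda item: item[0], 0.1, rng)
    print(picked)
```

Because each stratum is sampled separately, small subgroups are guaranteed representation, which simple random sampling of the whole population does not ensure.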

Synchrotrons and X-Ray Free Electron Lasers (part 1)

The first MOOC to provide an extensive introduction to synchrotron and XFEL facilities and associated techniques and applications.

Higher-order asymptotics provide accurate approximations for use in parametric statistical modelling. In this thesis, we investigate using higher-order approximations in two specific settings, with a particular emphasis on the tangent exponential model. Th ...

Jürg Alexander Schiffmann, Cyril Picard, Faez Ahmed

Exploiting the recent advancements in artificial intelligence, showcased by ChatGPT and DALL-E, in real-world applications necessitates vast, domain-specific, and publicly accessible datasets. Unfortunately, the scarcity of such datasets poses a significan ...

We present Diffusion in Style, a simple method to adapt Stable Diffusion to any desired style, using only a small set of target images. It is based on the key observation that the style of the images generated by Stable Diffusion is tied to the initial lat ...

2023