
# Extensions of Peer Prediction Incentive Mechanisms

Abstract

As large, data-driven artificial intelligence models become ubiquitous, guaranteeing high data quality is imperative for constructing models. Crowdsourcing, community sensing, and data filtering have long been the standard approaches to guaranteeing or improving data quality. The underlying theory, mainly incentive mechanism design, is often limited in its scope of applicability. A subset of incentive mechanisms designed to handle unverifiable or inherently subjective data, Peer Prediction mechanisms, is generally only applicable to settings where the data signal comes from a discrete distribution. In this thesis, we expand the scope of applicability of Peer Prediction mechanisms in two parts.

In the first part, we address a constrained extension of Peer Prediction that is applicable to machine learning. A data-collecting entity, known as a Center, may not need to learn a joint distribution of (x, y) pairs; it may only need to learn a parameterized model that minimizes a loss function on the joint distribution. We analyze a statistical measure known as Influence, which can be interpreted as a form of Peer Prediction. We show that the Peer Truth Serum (PTS) is a special case of Influence, and that Influence has desirable game-theoretic properties as an incentive mechanism.

We then take the analysis of Influence into the regime of data filtering, which is uniquely challenging compared to crowdsourcing. We use asymptotic analysis to show that, in the limit of infinite samples, the ability to filter training data using Influence is constrained by the degree of corruption in the validation data. However, finite-sample analysis reveals that one can exceed the quality of the validation data if certain conditions on the higher moments of the data models are met.

In the second part, we move on from this constrained extension to the most general extension of Peer Prediction: learning arbitrary distributions. Many crowdsourcing problems involve absolutely continuous distributions, such as Gaussian distributions. The standard approach is to discretize the space and apply a discrete Peer Prediction mechanism. This approach has numerous issues: coarse discretizations result in inaccurate approximations of the distribution and loose incentives, while fine discretizations result in volatile payments, which tend to fail in real-world applications. We expand the theory of Peer Prediction rather than seek a better implementation of current theory, and consider two approaches.

In the first approach, one can discretize the space, which we call partitioning into bins, but pick from a set of partitions rather than just one. In this regime, the notion of peer matching in Peer Prediction is generalized by the concept of Peer Neighborhoods. With a reasonable strengthening of the agent update condition, we obtain a valid extension of the PTS to arbitrary distributions.

The partitioning approach for arbitrary distributions reveals a more precise theory. By changing perspective from partitioning according to the Lebesgue measure on the space of reports to partitioning according to the public probability measure, we obtain a payment function that does not rely on discretization. Using this function as the basis for a mechanism, a Continuous Truth Serum, reveals solutions to other underlying problems with Peer Prediction, such as the unobserved-category problem.
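For intuition about the mechanisms the abstract builds on: the discrete Peer Truth Serum scores an agent's report by comparing it against a randomly matched peer's report and weighting agreement by the inverse of the public prior, so that agreeing on an a-priori unlikely value pays more. A minimal sketch, assuming the standard PTS payment form; the function name and `scale` parameter are illustrative, not from the thesis:

```python
def pts_payment(report, peer_report, prior, scale=1.0):
    """Peer Truth Serum payment: reward agreement with a matched peer,
    weighted by the inverse of the public prior probability of the
    reported value, so agreeing on rare values pays more."""
    match = 1.0 if report == peer_report else 0.0
    return scale * (match / prior[report] - 1.0)

# Example with a binary signal and public prior R
prior = {"high": 0.8, "low": 0.2}
print(pts_payment("low", "low", prior))   # agreement on the rare value pays well (4.0)
print(pts_payment("high", "low", prior))  # disagreement yields -scale (-1.0)
```

Under this rule an agent who believes their own observation is the best predictor of their peer's report maximizes expected payment by reporting truthfully; the continuous extensions discussed above generalize exactly this matching step.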


Related concepts (41)

Related MOOCs (32)

Related publications (99)

Normal distribution

In statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is f(x) = (1/(σ√(2π))) exp(−(x − μ)²/(2σ²)). The parameter μ is the mean or expectation of the distribution (and also its median and mode), while the parameter σ is its standard deviation. The variance of the distribution is σ². A random variable with a Gaussian distribution is said to be normally distributed, and is called a normal deviate.
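The density formula above can be evaluated directly; a minimal sketch using only the standard library (the function name is illustrative):

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2): (1/(sigma*sqrt(2*pi))) * exp(-(x-mu)^2 / (2*sigma^2))."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# The density peaks at the mean; for the standard normal the peak is 1/sqrt(2*pi)
print(round(normal_pdf(0.0), 4))  # 0.3989
```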

Quality assurance

Quality assurance (QA) is the term used in both manufacturing and service industries to describe the systematic efforts taken to assure that the product(s) delivered to customer(s) meet the contractual and other agreed-upon performance, design, reliability, and maintainability expectations of that customer. The core purpose of quality assurance is to prevent mistakes and defects in the development and production of both manufactured products, such as automobiles and shoes, and delivered services, such as automotive repair and athletic shoe design.

Ratio distribution

A ratio distribution (also known as a quotient distribution) is a probability distribution constructed as the distribution of the ratio of random variables having two other known distributions. Given two (usually independent) random variables X and Y, the distribution of the random variable Z that is formed as the ratio Z = X/Y is a ratio distribution. An example is the Cauchy distribution (also called the normal ratio distribution), which comes about as the ratio of two normally distributed variables with zero mean.
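The normal-ratio example can be checked empirically: for a standard Cauchy variable, the CDF is F(z) = 1/2 + arctan(z)/π, so P(|Z| < 1) = 1/2. A minimal Monte Carlo sketch (sample size and tolerance are illustrative choices):

```python
import random

random.seed(0)
n = 100_000

# The ratio of two independent standard normals follows a standard Cauchy distribution
ratios = [random.gauss(0, 1) / random.gauss(0, 1) for _ in range(n)]

# For a standard Cauchy, P(|Z| < 1) = arctan(1)/pi - arctan(-1)/pi = 1/2
frac = sum(1 for z in ratios if abs(z) < 1) / n
print(abs(frac - 0.5) < 0.02)  # True
```

Note that the sample mean of `ratios` is not a useful check: the Cauchy distribution has no mean, which is why the quantile-based check is used instead.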

Basic signal processing concepts, Fourier analysis, and filters. This module can
be used as a starting point or a basic refresher in elementary DSP.

Adaptive signal processing, A/D and D/A. This module provides the basic
tools for adaptive filtering and a solid mathematical framework for sampling and
quantization.

Advanced topics: this module covers real-time audio processing (with
examples on a hardware board), image processing and communication system design.
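As a toy illustration of the filtering ideas these modules introduce (a sketch for intuition, not material from the courses), a moving average is the simplest FIR low-pass filter:

```python
def moving_average(signal, k=3):
    """Length-k FIR low-pass filter: each output sample is the mean of the
    current input sample and the previous k-1 samples (shorter at the start)."""
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - k + 1): i + 1]
        out.append(sum(window) / len(window))
    return out

# Smoothing an alternating (high-frequency) signal attenuates it toward its mean
noisy = [0, 2, 0, 2, 0, 2]
print(moving_average(noisy))
```

Averaging over a window suppresses rapid oscillations while passing slow trends, which is the defining behavior of a low-pass filter.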

Boi Faltings, Ljubomir Rokvic, Panayiotis Danassis

Federated Learning by nature is susceptible to low-quality, corrupted, or even malicious data that can severely degrade the quality of the learned model. Traditional techniques for data valuation cannot be applied as the data is never revealed. We present ...

2023

Kathryn Hess Bellwald, Lida Kanari, Adélie Eliane Garin

In this paper we consider two aspects of the inverse problem of how to construct merge trees realizing a given barcode. Much of our investigation exploits a recently discovered connection between the symmetric group and barcodes in general position, based ...

2024

A crucial building block of responsible artificial intelligence is responsible data governance, including data collection. Its importance is also underlined in the latest EU regulations. The data should be of high quality, foremost correct and representati ...

2023