An important prerequisite for developing trustworthy artificial intelligence is high-quality data. Crowdsourcing has emerged as a popular method of data collection in the past few years. However, the quality of data collected this way is a persistent concern. This thesis addresses two major challenges in collecting high-quality data from a crowd: 1) how to incentivize crowd workers to report accurate data; 2) how to ensure that the data collection mechanism is transparent and fair.
We first propose two novel peer-consistency mechanisms for crowdsourcing: the Deep Bayesian Trust (DBT) mechanism and the Personalized Peer Truth Serum (PPTS). The DBT mechanism incentivizes workers to report accurate answers to objective questions with discrete ground truth answers. It is useful, for example, in collecting labels for supervised machine learning tasks. The mechanism ensures dominant uniform strategy incentive compatibility and fair rewards to the workers. The PPTS incentivizes workers to truthfully report their personal data (for example, body measurements). Since the data is personal in nature, the tasks cannot be shared between two workers. We show that when individuals report combinations of multiple personal data attributes, the correlation between them can be exploited to find peers and provide guarantees on the incentive compatibility of the mechanism.
We next address the transparency issue of data collection. Smart contracts often rely on a trusted third party (oracle) to get correct information about real-world events. We show how peer-consistency mechanisms can be used to build decentralized, trustless and transparent data oracles on a blockchain. We derive conditions under which a peer-consistency incentive mechanism can be used to acquire truthful information from an untrusted and self-interested crowd, even when the crowd has outside incentives to provide wrong information. We also show how to implement the peer-consistency mechanisms in Ethereum. We discuss various non-trivial issues that arise in such an implementation, suggest several optimizations to reduce gas costs and provide an empirical analysis.
Finally, we address the problem of fair data collection from a crowd. Sharing economy platforms such as Airbnb and Uber face a major challenge in the form of peer-to-peer discrimination based on sensitive personal attributes such as race and gender. We show how a peer-consistency incentive mechanism can be used to encourage users to go against common bias and provide a truthful rating about others, obtained through a more careful and deeper evaluation. In situations where an incentive mechanism cannot be implemented, we show that a simple post-processing approach can also be used to correct bias in the reputation scores, while minimizing the loss of useful information provided by the scores. We also address the problem of fair and diverse data collection from a crowd under budget constraints. We propose a novel algorithm which maximizes the expected accuracy of the collected data, while ensuring that the errors satisfy desired notions of fairness with respect to sensitive attributes.
EPFL