Are you an EPFL student looking for a semester project?
Work with us on data science and visualisation projects, and deploy your project as an app on top of Graph Search.
Direct-to-consumer genetic testing makes it possible for everyone to learn their genome sequences. In order to contribute to medical research, a growing number of people publish their genomic data on the Web, sometimes under their real identities. However, this is at odds not only with their own privacy but also with the privacy of their relatives. The genomes of relatives being highly correlated, some family members might be opposed to revealing any of the family's genomic data. In this paper, we study the trade-off between utility and privacy in genomics. We focus on the most relevant kind of variants, namely single nucleotide polymorphisms (SNPs). We take into account the fact that the SNPs of an individual contain information about the SNPs of his family members and that SNPs are correlated with each other. Furthermore, we assume that SNPs can have different utilities in medical research and different levels of sensitivity for individuals. We propose an obfuscation mechanism that enables the genomic data to be publicly available for research, while protecting the genomic privacy of the individuals in a family. Our genomic-privacy preserving mechanism relies upon combinatorial optimization and graphical models to optimize utility and meet privacy requirements. We also present an extension of the optimization algorithm to cope with the non-linear constraints induced by the correlations between SNPs. Our results on real data show that our proposed technique maximizes the utility for genomic research and satisfies family members' privacy constraints.
, , , ,