Are you an EPFL student looking for a semester project?
Work with us on data science and visualisation projects, and deploy your project as an app on top of Graph Search.
Secure storage of genomic data is of great and increasing importance. The scientific community's improving ability to interpret individuals' genetic materials and the growing size of genetic database populations have been aggravating the potential consequences of data breaches. The prevalent use of passwords to generate encryption keys thus poses an especially serious problem when applied to genetic data. Weak passwords can jeopardize genetic data in the short term, but given the multi-decade lifespan of genetic data, even the use of strong passwords with conventional encryption can lead to compromise. We present a tool, called {\em GenoGuard}, for providing strong protection for genomic data both today and in the long term. GenoGuard incorporates a new theoretical framework for encryption called honey encryption (HE): it can provide information-theoretic confidentiality guarantees for encrypted data. Previously proposed HE schemes, however, can be applied to messages from, unfortunately, a very restricted set of probability distributions. Therefore, GenoGuard addresses the open problem of applying HE techniques to the highly non-uniform probability distributions that characterize sequences of genetic data. In GenoGuard, a potential adversary can attempt exhaustively to guess keys or passwords and decrypt via a brute-force attack. We prove that decryption under any key will yield a plausible genome sequence, and that GenoGuard offers an information-theoretic security guarantee against message-recovery attacks. We also explore attacks that use side information. Finally, we present an efficient and parallelized software implementation of GenoGuard.
Rachid Guerraoui, Manuel José Ribeiro Vidigueira, Gauthier Jérôme Timothée Voron, Martina Camaioni, Matteo Monti, Pierre-Louis Blaise Roman