As the volume of produced data increases exponentially, companies rely on distributed systems to meet the surging demand for storage capacity. With business workflows becoming more and more complex, such systems often consist of, or are accessed by, multiple independent, untrusted entities that need to interact with shared data. In such scenarios, potential conflicts of interest incentivize malicious parties to act dishonestly and tamper with the data for their own benefit. The decentralized nature of these systems makes verifiable data integrity a strenuous but necessary task: the various parties should be able to audit changes and detect tampering when it happens. In this work, we focus on HDFS, the most common storage substrate for Big Data analytics. HDFS is vulnerable to malicious users and participating nodes and does not provide a trustworthy lineage mechanism, thus jeopardizing the integrity of stored data and the credibility of extracted insights. As a remedy, we present Clouseau, a blockchain-based system that provides verifiable integrity over HDFS without incurring significant overhead on the critical path of read/write operations. During the demonstration, attendees will have the chance to interact with Clouseau, corrupt data themselves, and witness how Clouseau detects malicious actions.
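To make the idea of auditing changes and detecting tampering more concrete, the following is a minimal sketch of a hash-chained ledger of file digests. It is an illustration only and not Clouseau's actual design: the `Ledger` class, its methods, and the file paths are hypothetical, the ledger lives in memory rather than on a blockchain, and no HDFS API is involved.

```python
import hashlib
import json


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


class Ledger:
    """Hypothetical append-only ledger; each entry is chained to the previous one."""

    GENESIS = "0" * 64  # placeholder hash that precedes the first entry

    def __init__(self):
        self.entries = []

    def append(self, path: str, data: bytes) -> dict:
        """Record the digest of `data` written at `path`."""
        prev = self.entries[-1]["entry_hash"] if self.entries else self.GENESIS
        record = {"path": path, "data_hash": digest(data), "prev_hash": prev}
        # The entry hash covers the record body, including the previous hash.
        record["entry_hash"] = digest(json.dumps(record, sort_keys=True).encode())
        self.entries.append(record)
        return record

    def chain_is_valid(self) -> bool:
        """Recompute every entry hash and check the links between entries."""
        prev = self.GENESIS
        for entry in self.entries:
            body = {k: entry[k] for k in ("path", "data_hash", "prev_hash")}
            expected = digest(json.dumps(body, sort_keys=True).encode())
            if entry["prev_hash"] != prev or entry["entry_hash"] != expected:
                return False
            prev = entry["entry_hash"]
        return True

    def verify(self, path: str, data: bytes) -> bool:
        """Check `data` against the most recent digest recorded for `path`."""
        for entry in reversed(self.entries):
            if entry["path"] == path:
                return entry["data_hash"] == digest(data)
        return False  # no record for this path


ledger = Ledger()
ledger.append("/logs/part-0", b"original contents")
print(ledger.verify("/logs/part-0", b"original contents"))  # True
print(ledger.verify("/logs/part-0", b"corrupted contents"))  # False
```

Because each entry hash depends on the previous one, rewriting an old record invalidates the chain from that point on, which is the property that lets independent parties audit the history. Keeping only digests in the ledger means the check can run outside the read/write path of the data itself.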
Brice Tanguy Alphonse Lecampion, Andreas Möri