The simple task of storing a database, or transferring it to a different point via a communication channel, turns far more complex as the size of the database grows large. The limited bandwidth available for transmission plays a central role in this predicament. In two broad contexts, Content Distribution Networks (CDN) and Distributed Storage Systems (DSS), the adverse effect of the growing size of the database on the transmission bandwidth can be mitigated by exploiting additional storage units. Characterizing the optimal tradeoff between the transmission bandwidth and the storage size is the central quest of numerous works in the recent literature, including this thesis.
In a DSS, individual servers fail routinely and must be replaced by downloading data from the remaining servers, a task referred to as the repair process. To render the repair of failed servers more straightforward and efficient, various forms of redundancy can be introduced into the system. One of the benchmarks by which the reliability of a DSS is measured is availability, which refers to the number of disjoint sets of servers that can help to repair any failed server. We study the interaction of this parameter with the amount of traffic generated during the repair process (the repair bandwidth) and the storage size. In particular, we propose a novel DSS architecture which achieves a much smaller repair bandwidth for the same availability, compared to the state of the art.
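As a concrete toy illustration of availability (a hypothetical four-server example, not the architecture proposed in the thesis): store two bits a and b on servers as a, b, a XOR b, and a replica of a. The first server can then be rebuilt either from the replica alone or from the disjoint pair {b, a XOR b}, so it has availability 2, while the two options differ in how many bits are downloaded, i.e. in the repair bandwidth. A minimal Python sketch:

```python
# Toy illustration of availability and repair bandwidth in a DSS.
# Hypothetical 4-server example; not the scheme proposed in the thesis.

a, b = 1, 0                      # two data bits
servers = {
    0: a,                        # systematic bit a
    1: b,                        # systematic bit b
    2: a ^ b,                    # parity a XOR b
    3: a,                        # replica of a
}

# Two disjoint repair groups for server 0: availability = 2.
repair_groups = [
    {3},                         # read the replica (1 bit downloaded)
    {1, 2},                      # recompute a = b XOR (a XOR b) (2 bits downloaded)
]

def repair_server0(group):
    """Rebuild the content of server 0 from one repair group; return (value, bandwidth)."""
    downloaded = [servers[s] for s in sorted(group)]
    if group == {3}:
        rebuilt = downloaded[0]
    else:                        # group == {1, 2}
        rebuilt = downloaded[0] ^ downloaded[1]
    return rebuilt, len(downloaded)

for g in repair_groups:
    value, bandwidth = repair_server0(g)
    assert value == servers[0]
    print(f"repair group {sorted(g)}: rebuilt bit = {value}, repair bandwidth = {bandwidth} bit(s)")
```

The example shows the tradeoff in miniature: extra storage (the replica) buys a cheaper repair option, while the parity-based group costs more bandwidth but adds a second, disjoint way to recover the lost data.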
In the context of CDNs, the network can be highly congested during certain hours of the day and almost idle at other times. This variability of traffic can be reduced by utilizing local storage units that prefetch data while the network is idle, an approach referred to as caching. In this thesis we analyze a CDN that has access to independent data from various content providers. We characterize the best caching strategy in terms of the aggregate peak traffic under the constraint that coding across contents from different libraries is prohibited. Furthermore, we prove that under a certain set of conditions this restriction comes without loss of optimality.
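To make the prefetching idea concrete, here is a small hypothetical calculation (an uncoded toy model, not the multi-library scheme analyzed in the thesis): if each of K users prefetches a fraction M/N of every one of N equally sized files while the network is idle, only the missing fraction of each requested file must be shipped at peak time, so the peak traffic is scaled by (1 - M/N).

```python
# Toy model of uncoded prefetching: peak traffic before and after caching.
# All parameters below are hypothetical and only illustrate the general idea.

K = 10          # number of users
N = 100         # number of equally popular files in the library
F = 1.0         # size of one file (normalized units)
M = 20          # cache size per user, measured in files

# Without caches, every request is served in full at peak time.
peak_no_cache = K * F

# With uncoded prefetching, each user already holds a fraction M/N of every file,
# so only the remaining (1 - M/N) fraction travels over the network at peak time.
peak_with_cache = K * F * (1 - M / N)

print(f"peak traffic without caching: {peak_no_cache:.2f}")
print(f"peak traffic with uncoded prefetching: {peak_with_cache:.2f}")
```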