The simple task of storing a database, or transferring it to a different point via a communication channel, turns far more complex as the size of the database grows large. The limited bandwidth available for transmission plays a central role in this predicament. In two broad contexts, Content Distribution Networks (CDN) and Distributed Storage Systems (DSS), the adverse effect of the growing size of the database on the transmission bandwidth can be mitigated by exploiting additional storage units. Characterizing the optimal tradeoff between the transmission bandwidth and the storage size is the central quest of numerous works in the recent literature, including this thesis.
In a DSS, individual servers fail routinely and must be replaced by downloading data from the remaining servers, a task referred to as the repair process. To render the repair of failed servers more straightforward and efficient, various forms of redundancy can be introduced into the system. One of the benchmarks by which the reliability of a DSS is measured is availability, which refers to the number of disjoint sets of servers that can help to repair any failed server. We study the interaction of this parameter with the amount of traffic generated during the repair process (the repair bandwidth) and the storage size. In particular, we propose a novel DSS architecture which achieves a much smaller repair bandwidth for the same availability, compared to the state of the art.
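As a concrete toy illustration of availability (a hypothetical four-server example, not the architecture proposed in the thesis): store two bits a and b on servers as a, b, a XOR b, and a replica of a. The first server can then be rebuilt either from the replica alone or from the disjoint pair {b, a XOR b}, so it has availability 2, while the two options differ in how many bits are downloaded, i.e. in the repair bandwidth. A minimal Python sketch:

```python
# Toy illustration of availability and repair bandwidth in a DSS.
# Hypothetical 4-server example; not the scheme proposed in the thesis.

a, b = 1, 0                      # two data bits
servers = {
    0: a,                        # systematic bit a
    1: b,                        # systematic bit b
    2: a ^ b,                    # parity a XOR b
    3: a,                        # replica of a
}

# Two disjoint repair groups for server 0: availability = 2.
repair_groups = [
    {3},                         # read the replica (1 bit downloaded)
    {1, 2},                      # recompute a = b XOR (a XOR b) (2 bits downloaded)
]

def repair_server0(group):
    """Rebuild the content of server 0 from one repair group; return (value, bandwidth)."""
    downloaded = [servers[s] for s in sorted(group)]
    if group == {3}:
        rebuilt = downloaded[0]
    else:                        # group == {1, 2}
        rebuilt = downloaded[0] ^ downloaded[1]
    return rebuilt, len(downloaded)

for g in repair_groups:
    value, bandwidth = repair_server0(g)
    assert value == servers[0]
    print(f"repair group {sorted(g)}: rebuilt bit = {value}, repair bandwidth = {bandwidth} bit(s)")
```

The example shows the tradeoff in miniature: extra storage (the replica) buys a cheaper repair option, while the parity-based group costs more bandwidth but adds a second, disjoint way to recover the lost data.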
In the context of CDNs, the network can be highly congested during certain hours of the day and almost idle at other times. This variability of traffic can be reduced by utilizing local storage units that prefetch data while the network is idle, an approach referred to as caching. In this thesis we analyze a CDN that has access to independent data from various content providers. We characterize the best caching strategy in terms of the aggregate peak traffic under the constraint that coding across contents from different libraries is prohibited. Furthermore, we prove that under a certain set of conditions this restriction comes without loss of optimality.
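To make the prefetching idea concrete, here is a small hypothetical calculation (an uncoded toy model, not the multi-library scheme analyzed in the thesis): if each of K users prefetches a fraction M/N of every one of N equally sized files while the network is idle, only the missing fraction of each requested file must be shipped at peak time, so the peak traffic is scaled by (1 - M/N).

```python
# Toy model of uncoded prefetching: peak traffic before and after caching.
# All parameters below are hypothetical and only illustrate the general idea.

K = 10          # number of users
N = 100         # number of equally popular files in the library
F = 1.0         # size of one file (normalized units)
M = 20          # cache size per user, measured in files

# Without caches, every request is served in full at peak time.
peak_no_cache = K * F

# With uncoded prefetching, each user already holds a fraction M/N of every file,
# so only the remaining (1 - M/N) fraction travels over the network at peak time.
peak_with_cache = K * F * (1 - M / N)

print(f"peak traffic without caching: {peak_no_cache:.2f}")
print(f"peak traffic with uncoded prefetching: {peak_with_cache:.2f}")
```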