Data replication, the main failure resilience strategy used for big data analytics jobs, can be unnecessarily inefficient. It can cause serious performance degradation when applied to intermediate job outputs in multi-job computations. For instance, for I/ ...
Software development has taken a fundamental turn. Software today has gone from simple, closed programs running on a single machine, to massively open programs, patching together user experiences by way of responses received via hundreds of network requests ...
High-performance computing will probably reach exascale in this decade. At this scale, mean time between failures is expected to be a few hours. Existing fault-tolerant protocols for message-passing applications will no longer be efficient since they eit ...
This thesis is devoted to the design and analysis of algorithms for scheduling problems. These problems are ubiquitous in the modern world. Examples include the optimization of local transportation, managing access to concurrent resources like runways at a ...
KRAB-containing poly-zinc finger proteins (KZFPs) constitute the largest family of transcription factors encoded by mammalian genomes, and growing evidence indicates that they fulfill functions critical to both embryonic development and maintenance of adul ...
We introduce the Fixed Cluster Repair System (FCRS) as a novel architecture for Distributed Storage Systems (DSS) that achieves a small repair bandwidth while guaranteeing a high availability. Specifically, we partition the set of servers in a DSS into s c ...
This dissertation deals with the design of practical erasure codes for storage systems. Hardware and logical disk failures are a common source of system failures that may lead to data loss. Nevertheless, it is predicted that spinning disks would remain the ...
This project aims at building a dynamic and scalable framework for Byzantine fault tolerant distributed computation. By organizing nodes of a distributed system in fault-tolerant clusters, we mask Byzantine failures of single nodes and provide an abstracti ...
Distributed processing over networks relies on in-network processing and cooperation among neighboring agents. Cooperation is beneficial when agents share a common objective. However, in many applications, agents may belong to different clusters that pursu ...
This paper shows for the first time that distributed computing can be both reliable and efficient in an environment that is both highly dynamic and hostile. More specifically, we show how to maintain clusters of size O(log N), each containing more than two ...