Concept

Single point of failure

Related publications (37)

A Minimally Intrusive Low-Memory Approach to Resilience for Existing Transient Solvers

We propose a novel, minimally intrusive approach to adding fault tolerance to existing complex scientific simulation codes, used for addressing a broad range of time-dependent problems on the next generation of supercomputers. Exascale systems have the pot ...

2019

System Support for Efficient Replication in Distributed Systems

Dragos-Adrian Seredinschi

Current online applications, such as search engines, social networks, or file sharing services, execute across a distributed network of machines. They provide non-stop services to their users despite failures in the underlying network. To achieve such a hi ...

EPFL, 2019

Monitoring distributed fragmented skylines

Odysseas Papapetrou

Distributed skyline computation is important for a wide range of domains, from distributed and web-based systems to ISP-network monitoring and distributed databases. The problem is particularly challenging in dynamic distributed settings, where the goal is ...

2018

The Complexity of Reliable and Secure Distributed Transactions

Junxiong Wang

The use of transactions in distributed systems dates back to the 1970s. The last decade has also seen the proliferation of transactional systems. In existing transactional systems, many protocols employ a centralized approach in executing a distributed ...

EPFL, 2018

Size effect in shear and punching shear failures of concrete members without transverse reinforcement: Differences between statically determinate members and redundant structures

Aurelio Muttoni, Miguel Fernández Ruiz

Large efforts have been devoted in the past to understanding size effect in shear failures of members without transverse reinforcement. Experimental works have demonstrated that increasing the size reduces the nominal shear strength provided that the failu ...

2018

Sparsified SGD with Memory

Martin Jaggi, Sebastian Urban Stich, Jean-Baptiste Francis Marie Juliette Cordonnier

Huge-scale machine learning problems are nowadays tackled by distributed optimization algorithms, i.e., algorithms that leverage the compute power of many devices for training. The communication overhead is a key bottleneck that hinders perfect scalability. ...

2018

Clock-SI: Snapshot Isolation for Partitioned Data Stores Using Loosely Synchronized Clocks

Willy Zwaenepoel, Sameh Mohamed Elnikety, Jiaqing Du

Clock-SI is a fully distributed protocol that implements snapshot isolation (SI) for partitioned data stores. It derives snapshot and commit timestamps from loosely synchronized clocks, rather than from a centralized timestamp authority as used in current ...

2013

Reliability Analysis of Data Storage Systems

Vinodh Venkatesan

Modern data storage systems are extremely large and consist of several tens or hundreds of nodes. In such systems, node failures are daily events, and safeguarding data from them poses a serious design challenge. The focus of this thesis is on the data rel ...

EPFL, 2012

A High-Throughput Byzantine Fault-Tolerant Protocol

Nikola Knezevic

State-machine replication (SMR) is a software technique for tolerating failures and for providing high availability in large-scale systems, through the use of commodity hardware. A replicated state-machine comprises a number of replicas, each of which runs ...

EPFL, 2012

A Fault-Tolerant Token-Based Atomic Broadcast Algorithm

André Schiper, Nils Richard Ekwall

Many atomic broadcast algorithms have been published in the last 20 years. Token-based algorithms represent a large class of these algorithms. Interestingly, all the token-based atomic broadcast algorithms rely on a group membership service and none of the ...

2011
