Axo: Detection and Recovery for Delay and Crash Faults in Real-Time Control Systems

Abstract

Real-time control systems use controllers that compute and issue setpoints within stringent delay constraints. A controller that fails to do so, because it crashes or is delayed by software and/or hardware faults, can cause the controlled resources to fail. Axo, a recently proposed protocol, masks crash and delay faults by replicating the controller. Axo provides safety by discarding delayed setpoints, and it relies on the presence of valid setpoints to provide availability. To ensure that enough valid setpoints are issued, faulty controller replicas must be detected and recovered. We present mechanisms for detecting and recovering delay- and crash-faulty replicas within the Axo framework. These mechanisms are designed to be soft state (i.e., their state can be reconstructed from received messages), which allows new replicas to be added seamlessly. Besides presenting the design, we analytically characterize the time to detect and recover a faulty replica, and we validate this characterization experimentally. We demonstrate the performance of Axo with two case studies: the first provides a stability analysis of an inverted pendulum system with Axo, and the second shows the fault-tolerance performance of Axo through a deployment on a real-time control system that controls a CIGRE low-voltage benchmark microgrid.
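The two receiver-side roles named in the abstract, discarding delayed setpoints for safety and detecting crash- or delay-faulty replicas from received messages alone, can be illustrated with a minimal Python sketch. This is not Axo's published algorithm. The names `Setpoint`, `AxoLikeReceiver`, `delay_bound`, and `detection_timeout` are hypothetical. The sketch assumes that each setpoint carries a send timestamp from a clock synchronized with the receiver, and it covers detection only, not recovery.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass
class Setpoint:
    replica_id: int
    value: float
    issued_at: float  # send timestamp in seconds (assumes synchronized clocks)


@dataclass
class AxoLikeReceiver:
    """Hypothetical receiver: an illustration, not Axo's actual algorithm."""
    delay_bound: float        # max tolerated setpoint age in seconds (hypothetical)
    detection_timeout: float  # silence in seconds before a replica is suspected (hypothetical)
    # Soft state: last time a valid setpoint arrived from each replica,
    # rebuilt purely from received messages.
    last_valid: Dict[int, float] = field(default_factory=dict)

    def receive(self, sp: Setpoint, now: float) -> Optional[Setpoint]:
        """Accept a fresh setpoint; discard (return None) a delayed one."""
        if now - sp.issued_at > self.delay_bound:
            return None  # safety: a late setpoint is never applied
        self.last_valid[sp.replica_id] = now
        return sp

    def suspected(self, replicas: Iterable[int], now: float) -> List[int]:
        """Replicas with no valid setpoint within detection_timeout (crashed or delayed)."""
        return sorted(
            r for r in replicas
            if now - self.last_valid.get(r, float("-inf")) > self.detection_timeout
        )


rx = AxoLikeReceiver(delay_bound=0.010, detection_timeout=0.050)
rx.receive(Setpoint(1, 0.5, issued_at=0.000), now=0.004)  # 4 ms old: accepted
rx.receive(Setpoint(2, 0.5, issued_at=0.000), now=0.020)  # 20 ms old: discarded
print(rx.suspected([1, 2], now=0.030))  # -> [2]
```

Because the detector keeps only the arrival time of the last valid setpoint per replica, its state can be rebuilt from incoming messages, loosely in the spirit of the soft-state design described above. A replica that has never delivered a valid setpoint is treated as suspected, and a newly added replica stops being suspected as soon as one of its setpoints is accepted.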
