Publication

Skadi: Building a Distributed Runtime for Data Systems in Disaggregated Data Centers

Sanidhya Kashyap, Qin Zhang
2023
Conference paper
Abstract

Data-intensive systems are the backbone of today's computing and are responsible for shaping data centers. Over the years, cloud providers have relied on three principles to maintain cost-effective data systems: use disaggregation to decouple scaling, use domain-specific computing to battle waning hardware scaling laws, and use serverless to lower costs. Although these principles work well individually, they fail to work in harmony, an issue amplified by emerging data-system workloads. In this paper, we envision a distributed runtime to mitigate the current shortcomings. The distributed runtime has a tiered access layer exposing declarative APIs, underpinned by a stateful serverless runtime with a distributed task execution model. It will be the narrow waist between data systems and hardware. Users are oblivious to data location, concurrency, disaggregation style, or even the hardware that performs the computation. The underlying stateful serverless runtime transparently evolves with novel data-center architectures, such as disaggregation and tightly coupled clusters. We prototype Skadi to showcase that the distributed runtime is practical.
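To make the envisioned division of labor concrete, the following is a minimal sketch of what a declarative, location-oblivious API over a stateful serverless runtime might look like. The abstract does not specify Skadi's actual interface, so every name below (SkadiRuntime, Dataset, load, map, reduce) is a hypothetical illustration, not the published API; threads merely stand in for disaggregated compute resources.

```python
# Hypothetical sketch: users declare *what* to compute, while the
# runtime decides *where* it runs (which node, tier, or accelerator).
# None of these names come from the Skadi paper.
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """A handle to data; users never see where partitions live."""
    partitions: list  # in a real runtime: memory/storage-tier locations


@dataclass
class SkadiRuntime:
    """Stand-in for the stateful serverless runtime: it owns placement,
    concurrency, and hardware choice, keeping the user oblivious."""
    pool: ThreadPoolExecutor = field(
        default_factory=lambda: ThreadPoolExecutor(max_workers=4))

    def load(self, records, num_partitions=4):
        step = max(1, len(records) // num_partitions)
        parts = [records[i:i + step] for i in range(0, len(records), step)]
        return Dataset(parts)

    def map(self, dataset, fn):
        # The runtime fans tasks out; here threads stand in for
        # disaggregated compute nodes.
        futures = [self.pool.submit(lambda p=p: [fn(x) for x in p])
                   for p in dataset.partitions]
        return Dataset([f.result() for f in futures])

    def reduce(self, dataset, fn, initial):
        acc = initial
        for part in dataset.partitions:
            for x in part:
                acc = fn(acc, x)
        return acc


if __name__ == "__main__":
    rt = SkadiRuntime()
    data = rt.load(list(range(100)))
    squares = rt.map(data, lambda x: x * x)
    total = rt.reduce(squares, lambda a, b: a + b, 0)
    print(total)  # 328350; the user never chose nodes or data placement
```

The point of the sketch is the narrow-waist idea from the abstract: the program above would run unchanged whether the runtime places partitions in local memory, on disaggregated memory, or across a tightly coupled cluster.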

Related concepts (20)
Big data
Big data primarily refers to data sets that are too large or complex to be dealt with by traditional data-processing application software. Data with many entries (rows) offer greater statistical power, while data with higher complexity (more attributes or columns) may lead to a higher false-discovery rate. Though the term is sometimes used loosely, partly because it lacks a formal definition, big data is perhaps best understood as a body of information too large to be comprehended when used only in smaller amounts.
Data warehouse
In computing, a data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis and is considered a core component of business intelligence. Data warehouses are central repositories of integrated data from one or more disparate sources. They store current and historical data in a single place, where it is used to create analytical reports for workers throughout the enterprise. This benefits companies by enabling them to interrogate their data, draw insights from it, and make decisions.
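As a toy illustration of the pattern just described (integrating data from disparate sources into one central store, then querying current and historical data for reports), here is a hedged sketch using Python's built-in sqlite3 as a stand-in warehouse; all source, table, and column names are invented for the example.

```python
# Toy data-warehouse pattern: load rows from two "source systems"
# into one central store, then run an analytical query over the
# integrated, historical data. sqlite3 stands in for the warehouse.
import sqlite3

dw = sqlite3.connect(":memory:")
dw.execute("""CREATE TABLE sales (
    source TEXT, year INTEGER, region TEXT, revenue REAL)""")

# "Extract" from two disparate sources and "load" into the warehouse.
crm_rows = [("crm", 2022, "EU", 120.0), ("crm", 2023, "EU", 150.0)]
erp_rows = [("erp", 2022, "US", 200.0), ("erp", 2023, "US", 260.0)]
dw.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)",
               crm_rows + erp_rows)

# Analytical report spanning both sources and both years.
for year, total in dw.execute(
        "SELECT year, SUM(revenue) FROM sales GROUP BY year ORDER BY year"):
    print(year, total)
# 2022 320.0
# 2023 410.0
```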
Computer cluster
A computer cluster is a set of computers that work together so that they can be viewed as a single system. Unlike grid computers, computer clusters have each node set to perform the same task, controlled and scheduled by software. The components of a cluster are usually connected to each other through fast local area networks, with each node (a computer used as a server) running its own instance of an operating system. In most circumstances, all of the nodes use the same hardware and the same operating system, although some setups use different operating systems or different hardware across nodes.
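A minimal sketch of the idea above, each node runs the same task on its own slice of the work under the control of scheduling software, can be imitated on a single machine; here OS processes stand in for nodes and a process pool stands in for the cluster scheduler. This is an illustration of the concept, not a real cluster manager.

```python
# Several "nodes" (OS processes) all run the identical task on
# different chunks of work, coordinated by scheduling software
# (a process pool). Illustrative only.
from multiprocessing import Pool


def node_task(chunk):
    """The identical task every node performs on its assigned chunk."""
    return sum(x * x for x in chunk)


if __name__ == "__main__":
    work = list(range(1000))
    chunks = [work[i:i + 250] for i in range(0, len(work), 250)]
    with Pool(processes=4) as pool:          # 4 "nodes"
        partials = pool.map(node_task, chunks)
    print(sum(partials))  # same result a single machine would produce
```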
Related publications (32)

Special Session: Challenges and Opportunities for Sustainable Multi-Scale Computing Systems

David Atienza Alonso, Miguel Peon Quiros

Multi-Scale computing systems aim to bring computing as close as possible to the data sources, to optimize both computation and networking. These systems are composed of at least three computing layers: the terminal layer, the edge layer, and the cloud layer ...
ACM, 2023

Next-generation brain observatories

Mackenzie Mathis, Shreya Saxena

We propose centralized brain observatories for large-scale recordings of neural activity in mice and non-human primates, coupled with cloud-based data analysis and sharing. Such observatories will advance reproducible systems neuroscience and democratize access ...
CELL PRESS, 2022

Clouseau: Blockchain-based Data Integrity for HDFS Clusters

Ioannis Mytilinis

As the volume of produced data increases exponentially, companies tend to rely on distributed systems to meet the surging demand for storage capacity. With business workflows becoming more and more complex, such systems often consist of or are accessed ...
IEEE COMPUTER SOC, 2021
Related MOOCs (7)
IoT Systems and Industrial Applications with Design Thinking
The first MOOC to provide a comprehensive introduction to the Internet of Things (IoT), including the fundamental business aspects needed to define IoT-related products.