Dalton: Learned Partitioning for Distributed Data Streams

Anastasia Ailamaki, Eleni Zapridou, Ioannis Mytilinis
2022
Conference paper

Abstract

To sustain the input rate of high-throughput streams, modern stream processing systems rely on parallel execution. However, skewed data yield imbalanced load assignments and create stragglers that hinder scalability. Deciding on a static partitioning for a given set of “hot” keys is not sufficient as these keys are not known in advance, and even worse, the data distribution can change unpredictably. Existing algorithms either optimize for a specific distribution or, in order to adapt, assume a centralized partitioner that processes every incoming tuple and observes the whole workload. However, this is not realistic in a distributed environment, where multiple parallel upstream operators exist, as the centralized partitioner itself becomes the bottleneck and limits scalability. In this work, we propose Dalton: a lightweight, adaptive, yet scalable partitioning operator that relies on reinforcement learning. By memoizing state and dynamically keeping track of recent experience, Dalton: i) adjusts its policy at runtime and quickly adapts to the workload, ii) avoids redundant computations and minimizes the per-tuple partitioning overhead, and iii) efficiently scales out to multiple instances that learn cooperatively and converge to a joint policy. Our experiments indicate that Dalton scales regardless of the input data distribution and sustains 1.3× - 6.7× higher throughput than existing approaches.

Official source

https://infoscience.epfl.ch/record/299209?ln=en

About this result

This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.

Dalton: Learned Partitioning for Distributed Data Streams

Graph Chatbot

Chat with Graph Search

A story of two transitions: From adhesive to abrasive wear and from ductile to brittle regime

Extensions of Peer Prediction Incentive Mechanisms

Compilation and Design Space Exploration of Dataflow Programs for Heterogeneous CPU-GPU Platforms

Compilation and Design Space Exploration of Dataflow Programs for Heterogeneous CPU-GPU Platforms

A story of two transitions: From adhesive to abrasive wear and from ductile to brittle regime

Extensions of Peer Prediction Incentive Mechanisms