Dalton: Learned Partitioning for Distributed Data Streams

Anastasia Ailamaki, Eleni Zapridou, Ioannis Mytilinis
2022
Conference paper

Abstract

To sustain the input rate of high-throughput streams, modern stream processing systems rely on parallel execution. However, skewed data yield imbalanced load assignments and create stragglers that hinder scalability. Deciding on a static partitioning for a given set of “hot” keys is not sufficient as these keys are not known in advance, and even worse, the data distribution can change unpredictably. Existing algorithms either optimize for a specific distribution or, in order to adapt, assume a centralized partitioner that processes every incoming tuple and observes the whole workload. However, this is not realistic in a distributed environment, where multiple parallel upstream operators exist, as the centralized partitioner itself becomes the bottleneck and limits scalability. In this work, we propose Dalton: a lightweight, adaptive, yet scalable partitioning operator that relies on reinforcement learning. By memoizing state and dynamically keeping track of recent experience, Dalton: i) adjusts its policy at runtime and quickly adapts to the workload, ii) avoids redundant computations and minimizes the per-tuple partitioning overhead, and iii) efficiently scales out to multiple instances that learn cooperatively and converge to a joint policy. Our experiments indicate that Dalton scales regardless of the input data distribution and sustains 1.3× - 6.7× higher throughput than existing approaches.
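
To make the idea of a partitioner that learns from recent experience concrete, here is a minimal sketch. It is not Dalton's algorithm: the epsilon-greedy policy, the load-based reward, and every name in it (`LearnedPartitioner`, `route`, `epsilon`, `alpha`) are assumptions chosen for illustration. It models a single partitioner instance and ignores cooperative learning across instances.

```python
import random
from collections import defaultdict


class LearnedPartitioner:
    """Toy epsilon-greedy partitioner; hypothetical, not Dalton's algorithm."""

    def __init__(self, num_workers, epsilon=0.1, alpha=0.5, seed=0):
        self.num_workers = num_workers
        self.epsilon = epsilon          # probability of exploring a random worker
        self.alpha = alpha              # learning rate for the value estimates
        self.rng = random.Random(seed)
        self.load = [0] * num_workers   # tuples routed to each worker so far
        # Memoized per-key estimates: q[key][w] is the learned value of
        # sending `key` to worker `w`.
        self.q = defaultdict(lambda: [0.0] * num_workers)

    def route(self, key):
        """Pick a worker for one tuple and update the estimate for that choice."""
        q = self.q[key]
        if self.rng.random() < self.epsilon:
            worker = self.rng.randrange(self.num_workers)              # explore
        else:
            worker = max(range(self.num_workers), key=q.__getitem__)   # exploit
        # Assumed reward: positive when the chosen worker is below the mean
        # load (measured before this tuple is assigned), negative otherwise.
        reward = sum(self.load) / self.num_workers - self.load[worker]
        q[worker] += self.alpha * (reward - q[worker])
        self.load[worker] += 1
        return worker
```

Calling `route` for every tuple of a skewed stream lets a hot key spill over to other workers once its current worker becomes overloaded. The sketch ignores the cost of splitting a key's state across workers, which a real system would have to weigh against load balance.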

Official source

https://infoscience.epfl.ch/record/299209?ln=fr


