Publication

Efficient Concurrent Analytical Query Processing using Data and Workload-conscious Sharing

Panagiotis Sioulas
2023
Thèse EPFL
Résumé

Analytical workloads are evolving as the number of users surges and applications that submit queries in batches become popular. However, traditional analytical databases that optimize-then-execute each query individually struggle to provide timely responses under high concurrency even when tuning databases for the target data and workload: response time is increased as a function of concurrency. Thus, high concurrency jeopardizes the interactivity, hence also the usability, of real-time applications.Work-sharing databases reduce the total processing time across queries. However, existing planning, execution, and database-tuning strategies for work-sharing databases suffer from redundant processing in their chosen operator orders, data access methods, and techniques for handling recomputation. For real-time workloads that are ad-hoc, selective, or recurring, redundant processing results in performance bottlenecks that put timeliness at risk.This thesis addresses the inefficiency in choosing operator orders, accessing data, and reusing materialized results in work-sharing databases. It introduces planning, execution, and database-tuning strategies that reduce redundant processing by holistically optimizing plans for the characteristics of the data, the query batches, and the workload patterns. Our goal is to enable timeliness for processing highly concurrent workloads using work sharing.To choose efficient operator orders, we propose RouLette, a specialized engine that optimizes and processes Select-Project-Join queries using runtime adaptation. RouLette incrementally explores alternative plans with different sharing opportunities by combining reinforcement learning with feedback from monitoring the execution of explored plans. Also, RouLette introduces optimizations that reduce the overhead of adaptive execution. Thus, it both explores more candidate plans and evaluates them more accurately while maintaining low overhead.To reduce processing time for data access and filtering when processing a batch of queries, we introduce SH2O. SH2O focuses on efficiency and scalability. First, it identifies multidimensional data regions where filtering decisions are invariant and uses spatial indices to perform shared access to the regions. Second, to mitigate overhead when the number of regions is large, SH2O employs two optimization strategies that: i) choose which filters to replace in a cost-based manner, ii) specialize for the different data properties and indices across partitions.Finally, we propose ParCuR, a framework for effectively and efficiently reusing materialized results in work-sharing databases. ParCuR optimizes reuse across three axes: i) to address interference between work sharing and reuse, it introduces sharing-aware materialization and reuse policies, ii) to reduce the overhead for reuse, it builds access methods that eliminate frequent predicates, and iii) to maximize the usability of materialized results, it introduces a hybrid partitioning-materialization scheme that enables partial reuse while making efficient use of the available storage budget.All in all, this thesis redesigns work-sharing databases by specializing shared execution for the target data, query batches, and workload patterns. It significantly expands the applicability of work-sharing databases, enabling them to provide timely responses to real-time, highly concurrent applications that produce unpredictable, selective, and recurring workloads.

À propos de ce résultat
Cette page est générée automatiquement et peut contenir des informations qui ne sont pas correctes, complètes, à jour ou pertinentes par rapport à votre recherche. Il en va de même pour toutes les autres pages de ce site. Veillez à vérifier les informations auprès des sources officielles de l'EPFL.

Graph Chatbot

Chattez avec Graph Search

Posez n’importe quelle question sur les cours, conférences, exercices, recherches, actualités, etc. de l’EPFL ou essayez les exemples de questions ci-dessous.

AVERTISSEMENT : Le chatbot Graph n'est pas programmé pour fournir des réponses explicites ou catégoriques à vos questions. Il transforme plutôt vos questions en demandes API qui sont distribuées aux différents services informatiques officiellement administrés par l'EPFL. Son but est uniquement de collecter et de recommander des références pertinentes à des contenus que vous pouvez explorer pour vous aider à répondre à vos questions.