Publication

LAQy: Efficient and Reusable Query Approximations via Lazy Sampling

Abstract

Modern analytical engines rely on Approximate Query Processing (AQP) to provide faster response times than the hardware allows for exact query answering. However, existing AQP methods impose steep performance penalties as workload unpredictability increases. Specifically, offline AQP relies on predictable workloads to create samples that match the queries in a priori to query execution, providing reductions in query response times when queries match the expected workload. As soon as workload predictability diminishes, existing online AQP methods create query-specific samples with little reuse across queries and produce significantly smaller gains in response times. As a result, existing approaches cannot fully exploit the benefits of sampling under increased unpredictability. We analyze sample creation and propose LAQy, a framework for building, expanding, and merging samples to adapt to the changes in workload predicates. We show the main parameters that affect the sample creation time and propose lazy sampling to overcome the unpredictability issues that cause fast-but-specialized samples to be query-specific. We evaluate LAQy by implementing it in an in-memory code-generation-based scale-up analytical engine to show the adaptivity and practicality of our framework in a modern system. LAQy speeds up online sampling processing as a function of sample reuse ranging from practically zero to full online sampling time, and from 2.5x to 19.3x in a simulated exploratory workload.

About this result
This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.
Related concepts (27)
Query optimization
Query optimization is a feature of many relational database management systems and other databases such as NoSQL and graph databases. The query optimizer attempts to determine the most efficient way to execute a given query by considering the possible query plans. Generally, the query optimizer cannot be accessed directly by users: once queries are submitted to the database server, and parsed by the parser, they are then passed to the query optimizer where optimization occurs.
Analytical engine
The analytical engine was a proposed mechanical general-purpose computer designed by English mathematician and computer pioneer Charles Babbage. It was first described in 1837 as the successor to Babbage's difference engine, which was a design for a simpler mechanical calculator. The analytical engine incorporated an arithmetic logic unit, control flow in the form of conditional branching and loops, and integrated memory, making it the first design for a general-purpose computer that could be described in modern terms as Turing-complete.
Query plan
A query plan (or query execution plan) is a sequence of steps used to access data in a SQL relational database management system. This is a specific case of the relational model concept of access plans. Since SQL is declarative, there are typically many alternative ways to execute a given query, with widely varying performance. When a query is submitted to the database, the query optimizer evaluates some of the different, correct possible plans for executing the query and returns what it considers the best option.
Show more
Related publications (32)

Aggregation and Exploration of High-Dimensional Data Using the Sudokube Data Cube Engine

Christoph Koch, Peter Lindner, Zhekai Jiang, Sachin Basil John

We present Sudokube, a novel system that supports interactive speed querying on high-dimensional data using partially materialized data cubes. Given a storage budget, it judiciously chooses what projections to precompute and materialize during cube constru ...
Association for Computing Machinery2023

Center-aware Adversarial Augmentation for Single Domain Generalization

Mathieu Salzmann, Zhiye Wang

Domain generalization (DG) aims to learn a model from multiple training (i.e., source) domains that can generalize well to the unseen test (i.e., target) data coming from a different distribution. Single domain generalization (SingleDG) has recently emerge ...
IEEE COMPUTER SOC2023

Sampling-Based AQP in Modern Analytical Engines

Anastasia Ailamaki, Viktor Sanca

As the data volume grows, reducing the query execution times remains an elusive goal. While approximate query processing (AQP) techniques present a principled method to trade off accuracy for faster queries in analytics, the sample creation is often consid ...
ACM2022
Show more
Related MOOCs (2)
Geographical Information Systems 1
Organisé en deux parties, ce cours présente les bases théoriques et pratiques des systèmes d’information géographique, ne nécessitant pas de connaissances préalables en informatique. En suivant cette
Geographical Information Systems 1
Organisé en deux parties, ce cours présente les bases théoriques et pratiques des systèmes d’information géographique, ne nécessitant pas de connaissances préalables en informatique. En suivant cette

Graph Chatbot

Chat with Graph Search

Ask any question about EPFL courses, lectures, exercises, research, news, etc. or try the example questions below.

DISCLAIMER: The Graph Chatbot is not programmed to provide explicit or categorical answers to your questions. Rather, it transforms your questions into API requests that are distributed across the various IT services officially administered by EPFL. Its purpose is solely to collect and recommend relevant references to content that you can explore to help you answer your questions.