Publication

Parallelizing Query Optimization on Shared-Nothing Architectures

Christoph Koch, Immanuel Trummer
2016
Conference paper

Abstract

Data processing systems offer an ever-increasing degree of parallelism at the level of cores, CPUs, and processing nodes. Query optimization must exploit high degrees of parallelism so that it does not gradually become the bottleneck of query evaluation. We show how to parallelize query optimization at a massive scale. We present algorithms for parallel query optimization in left-deep and bushy plan spaces. At optimization start, we divide the plan space for a given query into partitions of equal size that are explored in parallel by worker nodes. At the end of optimization, each worker returns the optimal plan in its partition to the master, which determines the globally optimal plan from the partition-optimal plans. No synchronization or data exchange is required during the actual optimization phase. The amount of data sent over the network at the start and at the end of optimization, as well as the complexity of serial steps within our algorithms, increases only linearly in the number of workers and in the query size. The time and space complexity of optimization within one partition decreases uniformly in the number of workers. In our experiments, we parallelize single- and multi-objective query optimization over a cluster with 100 nodes, using more than 250 concurrent worker threads (Spark executors). Despite high network latency and task assignment overheads, parallelization yields speedups of up to one order of magnitude for large queries whose optimization takes minutes on a single node.
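
The partition-then-merge scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes, purely for illustration, that the left-deep plan space is split by precedence constraints on disjoint table pairs (each constraint halves the space, so k constraints give 2^k partitions of equal size). The cost model, the statistics in `CARD` and `SELECTIVITY`, and all function names are hypothetical. For brevity, each worker filters a full enumeration of join orders, so the sketch shows how partitions are defined and merged, not the per-worker savings in time and space.

```python
from itertools import permutations

# Hypothetical toy statistics (assumptions, not taken from the paper).
CARD = {"A": 1000, "B": 50, "C": 200, "D": 10}
SELECTIVITY = 0.01


def plan_cost(order):
    """Toy cost of a left-deep plan: sum of intermediate result sizes."""
    size = CARD[order[0]]
    cost = 0.0
    for table in order[1:]:
        size = size * CARD[table] * SELECTIVITY
        cost += size
    return cost


def partition_constraints(tables, worker_id, num_bits):
    """Map the bits of worker_id to precedence constraints (x joins before y).

    Bit i decides the relative order of the i-th disjoint table pair, so the
    master only needs to send each worker its integer ID.
    """
    constraints = []
    for bit in range(num_bits):
        first, second = tables[2 * bit], tables[2 * bit + 1]
        if (worker_id >> bit) & 1:
            first, second = second, first
        constraints.append((first, second))
    return constraints


def optimize_partition(tables, worker_id, num_bits):
    """Worker step: best (cost, order) in one partition, plus partition size."""
    constraints = partition_constraints(tables, worker_id, num_bits)
    best = None
    explored = 0
    for order in permutations(tables):
        position = {table: i for i, table in enumerate(order)}
        if all(position[a] < position[b] for a, b in constraints):
            explored += 1
            candidate = (plan_cost(order), order)
            if best is None or candidate < best:
                best = candidate
    return best, explored


def optimize(tables, num_bits):
    """Master step: run every partition independently, then keep the minimum.

    The list comprehension stands in for workers running in parallel; no
    worker reads another worker's state.
    """
    results = [
        optimize_partition(tables, worker_id, num_bits)
        for worker_id in range(2 ** num_bits)
    ]
    best_plan = min(best for best, _ in results)
    partition_sizes = [explored for _, explored in results]
    return best_plan, partition_sizes


if __name__ == "__main__":
    best_plan, sizes = optimize(["A", "B", "C", "D"], num_bits=2)
    print("partition sizes:", sizes)
    print("best plan:", best_plan)
```

With four tables and two constraints, the 24 left-deep join orders split into four partitions of six orders each, and the minimum over the partition-optimal plans equals the minimum over the whole plan space.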

Official source

https://infoscience.epfl.ch/record/219205?ln=en

Ontological neighbourhood

Computer engineering

Databases: Relational databases

High-performance computing: Parallel computing

Related publications

Fast Parallel Algorithms for Enumeration of Simple, Temporal, and Hop-Constrained Cycles

Aggregation and Exploration of High-Dimensional Data Using the Sudokube Data Cube Engine

Efficient Massively Parallel Join Optimization for Large Queries