Are you an EPFL student looking for a semester project?
Work with us on data science and visualisation projects, and deploy your project as an app on top of Graph Search.
As data analytics is used by an increasing number of applications, data analytics engines are required to execute workloads with increased concurrency, i.e., an increasing number of clients submitting queries. Data management systems designed for data analytics - a market dominated by column-stores - however, were initially optimized for single query execution, minimizing its response time. Hence, they do not treat concurrency as a first class citizen. In this paper, we experiment with one open-source and two commercial column-stores using the TPC-H and SSB benchmarks in a setup with an increasing number of concurrent clients submitting queries, focusing on whether the tested systems can scale up in a single node instance. The tested systems for in-memory workloads scale up, to some degree; however, when the server is saturated they fail to fully exploit the available parallelism. Further, we highlight the unpredictable response times for high concurrency.
Anastasia Ailamaki, Periklis Chrysogelos, Angelos Christos Anadiotis, Syed Mohammad Aunn Raza
David Atienza Alonso, Miguel Peon Quiros, Simone Machetti, Pasquale Davide Schiavone
Anastasia Ailamaki, Panagiotis Sioulas, Eleni Zapridou