Adaptive Query Processing on Raw Data Files

Related publications (105)

Efficient Concurrent Analytical Query Processing using Data and Workload-conscious Sharing

Panagiotis Sioulas

Analytical workloads are evolving as the number of users surges and applications that submit queries in batches become popular. However, traditional analytical databases that optimize-then-execute each query individually struggle to provide timely response ...
EPFL, 2023

Analytical Engines With Context-Rich Processing: Towards Efficient Next-Generation Analytics

Anastasia Ailamaki, Viktor Sanca

As modern data pipelines continue to collect, produce, and store a variety of data formats, extracting and combining value from traditional and context-rich sources such as strings, text, video, audio, and logs becomes a manual process where such formats a ...
2023

Using Cloud Functions as Accelerator for Elastic Data Analytics

Anastasia Ailamaki, Haoqiong Bian, Tiannan Sha

Cloud function (CF) services, such as AWS Lambda, have been applied as the new computing infrastructure in implementing analytical query engines. For bursty and sparse workloads, a CF-based query engine is more elastic than the traditional query engines runn ...
ACM, 2023

Dimensionality reduction of time-series data, and systems and devices that use the resultant embeddings

Steffen Schneider

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for dimensionality reduction of time-series using contrastive learning. A method can include receiving multidimensional input time series data that includes ...
2023

Aggregation and Exploration of High-Dimensional Data Using the Sudokube Data Cube Engine

Christoph Koch, Peter Lindner, Zhekai Jiang, Sachin Basil John

We present Sudokube, a novel system that supports interactive speed querying on high-dimensional data using partially materialized data cubes. Given a storage budget, it judiciously chooses what projections to precompute and materialize during cube constru ...
Association for Computing Machinery, 2023

Graph integration of structured, semistructured and unstructured data for data journalism

Angelos Christos Anadiotis, Jingmao You

Digital data is a gold mine for modern journalism. However, datasets which interest journalists are extremely heterogeneous, ranging from highly structured (relational databases), semi-structured (JSON, XML, HTML), graphs (e.g., RDF), and text. Journalists ...
PERGAMON-ELSEVIER SCIENCE LTD, 2022

High-dimensional Data Cubes

Christoph Koch, Sachin Basil John

This paper introduces an approach to supporting high-dimensional data cubes at interactive query speeds and moderate storage cost. The approach is based on binary(-domain) data cubes that are judiciously partially materialized; the missing information can ...
2022

HyperLogLog: Exponentially Bad in Adversarial Settings

Mathilde Aliénor Raynal

Computing the count of distinct elements in large data sets is a common task but naive approaches are memory-expensive. The HyperLogLog (HLL) algorithm (Flajolet et al., 2007) estimates a data set's cardinality while using significantly less memory than a ...
IEEE COMPUTER SOC, 2022

Pixels: An Efficient Column Store for Cloud Data Lakes

Anastasia Ailamaki, Haoqiong Bian

To benefit from the cloud’s higher elasticity and price-efficiency, most modern data-lake engines support S3-like cloud object storage (COS) services as their optional or preferred underlying storage. Meanwhile, the widespread column stores, such as Parque ...
IEEE, 2022

Efficient GPU-accelerated Join Optimization for Complex Queries

Anastasia Ailamaki, Bikash Chandra, Srinivas Karthik Venkatesh, Riccardo Mancini, Vasileios Mageirakos

Analytics on modern data analytic and data warehouse systems often need to run large complex queries on increasingly complex database schemas. A lot of progress has been made on executing such complex queries using techniques like scale out query processin ...
IEEE, 2022