This paper introduces an approach to supporting high-dimensional data cubes at interactive query speeds and moderate storage cost. The approach is based on binary(-domain) data cubes that are judiciously partially materialized; the missing information can be quickly reconstructed using statistical or linear programming techniques. This enables new applications such as exploratory data analysis for feature engineering and other fields of data science. Moreover, it allows us to use data cubes in data warehouses and data lakes without design regret. It removes the need to compromise when building a data cube — all columns that we might ever wish to use can be included as dimensions. As a collateral benefit, our approach also speeds up certain dice, roll-up, and drill-down operations on data cubes with hierarchical dimensions compared to traditional data cubes.
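To make the idea of reconstructing a non-materialized cuboid concrete, here is a minimal sketch in Python. The abstract does not specify the reconstruction procedure, so this example substitutes iterative proportional fitting, a standard statistical technique, as a stand-in; it should not be read as the authors' method. The toy counts, the choice of three binary dimensions, and the names `project` and `reconstruct` are all assumptions made for illustration. Only the three 2-dimensional projections are treated as stored, and the 3-dimensional cuboid is estimated from them.

```python
from itertools import product


def project(cube, dims):
    """Sum a cuboid (dict: tuple of bits -> count) down to the given dimension indices."""
    out = {}
    for cell, count in cube.items():
        key = tuple(cell[d] for d in dims)
        out[key] = out.get(key, 0.0) + count
    return out


def reconstruct(n_dims, marginals, sweeps=500):
    """Estimate the full binary cuboid from stored projections.

    Assumption: iterative proportional fitting is used here as an example of a
    statistical reconstruction; the abstract does not say which method the paper uses.
    """
    total = sum(next(iter(marginals.values())).values())
    cells = list(product((0, 1), repeat=n_dims))
    # Start from a uniform estimate, then rescale repeatedly so that each
    # projection of the estimate matches the corresponding stored projection.
    est = {cell: total / len(cells) for cell in cells}
    for _ in range(sweeps):
        for dims, target in marginals.items():
            current = project(est, dims)
            for cell in cells:
                key = tuple(cell[d] for d in dims)
                if current[key] > 0:
                    est[cell] *= target.get(key, 0.0) / current[key]
    return est


# Hypothetical ground truth over three binary dimensions (toy data, not from the paper).
truth = {
    (0, 0, 0): 40, (0, 0, 1): 10, (0, 1, 0): 5, (0, 1, 1): 20,
    (1, 0, 0): 8, (1, 0, 1): 12, (1, 1, 0): 3, (1, 1, 1): 2,
}

# Pretend only the three 2-dimensional projections were materialized.
stored = [(0, 1), (0, 2), (1, 2)]
marginals = {dims: project(truth, dims) for dims in stored}

estimate = reconstruct(3, marginals)
for cell in sorted(estimate):
    print(cell, round(estimate[cell], 2), "true:", truth[cell])
```

The estimate agrees with every stored projection but generally differs from the true cell counts, because the stored projections do not determine the full cuboid uniquely. A linear programming formulation, the other option the abstract mentions, could instead bound each cell subject to the same projection constraints.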
Christophe Ballif, Marine Dominique C. Cauz, Laure-Emmanuelle Perret Aebi