Data cube

In computer programming contexts, a data cube (or datacube) is a multi-dimensional ("n-D") array of values. Typically, the term is applied where these arrays are far larger than the hosting computer's main memory; examples include multi-terabyte or petabyte data warehouses and time series of image data.

The data cube represents data (sometimes called facts) along some dimensions of interest. For example, in online analytical processing (OLAP) the dimensions could be the subsidiaries a company has, the products the company offers, and time; in this setup, a fact would be a sales event in which a particular product was sold in a particular subsidiary at a particular time. In a satellite image time series, the dimensions would be latitude, longitude, and time; a fact (sometimes called a measure) would be a pixel at a given location and time as taken by the satellite (following some processing that is not of concern here).

Even though it is called a cube (and the examples above happen to be 3-dimensional for brevity), a data cube is a general multi-dimensional concept and can be 1-dimensional, 2-dimensional, 3-dimensional, or higher-dimensional. In every case, each dimension divides the data into groups of cells, and each cell in the cube represents a single measure of interest. Some cubes hold only a few values, with the remaining cells empty (i.e. undefined); in others, most or all cube coordinates hold a cell value. Data of the first kind are called sparse and data of the second kind dense, although there is no hard boundary between the two.

Multi-dimensional arrays have long been familiar in programming languages. Fortran offers arbitrarily indexed arrays with up to 15 dimensions. APL supports n-D arrays with a rich set of operations. What these have in common is that the arrays must fit into main memory and are available only while the particular program maintaining them (such as image processing software) is running.
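The OLAP example above can be made concrete with a small in-memory sketch. It is only an illustration under stated assumptions: the subsidiary, product, and month names, the sales figures, and the helper functions `density` and `sum_out` are all invented here, and a dictionary keyed by coordinate tuples is just one simple way to hold a sparse cube, not how a data warehouse would store one.

```python
from collections import defaultdict

# Hypothetical dimension values, invented for illustration only.
SUBSIDIARIES = ["Berlin", "Lyon", "Oslo"]
PRODUCTS = ["bolt", "nut", "washer", "gear"]
MONTHS = ["2024-01", "2024-02"]

# Sparse cube: only coordinates that actually hold a fact are stored.
# Key = (subsidiary, product, month); value = units sold (the measure).
cube = {
    ("Berlin", "bolt", "2024-01"): 120,
    ("Berlin", "nut", "2024-02"): 75,
    ("Lyon", "bolt", "2024-01"): 40,
    ("Oslo", "gear", "2024-02"): 5,
}


def density(cube, *dimensions):
    """Return the fraction of all possible cube coordinates that hold a value."""
    total_cells = 1
    for dim in dimensions:
        total_cells *= len(dim)
    return len(cube) / total_cells


def sum_out(cube, axis):
    """Sum the measure over one dimension, yielding a cube with one fewer dimension."""
    result = defaultdict(int)
    for coords, value in cube.items():
        reduced = coords[:axis] + coords[axis + 1:]  # drop the chosen axis
        result[reduced] += value
    return dict(result)


# Collapse time, then collapse subsidiaries: 3-D -> 2-D -> 1-D.
by_subsidiary_product = sum_out(cube, axis=2)
by_product = sum_out(by_subsidiary_product, axis=0)

print(density(cube, SUBSIDIARIES, PRODUCTS, MONTHS))
print(by_product)
```

Here 4 of the 24 possible cells are filled, so this cube sits toward the sparse end, and an empty cell is simply a key that is absent from the dictionary. Summing out a dimension shows that the same structure works for any number of dimensions, which is the sense in which a "cube" need not be 3-dimensional.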

High-dimensional Data Cubes

Christoph Koch, Sachin Basil John

This paper introduces an approach to supporting high-dimensional data cubes at interactive query speeds and moderate storage cost. The approach is based on binary(-domain) data cubes that are judiciously partially materialized; the missing information can ...

ASSOC COMPUTING MACHINERY, 2022

Interactive-time Exploration, Querying, and Analysis of Large High-dimensional Datasets
