In computer programming contexts, a data cube (or datacube) is a multi-dimensional ("n-D") array of values. Typically, the term data cube is applied in contexts where these arrays are massively larger than the hosting computer's main memory; examples include multi-terabyte/petabyte data warehouses and time series of image data.
The data cube is used to represent data (sometimes called facts) along some dimensions of interest.
For example, in online analytical processing (OLAP) such dimensions could be the subsidiaries a company has, the products the company offers, and time; in this setup, a fact would be a sales event where a particular product has been sold in a particular subsidiary at a particular time. In satellite image timeseries dimensions would be latitude and longitude coordinates and time; a fact (sometimes called measure) would be a pixel at a given space and time as taken by the satellite (following some processing that is not of concern here).
Even though it is called a cube (and the examples provided above happen to be 3-dimensional for brevity), a data cube generally is a multi-dimensional concept which can be 1-dimensional, 2-dimensional, 3-dimensional, or higher-dimensional.
In any case, every dimension divides data into groups of cells whereas each cell in the cube represents a single measure of interest. Sometimes cubes hold only few values with the rest being empty, i.e. undefined, sometimes most or all cube coordinates hold a cell value. In the first case such data are called sparse, in the second case they are called dense, although there is no hard delineation between both.
Multi-dimensional arrays have long been familiar in programming languages. Fortran offers arbitrarily-indexed 1-D arrays and arrays of arrays, which allows the construction of higher-dimensional arrays, up to 15 dimensions. APL supports n-D arrays with a rich set of operations. All these have in common that arrays must fit into the main memory and are available only while the particular program maintaining them (such as image processing software) is running.
This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.
The course covers fundamental principles for understanding and building systems for managing and analyzing large amounts of data.
Related people (2)
,
Related concepts (1)
In computing, a data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis and is considered a core component of business intelligence. Data warehouses are central repositories of integrated data from one or more disparate sources. They store current and historical data in one single place that are used for creating analytical reports for workers throughout the enterprise. This is beneficial for companies as it enables them to interrogate and draw insights from their data and make decisions.
In the current era of big data, aggregation queries on high-dimensional datasets are frequently utilized to uncover hidden patterns, trends, and correlations critical for effective business decision-making. Data cubes facilitate such queries by employing p ...
This paper introduces an approach to supporting high-dimensional data cubes at interactive query speeds and moderate storage cost. The approach is based on binary(-domain) data cubes that are judiciously partially materialized; the missing information can ...
This paper introduces an approach to supporting high-dimensional data cubes at interactive query speeds and moderate storage cost. The approach is based on binary(-domain) data cubes that are judiciously partially materialized; the missing information can ...