Concept

Dimension (data warehouse)

A dimension is a structure that categorizes facts and measures so that users can answer business questions. Commonly used dimensions are people, products, places and time. (Note: people and time are sometimes not modeled as dimensions.)

In a data warehouse, dimensions provide structured labeling information for otherwise unordered numeric measures. A dimension is a data set composed of individual, non-overlapping data elements. Dimensions serve three primary functions: filtering, grouping and labeling. These functions are often described as "slice and dice". A common data warehouse example involves sales as the measure, with customer and product as dimensions: in each sale a customer buys a product. The data can be sliced by removing all customers except for a group under study, and then diced by grouping by product. A dimensional data element is similar to a categorical variable in statistics.

Typically, dimensions in a data warehouse are organized internally into one or more hierarchies. "Date" is a common dimension, with several possible hierarchies:

"Days (are grouped into) Months (which are grouped into) Years"
"Days (are grouped into) Weeks (which are grouped into) Years"
"Days (are grouped into) Months (which are grouped into) Quarters (which are grouped into) Years"
etc.

A slowly changing dimension is a set of data attributes, such as an address or a name, that change slowly and irregularly over time rather than on a regular schedule. These dimensions can be classified into types:

Type 0 (Retain original): Attributes never change. No history.
Type 1 (Overwrite): Old values are overwritten with new values for the attribute. No history.
Type 2 (Add new row): A new row is created for each new value, marked with either a start date / end date or a version. This creates history.
Type 3 (Add new attribute): A new column is created for a new value. History is limited to the number of columns designated for storing historical data.
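
The difference between Type 1 and Type 2 can be illustrated with a small sketch. This is only one possible way to model it, not a reference implementation: the CustomerRow class, its field names, and the functions apply_type1 and apply_type2 are hypothetical names chosen for this example, and the dimension is held in a plain Python list rather than a database table.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CustomerRow:
    """One row of a hypothetical customer dimension table."""
    surrogate_key: int          # warehouse-generated row key
    customer_id: str            # business (natural) key from the source system
    address: str                # the slowly changing attribute
    start_date: date            # first day this version is valid
    end_date: Optional[date]    # None marks the current version


def apply_type1(rows, customer_id, new_address):
    """Type 1 (overwrite): replace the value in place; no history is kept."""
    for row in rows:
        if row.customer_id == customer_id:
            row.address = new_address


def apply_type2(rows, customer_id, new_address, change_date):
    """Type 2 (add new row): close the current version and append a new one."""
    for row in rows:
        if row.customer_id == customer_id and row.end_date is None:
            if row.address == new_address:
                return  # nothing changed, keep the current version
            row.end_date = change_date
    next_key = max((r.surrogate_key for r in rows), default=0) + 1
    rows.append(CustomerRow(next_key, customer_id, new_address, change_date, None))


# Assumed sample data: one customer who moves on 2023-06-01.
dim = [CustomerRow(1, "C-100", "1 Old Street", date(2020, 1, 1), None)]
apply_type2(dim, "C-100", "2 New Avenue", date(2023, 6, 1))
for row in dim:
    print(row)
```

After the Type 2 update the list holds two rows for the same customer: the old address with an end date, and the new address as the current version. Calling apply_type1 instead would leave a single row and lose the old address.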

Official source

https://en.wikipedia.org/wiki/Dimension_(data_warehouse)

Related courses (1)

CS-422: Database systems

This course is intended for students who want to understand modern large-scale data analysis systems and database systems. It covers a wide range of topics and technologies, and will prepare students

Related concepts (2)

Fact table

In data warehousing, a fact table consists of the measurements, metrics or facts of a business process. It is located at the center of a star schema or a snowflake schema surrounded by dimension tables. Where multiple fact tables are used, these are arranged as a fact constellation schema. A fact table typically has two types of columns: those that contain facts and those that are a foreign key to dimension tables. The primary key of a fact table is usually a composite key that is made up of all of its foreign keys.
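
As a rough illustration of a fact table surrounded by dimension tables, the following sketch builds a tiny star schema in an in-memory SQLite database and runs a "slice and dice" query over it. The table names (dim_customer, dim_product, fact_sales), the segment column and all sample rows are assumptions made up for this example; a real warehouse would typically also carry a date key and many more attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT, segment TEXT);
CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, name TEXT);
-- Fact table: one fact column (amount) plus foreign keys to the dimensions.
-- The primary key is the composite of all foreign keys.
CREATE TABLE fact_sales (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    amount       REAL,
    PRIMARY KEY (customer_key, product_key)
);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)",
                 [(1, "Alice", "retail"), (2, "Bob", "wholesale"), (3, "Carol", "retail")])
conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                 [(1, "widget"), (2, "gadget")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                 [(1, 1, 10.0), (1, 2, 5.0), (2, 1, 100.0), (3, 1, 7.0)])

# Slice: keep only customers in the group under study (segment = 'retail').
# Dice: group the remaining sales by product and label rows with the product name.
result = conn.execute("""
    SELECT p.name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_customer AS c ON c.customer_key = f.customer_key
    JOIN dim_product  AS p ON p.product_key  = f.product_key
    WHERE c.segment = 'retail'
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()
print(result)  # [('gadget', 5.0), ('widget', 17.0)]
```

The query uses the customer dimension for filtering and the product dimension for grouping and labeling, while the numeric measure comes only from the fact table.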

Extract-transform-load

Extract-transform-load is a middleware technology for performing bulk synchronization of information from one data source (most often a database) to another. The technology is known by the acronym ETL. Depending on the context, it draws on different functions, often combined with one another: "extraction", "transformation", "construction" or "conversion", and "feeding" or "loading".

Related lectures (19)

OLAP: Overview and questions

Introduces OLAP concepts, schemas, queries and optimizations for efficient data analysis.

Data warehouses and decision support systems

Explores data warehouses, decision support systems, OLAP, data lakes, multidimensional data models and query optimizations.

Data warehousing and decision support

Explores data warehousing, decision support systems and the importance of statistics in data analysis.

Related publications (15)

Interactive-time Exploration, Querying, and Analysis of Large High-dimensional Datasets

Sachin Basil John

In the current era of big data, aggregation queries on high-dimensional datasets are frequently utilized to uncover hidden patterns, trends, and correlations critical for effective business decision-making. Data cubes facilitate such queries by employing p ...

EPFL, 2023

Aggregation and Exploration of High-Dimensional Data Using the Sudokube Data Cube Engine

Christoph Koch, Sachin Basil John, Zhekai Jiang, Peter Lindner

We present Sudokube, a novel system that supports interactive speed querying on high-dimensional data using partially materialized data cubes. Given a storage budget, it judiciously chooses what projections to precompute and materialize during cube constru ...

Association for Computing Machinery, 2023

High-dimensional Data Cubes

Christoph Koch, Sachin Basil John

This paper introduces an approach to supporting high-dimensional data cubes at interactive query speeds and moderate storage cost. The approach is based on binary(-domain) data cubes that are judiciously partially materialized; the missing information can ...

2022
