In computing, data transformation is the process of converting data from one format or structure into another. It is a fundamental aspect of most data integration and data management tasks, such as data wrangling, data warehousing, data integration, and application integration. Data transformation can be simple or complex depending on the changes required between the source (initial) data and the target (final) data. It is typically performed through a mixture of manual and automated steps, and the tools and technologies used vary widely with the format, structure, complexity, and volume of the data being transformed.

A master data recast is another form of data transformation in which the entire database of data values is transformed or recast without extracting the data from the database. All data in a well-designed database is directly or indirectly related to a limited set of master database tables by a network of foreign key constraints, and each foreign key constraint depends on a unique database index in the parent table. Therefore, when the proper master database table is recast with a different unique index, the directly and indirectly related data are also recast or restated. The related data can still be viewed in the original form, since the original unique index still exists with the master data. The recast must also be done in such a way that it does not affect the application architecture or software.

When the data mapping is indirect, via a mediating data model, the process is also called data mediation.

Data transformation can be divided into the following steps, each applied as needed depending on the complexity of the transformation required: data discovery, data mapping, code generation, code execution, and data review. These steps are often the focus of developers or technical data analysts, who may use multiple specialized tools to perform their tasks.
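As a rough, hypothetical illustration of these steps (a minimal sketch assuming Python with the pandas library; the column names, country codes, and schema are invented), the snippet below maps a source table onto a different target structure, executes the transformation, and performs a basic review of the result.

```python
import pandas as pd

# Hypothetical source records: names in "last, first" form, dates as strings,
# and two-letter country codes.
source = pd.DataFrame({
    "cust_name": ["Doe, Jane", "Smith, John"],
    "signup":    ["2023-01-15", "2023-02-20"],
    "country":   ["ch", "fr"],
})

# Data mapping: declare how source columns and codes relate to the target schema.
country_map = {"ch": "Switzerland", "fr": "France"}

# Code execution: apply the mapping to produce the target structure.
names = source["cust_name"].str.split(", ", expand=True)
target = pd.DataFrame({
    "first_name":  names[1],
    "last_name":   names[0],
    "signup_date": pd.to_datetime(source["signup"]),
    "country":     source["country"].map(country_map),
})

# Data review: check that no country codes were left unmapped.
assert target["country"].notna().all(), "unmapped country codes found"
print(target)
```

In practice, the code generation step is often delegated to transformation tools that produce such code from a declarative data mapping, while data discovery and data review are supported by profiling and validation tools.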

Related courses (17)
CS-401: Applied data analysis
This course teaches the basic techniques, methodologies, and practical skills required to draw meaningful insights from a variety of data, with the help of the most acclaimed software tools in the data ...
COM-490: Large-scale data science for real-world data
This hands-on course teaches the tools & methods used by data scientists, from researching solutions to scaling up prototypes to Spark clusters. It exposes the students to the entire data science pipeline ...
MATH-517: Statistical computation and visualisation
The course will provide the opportunity to tackle real world problems requiring advanced computational skills and visualisation techniques to complement statistical thinking. Students will practice pr ...
Related lectures (39)
Data Wrangling and Analysis
Covers a homework assignment on data wrangling and analysis using Python's pandas library for real-world datasets.
Entity Resolution Techniques
Explores entity resolution techniques, data deduplication, similarity metrics, computational cost, blocking techniques, and scaling out similarity joins.
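As a loose illustration of two of the ideas mentioned in this lecture description, similarity metrics and blocking (a minimal Python sketch with invented records and an arbitrary threshold, not material from the lecture itself), consider the following:

```python
from itertools import combinations

# Hypothetical records that may refer to the same real-world entity.
records = [
    {"id": 1, "name": "Jane Doe",   "city": "Lausanne"},
    {"id": 2, "name": "J. Doe",     "city": "Lausanne"},
    {"id": 3, "name": "John Smith", "city": "Geneva"},
]

def jaccard(a, b):
    """Token-based Jaccard similarity between two strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 0.0

# Blocking: only compare records that share a cheap key (here, the city),
# which avoids comparing every pair of records.
blocks = {}
for r in records:
    blocks.setdefault(r["city"], []).append(r)

# Within each block, score candidate pairs and flag likely duplicates.
for block in blocks.values():
    for r1, r2 in combinations(block, 2):
        score = jaccard(r1["name"], r2["name"])
        if score >= 0.3:  # arbitrary threshold for this sketch
            print(f"possible match: {r1['id']} ~ {r2['id']} (score={score:.2f})")
```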
Data Modeling: Concepts and Applications
Explores data modeling concepts, SQL implementations, and practical applications in handling missing data.
Related publications (120)

Data Transformation in the Processing of Neuronal Signals: A Powerful Tool to Illuminate Informative Contents

Mohammad Ali Shaeri

Neuroscientists seek efficient solutions for deciphering the sophisticated unknowns of the brain. Effective development of complicated brain-related tools is the focal point of research in neuroscience and neurotechnology. Thanks to today's technological a ...
2023

Scalable and Privacy-Preserving Federated Principal Component Analysis

Jean-Pierre Hubaux, Juan Ramón Troncoso-Pastoriza, Jean-Philippe Léonard Bossuat, Apostolos Pyrgelis, David Jules Froelicher, Joao André Gomes de Sá e Sousa

Principal component analysis (PCA) is an essential algorithm for dimensionality reduction in many data science domains. We address the problem of performing a federated PCA on private data distributed among multiple data providers while ensuring data confi ...
IEEE Computer Society, 2023
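For background only (this is not the federated, privacy-preserving method of the publication above), a minimal sketch of plain, centralized PCA via the singular value decomposition, assuming NumPy and synthetic data:

```python
import numpy as np

# Synthetic centralized data: 100 samples, 5 features (illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

# Center the data, then take the top-k right singular vectors as principal components.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2
components = Vt[:k]          # principal directions (k x 5)
scores = Xc @ components.T   # data projected onto the top-k components (100 x k)
explained_variance = S[:k] ** 2 / (len(X) - 1)
print(scores.shape, explained_variance)
```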
Related concepts (9)
Data integration
Data integration involves combining data residing in different sources and providing users with a unified view of them. This process becomes significant in a variety of situations, both commercial (such as when two similar companies need to merge their databases) and scientific (combining research results from different bioinformatics repositories, for example). Data integration appears with increasing frequency as the volume of data (that is, big data) and the need to share existing data explode.
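A minimal, hypothetical sketch of this idea (assuming pandas; the table and column names are invented): two sources describing the same customers under different schemas are combined into a single unified view.

```python
import pandas as pd

# Two hypothetical sources describing the same customers under different schemas.
crm = pd.DataFrame({
    "customer_id": [1, 2],
    "full_name":   ["Jane Doe", "John Smith"],
})
billing = pd.DataFrame({
    "cust":    [1, 2, 3],
    "balance": [120.50, 0.0, 75.25],
})

# Integration: align the key columns and join the sources into one unified view.
unified = crm.merge(
    billing.rename(columns={"cust": "customer_id"}),
    on="customer_id",
    how="left",
)
print(unified)
```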
Master data management
Master data management (MDM) is a technology-enabled discipline in which business and information technology work together to ensure the uniformity, accuracy, stewardship, semantic consistency and accountability of the enterprise's official shared master data assets. Organisations, or groups of organisations, may establish the need for master data management when they hold more than one copy of data about a business entity. Holding more than one copy of this master data inherently means that there is an inefficiency in maintaining a "single version of the truth" across all copies.
Data cleansing
Data cleansing or data cleaning is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database. It involves identifying incomplete, incorrect, inaccurate, or irrelevant parts of the data and then replacing, modifying, or deleting the dirty or coarse data. Data cleansing may be performed interactively with data wrangling tools, or as batch processing through scripting or a data quality firewall. After cleansing, a data set should be consistent with other similar data sets in the system.
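A minimal sketch of such cleansing as a batch step (assuming pandas; the records and validity rules are invented) might look like the following:

```python
import pandas as pd

# Hypothetical raw records with a duplicate, a missing email, and an implausible age.
raw = pd.DataFrame({
    "email": ["a@example.com", "a@example.com", None, "b@example.com"],
    "age":   [34, 34, 29, -5],
})

cleaned = (
    raw.drop_duplicates()                          # remove exact duplicate records
       .dropna(subset=["email"])                   # drop records missing a required field
       .loc[lambda df: df["age"].between(0, 120)]  # keep only plausible ages
)
print(cleaned)
```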