Extract, Transform, Load (ETL) is a process in which data from several sources, which may be structured differently, is combined in a
target database. It consists of three steps:
- Extraction of the relevant data from the different sources
- Transformation of the data into the schema and format of the target database
- Loading of the data into the target database
In other words, data from different sources can be extracted via an ETL process and thus prepared for integration into a
data warehouse (source: Wikipedia).
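To make the three steps concrete, here is a minimal Python sketch that combines two differently structured sources into one target table. The source data, the field names and the `dim_customer` table are hypothetical, and an in-memory SQLite database (via the standard `sqlite3` module) stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a CSV export from a CRM system.
CRM_CSV = "customer_id,name,country\n1, Alice ,de\n2,Bob,US\n"

# Hypothetical source 2: records from a shop system that uses different field names.
SHOP_RECORDS = [{"id": "3", "full_name": "Carol", "country_code": "fr"}]


def extract():
    """Extract: read the raw records from both sources."""
    crm_rows = list(csv.DictReader(io.StringIO(CRM_CSV)))
    return crm_rows, SHOP_RECORDS


def transform(crm_rows, shop_rows):
    """Transform: map both structures onto one target schema (id, name, country)."""
    unified = [
        (int(r["customer_id"]), r["name"].strip(), r["country"].strip().upper())
        for r in crm_rows
    ]
    unified += [
        (int(r["id"]), r["full_name"].strip(), r["country_code"].strip().upper())
        for r in shop_rows
    ]
    return unified


def load(conn, records):
    """Load: write the unified records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dim_customer "
        "(customer_id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
    )
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO dim_customer VALUES (?, ?, ?)", records
        )


if __name__ == "__main__":
    target = sqlite3.connect(":memory:")
    load(target, transform(*extract()))
    print(target.execute("SELECT * FROM dim_customer ORDER BY customer_id").fetchall())
    # → [(1, 'Alice', 'DE'), (2, 'Bob', 'US'), (3, 'Carol', 'FR')]
```

Keeping each phase in its own function mirrors the separation of extraction, transformation and loading; in a real pipeline each function would talk to actual source systems and a real warehouse.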
The three main phases of the ETL process are:
Extraction: Extraction is the first step of the ETL process. In this step, the relevant data is selected from the various source systems and
prepared for the transformation phase. In most cases, only subsets of the individual source databases are extracted.
Extractions take place on a regular basis in order to continuously supply the data warehouse with updated data. Event-driven or
request-driven extractions are also possible.
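One common way to implement regular extractions that pull only part of a source is a watermark: the pipeline remembers the latest change timestamp it has seen and asks only for newer rows on the next run. The sketch below assumes a hypothetical `orders` table with an `updated_at` column holding ISO-8601 text (which compares correctly as a string when all values use the same format); it is one possible technique, not the only one.

```python
import sqlite3


def extract_incremental(source, last_watermark):
    """Return the rows changed since last_watermark, plus the new watermark.

    Assumes a hypothetical `orders` table with an ISO-8601 `updated_at` column.
    """
    rows = source.execute(
        "SELECT order_id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    # The newest timestamp seen becomes the starting point for the next run.
    new_watermark = rows[-1][2] if rows else last_watermark
    return rows, new_watermark


# Hypothetical source system with three existing orders.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (order_id INTEGER, amount REAL, updated_at TEXT)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 10.0, "2024-01-01T08:00:00"),
        (2, 25.5, "2024-01-02T09:30:00"),
        (3, 7.25, "2024-01-03T12:00:00"),
    ],
)

# First run: everything after the initial watermark.
batch, mark = extract_incremental(source, "1970-01-01T00:00:00")

# A new order arrives; the next scheduled run only picks up that change.
source.execute("INSERT INTO orders VALUES (4, 99.0, '2024-01-04T07:15:00')")
batch2, mark2 = extract_incremental(source, mark)

print(len(batch), mark, [r[0] for r in batch2])  # → 3 2024-01-03T12:00:00 [4]
```

Because of the strict `>` comparison, a row that carries exactly the stored watermark timestamp but is committed after a run would be missed; pipelines often re-read a small overlap window and remove duplicates during transformation.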
Transformation: The extraction is followed by the transformation phase, in which the delivered data is adapted to the format and schema of the target database. The transformation itself consists of several individual steps, for example:
- Defining basic formatting rules
- Correcting corrupted data
- Detecting similar information and duplicate records, then deleting or excluding them
- Grouping, sorting and aggregating the data
- Final adaptation to the target formats and target schemas
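The transformation steps listed above can be sketched in plain Python as follows. The raw records, their German-style date and decimal formats, and the rule of discarding records whose dates cannot be parsed are all assumptions chosen for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw records as delivered by the extraction phase.
RAW = [
    {"customer": " alice ", "date": "03.01.2024", "amount": "10,50"},
    {"customer": "Alice", "date": "03.01.2024", "amount": "10,50"},  # duplicate
    {"customer": "BOB", "date": "04.01.2024", "amount": "7"},
    {"customer": "bob", "date": "bad-date", "amount": "3"},  # corrupted date
]


def clean(record):
    """Formatting and correction: normalize names, ISO dates, decimal points.

    Returns None for records that cannot be repaired (here: unparseable dates).
    """
    try:
        day = datetime.strptime(record["date"], "%d.%m.%Y").date().isoformat()
    except ValueError:
        return None
    name = record["customer"].strip().title()
    amount = float(record["amount"].replace(",", "."))
    return (name, day, amount)


def transform(raw):
    cleaned = [c for c in map(clean, raw) if c is not None]
    # Deduplication: records that are identical after normalization are kept once.
    unique = list(dict.fromkeys(cleaned))
    # Grouping and aggregation: total amount per customer.
    totals = defaultdict(float)
    for name, _day, amount in unique:
        totals[name] += amount
    # Final adaptation: sorted (customer, total) rows with two-decimal totals.
    return [(name, round(total, 2)) for name, total in sorted(totals.items())]


print(transform(RAW))  # → [('Alice', 10.5), ('Bob', 7.0)]
```

Normalizing the names before deduplication is what allows `" alice "` and `"Alice"` to be recognized as the same information.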
Load: The third and final step is to load the previously checked and enriched data. In this step, the actual integration into the
target database or data warehouse takes place. The data is physically moved to its target without blocking the database for too long
during the load, and the integrity of the loaded data must be ensured. All changes to the target system are documented through detailed
logging, which makes it possible to restore earlier states of the data if required.
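The load requirements described above (integrity, logging of all changes, and restoring old data states) can be illustrated with a small SQLite sketch. It assumes hypothetical `sales_fact` and `load_log` tables: each load runs inside a single transaction, so it is applied completely or not at all, and every change is logged together with the value it replaced so that the load can be undone later.

```python
import sqlite3


def load(conn, rows):
    """Load (customer, total) rows into a hypothetical `sales_fact` table.

    The whole load runs in one transaction, and every change is written to a
    hypothetical `load_log` table together with the value it replaced.
    Returns the id of this load.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact (customer TEXT PRIMARY KEY, total REAL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS load_log "
        "(load_id INTEGER, customer TEXT, old_total REAL, new_total REAL)"
    )
    with conn:  # commit if every statement succeeds, otherwise roll back all of them
        load_id = conn.execute(
            "SELECT COALESCE(MAX(load_id), 0) + 1 FROM load_log"
        ).fetchone()[0]
        for customer, total in rows:
            old = conn.execute(
                "SELECT total FROM sales_fact WHERE customer = ?", (customer,)
            ).fetchone()
            conn.execute(
                "INSERT INTO load_log VALUES (?, ?, ?, ?)",
                (load_id, customer, old[0] if old else None, total),
            )
            conn.execute(
                "INSERT OR REPLACE INTO sales_fact VALUES (?, ?)", (customer, total)
            )
    return load_id


def restore(conn, load_id):
    """Undo one load by replaying its log entries in reverse order."""
    with conn:
        changes = conn.execute(
            "SELECT customer, old_total FROM load_log "
            "WHERE load_id = ? ORDER BY rowid DESC",
            (load_id,),
        ).fetchall()
        for customer, old_total in changes:
            if old_total is None:  # the row did not exist before this load
                conn.execute("DELETE FROM sales_fact WHERE customer = ?", (customer,))
            else:
                conn.execute(
                    "UPDATE sales_fact SET total = ? WHERE customer = ?",
                    (old_total, customer),
                )


conn = sqlite3.connect(":memory:")
first = load(conn, [("Alice", 10.5), ("Bob", 7.0)])
second = load(conn, [("Bob", 9.0)])
restore(conn, second)
print(conn.execute("SELECT * FROM sales_fact ORDER BY customer").fetchall())
# → [('Alice', 10.5), ('Bob', 7.0)]
```

Running the whole load in one transaction favors integrity. To avoid blocking the target for too long, larger loads are often split into batches that are committed separately, at the cost of partial loads becoming possible if a later batch fails.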
Typical use cases for ETL include:
- Storing data in a data warehouse
- Providing data for business intelligence (BI) applications
- Extracting data from distributed or cloud-based database environments
- Migrating data between different applications
- Replicating data for backup and redundancy purposes