ETL
Task Description
The ETL (Extraction, Transformation, Loading) process typically takes longer to develop than
any other part of the data warehouse, and can easily consume 50% or more of the implementation cycle.
The reason is that it takes time to obtain the source data, identify the necessary columns,
and understand the business rules and the logical and physical data models.
Time Requirement
1 to 6 weeks.
Deliverables
- Data Mapping Document
- ETL Script / ETL Package in the ETL tool
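
To make the relationship between the two deliverables concrete, here is a minimal sketch of an ETL script driven by a data mapping. Everything in it is an assumption made for illustration: the source column names, the target table fact_orders, the in-memory sample rows, and the use of SQLite as a stand-in for the warehouse. A real project would more likely build the equivalent package in its ETL tool.

```python
import sqlite3

# Hypothetical data mapping: source column -> (target column, transform).
# In practice this comes from the Data Mapping Document.
DATA_MAPPING = {
    "cust_nm": ("customer_name", str.strip),
    "ord_amt": ("order_amount", float),
    "ord_dt": ("order_date", lambda s: s[:10]),  # keep only YYYY-MM-DD
}


def extract():
    """Stand-in for reading rows from a source system (sample data only)."""
    return [
        {"cust_nm": "  Alice ", "ord_amt": "19.99", "ord_dt": "2024-01-05T10:00:00"},
        {"cust_nm": "Bob", "ord_amt": "5", "ord_dt": "2024-01-06T12:30:00"},
    ]


def transform(rows, mapping):
    """Rename each source column and apply its transform, as the mapping dictates."""
    return [
        {target: fn(row[source]) for source, (target, fn) in mapping.items()}
        for row in rows
    ]


def load(conn, rows):
    """Insert transformed rows into the (hypothetical) target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(customer_name TEXT, order_amount REAL, order_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO fact_orders "
        "VALUES (:customer_name, :order_amount, :order_date)",
        rows,
    )
    conn.commit()


def run_etl(conn):
    """Run extract, transform, and load in order; return the transformed rows."""
    rows = transform(extract(), DATA_MAPPING)
    load(conn, rows)
    return rows


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    for loaded_row in run_etl(connection):
        print(loaded_row)
```

Keeping the mapping as data rather than burying it in the transform logic means the script and the mapping document can be reviewed side by side.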
Possible Pitfalls
There is a tendency to give this phase too little development time. This can prove
fatal to the project: end users will usually tolerate less formatting, longer report
run times, less functionality (slicing and dicing), or fewer delivered reports, but
they will not tolerate wrong information.
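
One inexpensive guard against wrong information is a reconciliation check run after each load. The sketch below is an illustration, not a prescribed method: the function name reconcile, the amount_key parameter, and the tolerance value are all hypothetical. It compares row counts and a summed amount between the source rows and the loaded rows.

```python
import math


def reconcile(source_rows, loaded_rows, amount_key, tolerance=1e-6):
    """Return a list of problems found; an empty list means both checks passed."""
    problems = []

    # Check 1: every source row should have arrived in the target.
    if len(source_rows) != len(loaded_rows):
        problems.append(
            f"row count mismatch: source={len(source_rows)}, loaded={len(loaded_rows)}"
        )

    # Check 2: a control total on a numeric column should match on both sides.
    source_total = math.fsum(float(row[amount_key]) for row in source_rows)
    loaded_total = math.fsum(float(row[amount_key]) for row in loaded_rows)
    if not math.isclose(source_total, loaded_total, abs_tol=tolerance):
        problems.append(
            f"total mismatch on {amount_key}: source={source_total}, loaded={loaded_total}"
        )

    return problems


if __name__ == "__main__":
    source = [{"amount": "10.0"}, {"amount": "5.5"}]
    loaded = [{"amount": 10.0}]  # one row was dropped during the load
    for problem in reconcile(source, loaded, "amount"):
        print(problem)
```

Checks like these do not prove the transformations are correct, but they catch dropped or duplicated rows before the numbers reach a report.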
A second common problem is that some people make the ETL process more
complicated than necessary. In ETL design, the primary goal should be to
optimize load speed without sacrificing quality. This goal is not always
followed, however. In some cases the design goal becomes covering
all possible future uses, whether they are practical or just a figment of
someone's imagination. When this happens, ETL performance suffers, and
often so does the performance of the entire data warehousing system.