Recipes for Data Warehousing Project Failure
This section describes 8 situations where the data warehousing effort is destined to fail, often despite the best of intentions.
1. Focusing On Ideology Rather Than Practicality
There are many good textbooks on data warehousing out there, and many schools are offering data warehousing classes. Having read the textbooks or completed a course, however, does not make a person a data warehousing guru.
For example, I have seen someone insisting on enforcing foreign key rules between dimension tables and fact tables, starting from the first day of development. This is not prudent for several reasons: 1) The development environment by its very nature means a lot of playing with data -- many updates, deletes, and inserts. Having the foreign key constraint only makes the development effort take longer than necessary. 2) This slows the ETL load time. 3) This constraint is useless because when data gets loaded into the fact table, we already have to go to the dimension table to get the proper foreign key, thus already accomplished what a foreign key constraint would accomplish.
2. Making The Process Unnecessarily Complicated
Data warehousing is inherently a complex enough project, and there is no need to make it even complex.
Here is an example: The source file comes in with one fact. The person responsible for project management insists that this fact be broken into several different metrics during ETL. The ideal sound reasonable: To cut down on the number of rows for the fact table, and so that the front-end tool could generate the report quicker. Unfortunately, there are several problems with this approach: First, the ETL became unnecessarily complex. Not only are "case"-type statements now needed in ETL, but because the fact cannot always be broken down into the corresponding metrics nicely due to inconsistencies in the data, it became necessary to create a lot of logic to take care of the exceptions. Second, it is never advisable to design the data model and the ETL process based on what suits the front end tool the most. Third, at the end of the day, the reports ended up having to sum back these separate metrics back together to get what the users were truly after, meaning that all the extra work was for naught.
3. Lack of Clear Ownership
Because data warehousing projects typically touch upon many different departments, it is natural that the project involves multiple teams. For the project to be successful, though, there must be clear ownership of the project. Not clear ownership of different components of the project, but the project itself. I have seen a case where multiple groups each own a portion of the project. Needless to say, these projects never got finished as quickly as they should, tended to underdeliver, had inflexible infrastructure (as each group would do what is best for the group, not for the whole project). What I have seen coming out of such projects is that it is tailor-made for finger-pointing. If something is wrong, it's always another group's fault. At the end of the day, nobody is responsible for anything, and it's no wonder why the project is full of problems. Making sure one person/one group is fully accountable for the success of the data warehousing project is paramount in ensuring a successful project.
4. Not Understanding Proper Protocol
Whether you are working as a consultant or an internal resource, you need to understand the organization protocol in order by build a successful data warehouse / data mart.
I have been in a project where the team thought all the development was done, tested, documented, migrated to the production system, and ready to deliver by the deadline, and was ready to celebrate with the bonus money the client promised for an on-time delivery. One thing that was missed, though, was that the client always requires any production system to go through its QA group first. The project manager in this case did not notice that. Hence, rather than delivering a project on time and within budget, the project had to be delayed for an additional four months before it could go online, all because project management was not familiar with the organization protocol.
5. Not Fully Understand Project Impact Before The Project Starts
Here, I am talking about the project impact turns out to be much smaller than anticipated. I have seen data mart efforts where significant amount of resources were thrown into the project, and at the completion of the project, there were only two users. This is clearly the case where someone did not make the proper call, as these resources clearly could have been better utilized in different projects.
6. Try To Bite Off More Than You Can Chew
This means that the project attempts to accomplish something more grandeur than it is supposed to. There are two examples below:
There are data warehousing projects that attempt to control the entire project -- even to the point of dictating how the source system should be built to capture data, and exactly how data should be captured. While the idea is noble -- often during a project, we find that the source system data has a lot of problems, and hence it makes sense to make sure the source system is built right -- in reality this is not practical. First, source systems are built the way they are for specific reasons -- and data analysis should only be one of the concerns, not the only concern. In addition, this will lead to a data warehousing system that is pretty in theory, but very inflexible in reality.
In the same vein, I have seen data mart efforts where the project owner attempts to push his own ideas to the rest of the company, and that person instructed his team to build the system in a way that can accommodate that possibility. Of course, what really happens is that no one else ends up adopting his ideas, and much time and effort were wasted.
7. Blindly Sticking To Certain Standards
I have seen cases where a concerted effort is put on ensuring that different data marts employ the same infrastructure, from the tools used (for example, a certain ETL tool must be used for doing ETL, regardless of how simple that ETL process is) to the user experience (for example, users must be able to access the same set of report selection criteria).
This is an absurd way of building data marts. The very reason that different data marts exist is because there are differences among them, so insisting on making sure they all conform to a certain standard is an exercise in futility. I have seen ETL tools blindly placed on ETL processes that require only a series of SQL statements.
As far as the front end goes, that makes even less sense. First of all, different projects, even though they may be very similar, are still different. Otherwise they would belong to the same project. Furthermore, users really do not care if their views into different data marts have exactly the same look and feel. What they care is whether the data is there on time, and whether the numbers are dependable.
8. Bad Project Management
Bad project management can manifest itself in several ways, and some of the examples listed previously illustrate the danger of bad project management. In short, it is safe to say a bad project manager will certain doom a project.
For data warehousing projects, the key is experience, especially hands-on experience. This is not a job for someone who just completed his or her MBA program, or someone who has only read through all the data warehousing books, but has had no practical experience.