This section describes 8 situations where the data warehousing effort
is destined to fail, often despite the best of intentions.
1. Focusing On Idealogy Rather Than Practicality
There are many good textbooks on data warehousing out there, and many
schools are offering data warehousing classes. Having read the textbooks
or completed a course, however, does not make a person a data warehousing
guru.
For example, I have seen someone insisting on enforcing foreign key rules
between dimension tables and fact tables, starting from the first day of
development. This is not prudent for several reasons: 1) The development
environment by its very nature means a lot of playing with data --
many updates, deletes, and inserts. Having the foreign
key constraint only makes the development effort take longer than
necessary. 2) This slows the ETL load time. 3) This constraint is
useless because when data gets loaded into the
fact table, we already have to go to the dimension table to get
the proper foreign key, thus already accomplished what a foreign key
constraint would accomplish.
2. Making The Process Unnecessarily Complicated
Data warehousing is inherently a complex enough project, and there is
no need to make it even complex.
Here is an example: The source file comes in with one fact. The person
responsible for project management insists that this fact be broken into
several different metrics during ETL. The ideal sound reasonable: To cut
down on the number of rows for the fact table, and so that the front-end
tool could generate the report quicker. Unfortunately, there are several
problems with this approach: First, the ETL became unnecessarily complex. Not
only are "case"-type statements now needed in ETL, but because the fact cannot
always be broken down into the corresponding metrics nicely due to inconsistencies
in the data, it became necessary to create a lot of logic to take care of the
exceptions. Second, it is never advisable to design the data model and the
ETL process based on what suits the front end tool the most. Third, at the
end of the day, the reports ended up having to sum back these separate metrics
back together to get what the users were truly after, meaning that all the
extra work was for naught.
3. Lack of Clear Ownership
Because data warehousing projects typically touch upon many different
departments, it is natural that the project involves multiple teams. For the
project to be successful, though, there must be clear ownership of the
project. Not clear ownership of different components of the project, but
the project itself. I have seen a case where multiple groups each own
a portion of the project. Needless to say, these projects never got
finished as quickly as they should, tended to underdeliver, had inflexbile
infrastructure (as each group would do what is best
for the group, not for the whole project). What I have seen coming out of
such projects is that it is tailor-made for finger-pointing. If something
is wrong, it's always another group's fault. At the end of the day, nobody
is responsible for anything, and it's no wonder why the project is full
of problems. Making sure one person/one group is
fully accountable for the success of the data warehousing project is
paramount in ensuring a successful project.
4. Not Understanding Proper Protocol
Whether you are working as a consultant or an internal resource,
you need to understand the organization protocol in order by build a
successful data warehouse / data mart.
I have been in a project where the team thought all the
development was done, tested, documented, migrated to the production
system, and ready to deliver by the deadline, and was ready to
celebrate with the bonus money the client promised for an on-time
delivery. One thing that was missed,
though, was that the client always requires any production
system to go through its QA group first. The project manager in this
case did not notice that. Hence, rather than delivering a project
on time and within budget, the project had to be delayed for an additional
four months before it could go online, all because project management
was not familiar with the organization protocol.
5. Not Fully Understand Project Impact Before The Project Starts
Here, I am talking about the project impact turns out to be much
smaller than anticipated. I have seen data mart efforts where
significant amount of resources were thrown into the project, and
at the completion of the project, there were only two users. This is
clearly the case where someone did not make the proper call, as these
resources clearly could have been better utilized in different projects.
6. Try To Bite Off More Than You Can Chew
This means that the project attempts to accomplish something more
grandeur than it is supposed to. There are two examples below:
There are data warehousing projects that attempt to control the
entire project -- even to the point of dictating how the source
system should be built to capture data, and exactly how data should be
captured. While the idea is noble -- often during a project, we find
that the source system data has a lot of problems, and hence it makes
sense to make sure the source system is built right -- in reality
this is not practical. First, source systems are built the way they
are for specific reasons -- and data analysis should only be one of
the concerns, not the only concern. In addition, this will lead
to a data warehousing system that is pretty in theory, but very
inflexible in reality.
In the same vein, I have seen data mart efforts where the project
owner attempts to push his own ideas to the rest of the company, and
that person instructed his team to build the system in a way that
can accomodate that possibility. Of course, what really happens is
that no one else ends up adopting his ideas, and much time and effort
were wasted.
7. Blindly Sticking To Certain Standards
I have seen cases where a concerted effort is put on ensuring that
different data marts employ the same infrastructure, from the tools
used (for example, a certain ETL tool must be used for doing ETL,
regardless of how simple that ETL process is) to the user
experience (for example, users must be able to access the same set of report
selection criteria).
This is an absurd way of building data marts. The very reason that
different data marts exist is because there are differences among them,
so insisting on making sure they all conform to a certain standard
is an exercise in futility. I have seen ETL tools blindly placed on
ETL processes that require only a series of SQL statements.
As far as the front end goes, that makes even less sense. First of all,
different projects, even though they may be very similar, are still
different. Otherwise they would belong to the same project.
Furthermore, users really do not care if their views into
different data marts have exactly the same look and feel. What they
care is whether the data is there on time, and whether the numbers are
dependable.
8. Bad Project Management
Bad project management can manifest itself in several ways, and some
of the examples listed previously illustrate the danger of bad project
management. In short, it is safe to say a bad project manager will
certain doom a project.
For data warehousing projects, the key is experience, especially
hands-on experience. This is not a job for someone who just completed
his or her MBA program, or someone who has only read through all the
data warehousing books, but has had no practical experience.