Magazine Article | December 1, 2001

The Drill-Down On Data Warehousing

Source: Field Technologies Magazine

If you're not analyzing all the data collected by your ERP (enterprise resource planning) and CRM (customer relationship management) solutions, it might as well be collecting dust on a shelf.

Integrated Solutions, December 2001

The English language is full of peculiar words that sound like they should mean one thing and actually mean something else. For instance, ever wonder why you park in a driveway and drive on a parkway? When you apply this concept to technology, these terms become more confusing than they are amusing. One technology that seems to take the cake on word usage is data warehousing. When you think of a warehouse, for instance, what are some visuals that come to mind? "When most people think of data warehousing, they think of huge backup storage databases," says Benjamin Taub, president of Dataspace Inc. (Ann Arbor, MI). "Data warehousing is the technology necessary to gather data from disparate systems, integrate it into one place, and make it available for end users."

Let's break down Taub's working definition of data warehousing a few steps further. The first part is gathering data from disparate systems. Even small- to medium-sized enterprises often use applications that either run on disparate platforms (Windows NT and UNIX) or run on disparate databases (DB2 and Oracle). If a salesperson, for instance, wants to link customer information from the sales database (NT-based) with accounting's database (UNIX-based) to get a better picture of his customers, he has to call the IT person. The IT person has to export the data from the accounting database into a SQL-friendly application and then link the data to the SQL database.

The problem with this is threefold: First, the accounting data is no longer live, which means that it loses relevance each day and has to be manually refreshed every time the salesperson wants to do an analysis. The second problem is that if several salespeople are making requests, the IT person can find himself making a full-time job out of manually exporting data and linking tables within a database. The third problem is that the salesperson will typically discover that accounting's data doesn't always match up nicely with his data, which makes for "dirty data," and inaccurate query and reporting results.

Thus, the debate begins: Should accounting be expected to change "W. Smith" in its database to "William Smith" or should sales change "Bill Smith" in its database to "W." or "William?" Multiply this by thousands of records and multiple requesters, and you begin to see the enormity of the task at hand.

The second part of the data warehouse definition is integrating data into one place. This part of the data warehousing process involves specialized tools know as ETL (extract, transform, load). These tools take all the disparate data, which resides in databases using disparate programming languages, transforms it into a common language, and copies it into a single, integrated place. One benefit of using a data warehouse to accomplish this is that the data warehousing software can be set to refresh itself on a regular basis.

The third part of the data warehouse is to make data available for end users. "The reporting aspect of data warehousing is where business intelligence comes into play," says Taub. "Business intelligence incorporates reporting and analysis tools to create predictive models to make retailers, for instance, aware of the correlation between purchases of diapers and purchases of beer. Once the retailers learn this, they can move the two products closer together in the store." When data warehousing techniques are applied to CRM (customer relationship management), the result is often called analytical CRM. As the name suggests, the purpose of analytical CRM is to be able to make educated business decisions that will be used to retain current customers and attract new ones.

Do You Need A Data Warehouse?
Now that we've established a basic working definition of data warehousing, let's next take a look at who needs it. If you think that the answer is everyone, you're wrong. While it's not always a black-and-white decision, there are a few factors to consider. "The decision to choose a data warehouse solution usually comes from important questions that can't be answered with the current enterprise systems," says Andrew Manby, director of marketing for Ascential Software (Westboro, MA). "Some of the questions raised are: 'Why can't I see an integrated view of my customers across all my applications?' or 'Why is revenue dropping in this particular department?'" Not having answers to these questions, however, is not a sure sign you need a data warehouse.

If you find yourself asking these questions, you may be able to get the data you need by attaching an OLAP (online analytical processing) tool to the operational system or by building a data mart (see "How To Speak Analytical CRM" on p. 62). Deciding how much data you want to look at and what kind of trends you want to capture will most likely solve this dilemma. An international business that wants to capture the sales of a particular product by a certain type of customer in Europe and the United States over the last few years will need a data warehouse to be able to get those results. The benefit of using a data warehouse in this example is that it is nonvolatile (i.e. read-only) and therefore won't slow down or crash the production databases during its data processing.

The Business Benefits Of Data Warehouses
Like any major IT investment, knowing how the investment affects the bottom line is where the real deciding factor lies. "Companies have expectations of an investment payback within one year," says Stan Prescott, analyst for Frost & Sullivan (New York). "If a large wireless telecommunications company, for instance, can reduce its churn rate by 1% per year by better profiling its customers, it can realize an annual savings of more than $1 million." The key to getting these kinds of results from a data warehouse is realizing ahead of time how much it costs not to have a better understanding of your customers or even your enterprise data in general. If an enterprise cannot estimate the value of more rapid access to data, more reliable reporting, and more flexible data representation, it shouldn't build a data warehouse. Enterprises should, however, be aware that a study conducted by IDC, which incorporated information gathered from more than 60 organizations that had implemented data warehouses, revealed that the average data warehouse produced an ROI of 401% over the course of four years.

"Field Of Dreams" Mentality Yields Data Warehouse Disaster
If you build a data warehouse, pay special attention to a few common pitfalls. Many data warehouse vendors can tell horror stories of companies that have invested $2 million in a data warehouse that no one uses. "The first pitfall companies fall into is having the 'If you build it, they will come' attitude," says Taub. "Those data warehouses fail because the companies did not establish how they were going to use the data." This typically reveals a disconnect between the IT department and the business executives. "A large data warehouse isn't always a sign of IT virility; it's a sign of a lack of planning and poor decision making," says Andy Hayler, CEO of Kalido (Houston). "Trying to answer every question under the sun creates analysis paralysis." A better strategy is to treat the data with skepticism. What does it mean? How does it benefit the business?

The second pitfall for implementing a data warehouse solution is to prejudge the solution. "Some businesses think that, if the data warehouse produces bad data, it's the data warehouse's fault," says Hayler. "More often than not it's a sign that the business needs to do some data cleansing." Data can be dirty for three reasons: format violations, referential integrity violations, and matching violations. The first problem usually means letters were entered into a field that was formatted for numbers. The second situation refers to incomplete record keeping, such as when the "sales" table refers to customers not listed in the "customers" table. The third reason for dirty data occurs when the same data is entered differently, such as when one table lists a state as "Texas" and a second table says "TX."

Data Warehousing Gets Personal
A couple key points about the future of data warehousing to consider involve the issues of geography and privacy. According to Prescott, "All future data warehouses will contain geographic information thanks to GPS (global positioning satellites) and cell phones. Then businesses will not only know who their customers are but where they are." This will open new marketing opportunities (e.g. personalized, wireless messaging) as well as privacy concerns.

Where is the line that separates tracking your customers versus stalking your customers? The definitions we give today may not be the same as tomorrow. Until we can come to a consensus on this matter, we cannot even begin to tackle these issues as well as other issues, such as why is a guinea pig neither a pig nor from Guinea?

Questions about this article? E-mail the author at