Magazine Article | June 1, 2002

Avoid Data Warehouse Disaster

Source: Field Technologies Magazine

To successfully implement a data warehouse project, follow these four guidelines ...

Integrated Solutions, June 2002

Just seeing the phrase "data warehouse" is enough to make IT directors and CIOs hide under their desks, and repeatedly recite "find a happy place" until the panic attack goes away. But, for those brave souls who desire to know more about this subject and choose to press on, there is great BI (business intelligence) potential ahead. By following the four steps to data warehouse success, you'll be better equipped to convince your CEO that a data warehouse project is worth the investment. Who knows, maybe there will even be enough money saved by the project to say, "Thanks for a job well done."

Avoid Scope Creep
Most vendors and analysts list "scope creep" as a top reason that data warehouses bite the dust. As the phrase suggests, scope creep occurs when a data warehouse deviates from its initial focus. For example, suppose a manufacturer starts its data warehouse project with the goal of bringing together databases across three departments such as sales, accounting, and purchasing. Ultimately, the manufacturer wants to produce six business intelligence reports that will help the three departments do their jobs better. Two weeks into the rollout, however, the marketing department requests to have another database tied into the data warehouse and two more reports created for them. Six months from the initial rollout date, after dozens of applications, databases, and reports have been added to the original project, the project manager discovers the data warehouse is twice as large as was originally expected. The inherent problems with this scenario are made clear by Mike Schroeck, partner in charge of global iAnalytics/DW practices, for PwC Consulting (New York). "When scope creep occurs, data warehouse projects take longer to implement, cost more to manage, and run a greater risk of letting sensitive information get into the wrong hands," says Schroeck. "If the scope creep happens before the project even gets off the ground, companies risk going over budget and never completing the job."

The data warehouse is a huge enough undertaking by itself without integrating extra data sources and creating reports that weren't in the original plans. Thus, IT is forced to walk a tight line between stiff-arming managers (thereby leaving these managers feeling that they will never be part of the data warehouse project) and giving in to every request that comes down the pike. While giving in to everyone's wishes can lead to disaster, consider what the opposite extreme can cause. "If people think that they will never be included in the data warehouse project, they may opt for the easiest immediate fix, which is to create their own mini data warehouse known as a data mart," says Schroeck. "This problem is as bad if not worse than the first problem." By allowing several data marts to exist in the company, you compound one of the original problems you had hoped to solve, which is unifying data enterprise wide. Inevitably what happens is that sales uses different data standards (e.g. customer's last name) for identifying customers than marketing (e.g. ID number). This scenario multiplied by hundreds of data sources and hundreds of data standards yields a very tangled web that can put a huge burden on the IT department.

The best way to avoid scope creep is to plan ahead for future growth and to share the plan with departments that are not part of the initial data warehouse rollout. Then, you can cite a target date for an inquirer's department and still hit your original completion deadline without going over budget.

Don't Manually Cleanse Or Transfer Data
After the scope of the project is set, the next step will be to migrate data from disparate data sources into a central repository. Four things have to occur to make this happen: the data has to be cleansed, extracted from the original data source, converted into a format usable by the data warehouse, and loaded into the new data repository. These tasks can be accomplished using manual processes or automation software. According to Jeff Jones, director of strategy for IBM Data Management Solutions at IBM's Software Group (White Plains, NY), using automation tools is almost always the better choice. "Automation tools enable data to be extracted, transformed, and loaded without human intervention," says Jones. "These tools present data in a high-performance, logical, and intuitive manner, which is necessary to perform analyses."

Before any movement of data occurs, however, users are admonished to clean up their data. For example, even though the accounting manager and purchasing manager know that "Bill Rock" and "William L. Rock" refer to the same customer, computers don't. Data cleansing software can be used to break data down into its most basic components (a process known as parsing), make sure that all fields include the type of data that's supposed to be in the field (this process is known as standardizing), and unite matched data fields into new records.

Clean, Warehoused Data Needs Business Intelligence
After the data is cleansed and extracted, transferred, and loaded to the data warehouse repository, the enterprise has a unified view of its data. While this has some benefit, it's not enough to justify spending millions of dollars. What does justify money spent on data warehousing projects is business intelligence. True business intelligence begins with data mining, which mines data deeper than standard queries in a database. "Data mining is the ultimate, highest analytics tool available today," says Jones. "It is more than performing exotic SQL queries or high-volume OLAP (online analytical processing); data mining requires very little user input to search for patterns of data within a database." Data mining can discover patterns that OLAP and SQL are not built to find. For example, looking across a customer database, data mining can reveal that all customers can be divided into 10 groupings based on attributes such as geography, money spent in last year, age, shopping habits, etc. Business intelligence takes this data and puts it into a report format that gives business analysts actionable information.

A real world example of business intelligence in action is seen in GE Capital Fleet Services (GE Capital), one of the largest fleet management services in the world. Prior to its BI initiative in 1999, GE Capital generated paper invoices, which were printed and mailed to fleet managers (GE Capital's customers). The invoices were hundreds of pages long, giving detailed information about each vehicle, including all service history information. Because of the vast size of the invoices, GE Capital was spending more than $1 million annually to print and mail documents. Additionally, customers found the documents very arduous to navigate through, which made it difficult for them to know if there were any concerns with a particular vehicle that they were considering leasing from GE Capital. GE Capital implemented a business intelligence solution that enabled it to keep its data in an electronic format. Customers can access vehicle information via a secure intranet and see a summary of the data (provided by the business intelligence tools), which they can navigate through using keyword searches. Gartner (Stamford, CT) Analyst Howard Dresner says that GE Capital can add further business intelligence functionality to its data warehouse in the future. "For instance, it could alert a driver that a particular vehicle needs an oil change and then offer the driver a coupon for the rapid oil change center at the next corner," Dresner says. In this example, GE Capital's data warehousing and business intelligence solution improved customer retention rates and saved more than $1 million on postage and printing.

Don't Forget About Security
Another important point to consider about data warehousing and integrating disparate databases is the issue of security. While it may have been a royal pain in everyone's rear to have critical data residing among various sources, it also meant IT and other department managers had to be involved whenever a special report was needed. The obvious benefit to this situation is better security. Much like our political process has its checks and balances, forcing employees to get assistance from management makes others in the organization aware of what is trying to be accomplished. On the other hand, if one person, such as a CSR (customer service representative) has a "360° view" of the customer, there is the potential for fraudulent use of data. For instance, by having full access to the data warehouse, the CSR doesn't just see the status of the customer's last order, he also sees the customer's credit card information, address, purchasing history, personal interests, birthday, and Social Security number. This information could be very dangerous in the wrong hands. That is why it's so important before implementing a data warehousing solution to remember to think about the security rules that will need to be put in place.

Not Enough Time For A Data Warehouse Implementation?
While most large-scale database integration initiatives will warrant the need for a data warehouse, business pressure often means that users want information sooner than is possible given the time it takes to go through the specify-design-build cycle for a full data warehouse. "A downside to data warehousing is that it takes time to get agreement on cross department data models," says Donald MacCormick, worldwide director of product communications for Crystal Decisions (Palo Alto, CA), a provider of reporting, analysis, and information delivery products and services. "For enterprises that have information requirements that cannot wait for the formal data warehouse to complete, it is possible to bring the data together in the reporting layer as an interim solution." MacCormick advises that this can avoid one of the big pitfalls of building a data warehouse without involving the users. "The worst place to begin a data warehousing project is by stating, 'We want to build a data warehouse,'" says MacCormick. "The first question should always be, 'What do our employees, customers, and partners need from the data?' Being able to show the users what the end result might look like without actually having to build a warehouse can be invaluable."