Data Warehouse
Description
A classic data warehouse is defined in terms of an architecture that supports analysis of historical data. It is oriented around subjects rather than processes, and it is integrated across subject areas. The information it stores is not overwritten with each update, so point-in-time data remains available across time. A simple definition we often use comes from Ralph Kimball: "A data warehouse is a copy of transaction data specifically structured for query and reporting." (Page 310, The Data Warehouse Toolkit)
Virginia Tech is using Ralph Kimball's Star Schema methodology to build an enterprise data warehouse. The various subject areas (Human Resources, Finance, and Student) constitute data marts that are being developed iteratively, one data mart at a time. The data marts are integrated and joined together through shared tables called "conformed dimensions". Wherever two or more data marts use the same dimension, that dimension is conformed: it has exactly the same meaning in each data mart and expresses data at the lowest level of granularity common to the data marts that share it. The Virginia Tech enterprise data warehouse is updated nightly. It is used for trend analysis and also for a variety of operational reports that do not require up-to-the-minute (real-time) data. Operational reporting is possible because the warehouse includes detail data as well as aggregate data.
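As a rough illustration of how a conformed dimension lets two data marts be queried together, the following minimal sketch builds a tiny in-memory star schema with Python's standard sqlite3 module. The table names, column names, and sample figures (dim_department, fact_hr_headcount, fact_finance_spend) are hypothetical and are not taken from the Virginia Tech warehouse; they only show the general pattern of aggregating each fact table separately and then joining the results through the shared dimension.

```python
import sqlite3


def build_demo_warehouse():
    """Create an in-memory database with two fact tables sharing one dimension.

    All table names, columns, and rows are hypothetical sample data.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Conformed dimension: one definition of "department" for every mart.
        CREATE TABLE dim_department (
            department_key  INTEGER PRIMARY KEY,
            department_name TEXT NOT NULL
        );

        -- Human Resources data mart fact table.
        CREATE TABLE fact_hr_headcount (
            department_key INTEGER REFERENCES dim_department(department_key),
            headcount      INTEGER
        );

        -- Finance data mart fact table.
        CREATE TABLE fact_finance_spend (
            department_key INTEGER REFERENCES dim_department(department_key),
            amount         REAL
        );

        INSERT INTO dim_department VALUES (1, 'Physics'), (2, 'History');
        INSERT INTO fact_hr_headcount VALUES (1, 40), (2, 25);
        INSERT INTO fact_finance_spend VALUES (1, 1000.0), (1, 500.0), (2, 300.0);
        """
    )
    return conn


def headcount_and_spend(conn):
    """Summarize each mart on its own, then join through the shared dimension."""
    query = """
        WITH hr AS (
            SELECT department_key, SUM(headcount) AS headcount
            FROM fact_hr_headcount
            GROUP BY department_key
        ),
        fin AS (
            SELECT department_key, SUM(amount) AS spend
            FROM fact_finance_spend
            GROUP BY department_key
        )
        SELECT d.department_name, hr.headcount, fin.spend
        FROM dim_department AS d
        JOIN hr  ON hr.department_key  = d.department_key
        JOIN fin ON fin.department_key = d.department_key
        ORDER BY d.department_name
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in headcount_and_spend(build_demo_warehouse()):
        print(row)
```

Aggregating each fact table before joining avoids double counting when the two marts hold different numbers of rows per department, which is one reason the shared dimension must carry the same keys and meaning in both.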
The warehouse represents the business view of the data. Some detail data that is relevant primarily to transaction tracking is not moved into the warehouse, or is hidden from user views, because it has no particular analytical value or pertinence to the business outcome. By the same token, the warehouse sometimes contains additional "new" information not otherwise available from the transaction processing system, such as calculated or summarized fields that answer routine business questions consistently and reliably.
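The idea of dropping transaction-only detail and adding calculated fields can be sketched as a small transformation step. Everything here is an assumption made for illustration: the field names (batch_no, terminal_id, credits_attempted, credits_earned), the function to_warehouse_row, and the 12-credit full-time threshold are hypothetical and do not describe the actual Virginia Tech load process.

```python
# Hypothetical fields that matter only for transaction tracking and are
# therefore left out of the warehouse row.
TRANSACTION_ONLY_FIELDS = {"batch_no", "terminal_id"}

# Hypothetical business rule, used only for this example.
FULL_TIME_CREDIT_THRESHOLD = 12


def to_warehouse_row(txn):
    """Map one source transaction record (a dict) to a warehouse row (a dict)."""
    # Keep every field except the transaction-tracking ones.
    row = {k: v for k, v in txn.items() if k not in TRANSACTION_ONLY_FIELDS}

    attempted = txn["credits_attempted"]
    earned = txn["credits_earned"]

    # Calculated fields: defined once at load time so that every report
    # answers the same routine question the same way.
    row["completion_rate"] = round(earned / attempted, 2) if attempted else None
    row["is_full_time"] = attempted >= FULL_TIME_CREDIT_THRESHOLD
    return row


if __name__ == "__main__":
    sample = {
        "student_id": "S001",
        "term": "Fall",
        "credits_attempted": 15,
        "credits_earned": 12,
        "batch_no": 7731,
        "terminal_id": "T-09",
    }
    print(to_warehouse_row(sample))
```

Computing such fields during the load, rather than in each report, is what gives users a single consistent answer.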