Data Warehousing
The variety of ways in which
data warehousing is used and implemented currently makes it difficult to
come up with a standard definition that is specific enough to assist in
making architectural decisions. For the sake of providing a general
definition of data warehousing that can be used as a basis for further
discussion, PricewaterhouseCoopers uses the following definition:
‘Data Warehousing
is a program dedicated to the delivery of information, which advances decision
making, improves business practices and enables knowledge workers.’
This definition clearly indicates
the functional role that data warehousing plays within an organization
as an analytical tool. However, it does not provide more fundamental
characteristics or draw a clear border around what is or should be in a
data warehouse. Nor does it explain how that information should be
organized or why it needs to be different from on-line transaction processing
(OLTP) environments. To make these distinctions, it is useful to draw
on concepts from the classical definition of data warehousing. The fundamental
characteristics of a data warehouse are:
- Subject Oriented: A data warehouse
  is organized around high-level business groupings called subjects.
  It does not have the same atomic entity focus as an OLTP system.
- Integrated: The data in the
  warehouse must be integrated and consistent. That is, if two different
  source systems store conflicting data about entities, or attributes of
  an entity, the differences need to be resolved during the process of transforming
  the source data and loading it into the data warehouse.
- Time Variant: One of the key
  characteristics distinguishing warehouses from operational environments
  is the currency of the data. Operational systems require real-time
  views of the data. Data warehouse applications generally deal with
  longer-term, historical data. They can also provide access to a greater
  volume of more detailed information, as required, over the longer time
  period.
- Non-Volatile: The content of
  OLTP systems is, by its nature, continuously changing. Inserts,
  deletes, and updates form the basis of a large volume of business transactions
  that result in a very volatile set of data. By contrast, data warehouses
  are static. The data in the warehouse is read-only; it is updated or
  refreshed only periodically, on an incremental or full refresh basis.
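
The integrated, time-variant, and non-volatile characteristics can be
illustrated with a minimal sketch. Everything in it is an assumption made
for illustration rather than part of any NCAS design: the vendor records,
the status codes used by the two source systems, and the names STATUS_CODES,
WarehouseRow, integrate, and Warehouse are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical mapping: each source system encodes vendor status differently.
STATUS_CODES = {"A": "active", "1": "active", "I": "inactive", "0": "inactive"}


@dataclass(frozen=True)
class WarehouseRow:
    """One immutable, dated fact about a vendor (illustrative schema)."""
    vendor_id: str
    status: str
    as_of: date


def integrate(source_rows):
    """Integrated: resolve source-specific codes into one consistent value."""
    return [(vendor_id, STATUS_CODES[code]) for vendor_id, code in source_rows]


class Warehouse:
    def __init__(self):
        self._rows = []

    def load(self, integrated_rows, as_of):
        """Non-volatile: a periodic load appends dated rows; nothing is updated in place."""
        self._rows.extend(WarehouseRow(v, s, as_of) for v, s in integrated_rows)

    def history(self, vendor_id):
        """Time variant: return every dated snapshot held for a vendor."""
        return tuple(r for r in self._rows if r.vendor_id == vendor_id)


warehouse = Warehouse()
warehouse.load(integrate([("V001", "A")]), date(2024, 1, 31))  # first source system
warehouse.load(integrate([("V001", "0")]), date(2024, 2, 29))  # second source system
statuses = [(r.as_of.isoformat(), r.status) for r in warehouse.history("V001")]
```

In this sketch the two month-end loads leave two rows for the same vendor,
one per load date, whereas an OLTP system would typically overwrite the
status and keep only the current value.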
Business-Driven Architecture
Although there are many
debates among experts about data warehouse architectures, the ‘right’ architecture
for any organization must be defined by one prime criterion: does it support
the core business strategy? This must be the key component driving
the data warehouse strategy, which in turn may affect a number of different
areas within an organization. The development of a business-driven
architecture and its trickle-down effects are illustrated in Figure 17.
Figure 17. Development of a Business-Driven Architecture
Information Management Hierarchy
For the NCAS data warehouse
strategy to be truly effective, it should map closely to the organization’s
overall information management strategy. The information management
requirements of the organization can be represented as a hierarchy, moving
from very tactical, operational requirements to a strategic focus at executive
levels. This hierarchy is shown in Figure 18.
Figure 18. Data Warehouse Hierarchy
A successful data warehouse strategy for NCAS must be able to provide
optimal solutions for each of the three upper levels in this hierarchy;
this generally forms the core of most data warehouse applications.
It must also indicate where within the overall application infrastructure
transactional reporting is to be supported to verify that all of an organization’s
information requirements can in fact be met by the proposed architecture.
The OSC is currently evaluating expansion of its data warehouse/DSS to accommodate
more detailed data in support of the warehouse strategy.