REAL-D: A Schema for Data Warehouses

Journal of Information Systems

Vol. 13, No. 1

Spring 1999

pp. 49–62

Daniel E. O’Leary

University of Southern California

ABSTRACT: This article integrates McCarthy’s REA (Resources-Events-Agents) model

and the closely related REAL (Resources Events Agents Locations) model with general

capabilities and requirements of data warehouses. REA/REAL contribute a theory for cap-

turing information about events and a focus on control relationships. Data warehouses bring

time-period information and a focus on information facilitating the creation of value. Using

aspects from both camps, a hybrid schema is developed called REAL-D, REAL for Data

warehouses. Existing data warehouse approaches lack theory, while REA/REAL are theory

based. Unique demands on data warehouses, however, make additional requirements on

REA/REAL, including (1) addition of time period as another dimension in order to allow

rollups from hour to day to week to month to year, (2) addition of location to facilitate

rollups from office to city to district, (3) change from a pure location dimension to a

nonhomogeneous dimension that allows rollup from person to office, and (4) change of

relationships of agents from one of control to a marketing-oriented one.

Key Words: REA, REAL, REAL-D, Databases, Data warehouses.

I. INTRODUCTION

his article argues that REA/REAL’s theoretical model of resources, events, agents and locations

provides a basis for a data warehouse

schema but that the unique requirements of data ware-

houses must be accounted for in order to meet the demands placed on data warehouses. This

article merges contributions from the REA/REAL schema and the data warehouse schema as a basis for

generating a revised schema for data warehouses, referred to as REAL-D.

REA/REAL and Data Warehouses

Databases conforming to McCarthy’s (1979, 1982) REA model and Denna et al. (1993) and

Hollander et al.’s (1996) REAL model are designed to capture and parsimoniously preserve event

information for a particular time period, typically a year or two. REA/REAL provide a theory for the

information that needs to be captured about economic events. Consequently, the REA/REAL design

ensures that periodic decision-making information is captured. REA is designed to provide an online

event processing (OLEP) capability for accounting and other data, expanding on online transaction

processing (OLTP). OLEP systems, such as REA/REAL databases, are updated with every transaction

URLs for data warehousing are listed in the appendix.

An earlier version of this article was presented at the Semantic Modeling of Accounting Phenomena (SMAP) Workshop,

August 1997, Dallas, Texas. The author wishes to acknowledge the comments of the participants at that workshop and the

substantial comments of the referees on two earlier versions of this article. Accepted by the previous editor, A. Faye Borthick.

50 Journal of Information Systems, Spring 1999

that occurs in an organization. OLEP systems make changes in their enterprise databases with each

event. OLEP systems differ from OLTP systems by capturing events not normally captured in transac-

tion processing systems (e.g., sales call events) and by including information not typically captured in

transaction processing systems (e.g., agent and location information).

Since about 1995, specialized databases known as data warehouses have been developed to deliver

data to users for analysis of trends over time periods, products, locations, and other dimensions with the

capability of online analytic processing capability

(OLAP). OLAP software provides the ability to

access analytical data, such as time series, so that users may analyze the data themselves. To support

OLAP, data warehouses, integrated databases designed to support decision making, provide large amounts

of data to decision makers. Because they allow decision makers to query the database about customer,

vendor and supplier behavior over time periods, data warehouses have been used by companies to help

them compete (Appleton 1996) and thus create value (e.g., Darling 1996).

REA provides a theoretical model for capturing event information, while data warehouses provide

a model for delivery of the data to users. Capture and delivery of event information are related because

only captured data can be delivered. OLTP/OLEP databases typically form the base of data ware-

houses, providing snapshots of an enterprise’s events on a daily, weekly, monthly, or yearly basis

(Edelstein 1995). With updates occurring as frequently as daily, there is little time to provide OLTP/

OLEP with increased content information. The same type of information however, can be provided for

multiple periods. Therefore, the design of the OLTP/OLEP database is critical to the information that is

captured for OLAP databases.

REAL-D

Data warehouses, however, have data requirements that are beyond those of the event information

captured by REA/REAL. Data warehouses store multiple snapshots of transaction and/or event databases

over multiple time periods and require the ability to drill down or up. Further, data warehouses are frequently

designed to deliver information useful for marketing and other value-creating opportunities. As a result, a

data warehouse schema must allow (1) the addition of time period as a dimension in order to allow rollups

from hour to day to week to month to year, (2) the change from a pure location dimension to a non- homoge-

neous dimension that allows rollup from person to office (known as cumulating nonhomogeneity), and (3) a

change of relationships of agents from one of control to a marketing-oriented one.

As a result, this article proposes integrating aspects of REA/REAL and data warehouses in order to

establish a theory-based approach for data warehouse schema, REAL for Data warehouses (REAL-D),

depicted in figure 1.

Motivation for New Data Warehouse Schema

The need for new schema for data warehouses is illustrated by a fatally flawed data warehouse

project (Paul 1997), whose failure was attributed to difficulties in determining user requirements for a

business model. Developers did not have any particular theory for generating data warehouse system

requirements, users complained that interviews took too long, and potential users each had different

views of the data, which began to obscure the business model. In the absence of a theoretical model to

guide schema design, appropriate event information was not identified. Although REA/REAL provides

a theoretical basis that ensures that event information is captured, REA/REAL do not address the fol-

lowing data warehousing issues:

• Users. In OLTP/OLEP a programmer typically generates a query that will be used thousands of

times, whereas in data warehousing the user formulates and submits queries, possibly only once.

• Content Needs. Where OLTP/OLEP are designed for parsimony, data warehouses often contain

aggregated information.

• Data in Database. OLTP and OLEP are typically designed to capture data for a firm for one or two

years, but OLAP is designed to provide users with that same type of information over multiple

URLs for OLAP are listed in the appendix.

O’Leary—REAL-D: A Schema for Data Warehouses 51

FIGURE 1

Contributions of REA/REAL and Data Warehouses to REAL-D

REA/REAL

••

• Theory: Resources, Events, Agents Location

••

• Control Relationships

Data Warehouses

••

• Time Period Information

••

• Cumulating Homogeneity in Dimensions

••

• Marketing Relationships

Capture Event Information

Date Delivery to Create Value

REAL-D

periods, such as three to five or more years.

• System. Although data warehouse users might be able to access usable data through views, it is

highly likely that a user’s ill-framed queries will grind the database to a halt if the views are

derived from a transaction-oriented database (SAS date).

Section II of this article summarizes the REA/REAL model, and section III reviews basic concepts

of data warehouse models. Section IV investigates multidimensional modeling using both star and snow-

flake schemas in the data warehouse setting, and section V relates the star and snowflake schemas to the

REA/REAL schema and discusses some of the differences. Section VI summarizes the contributions

and discusses potential extensions.

II. REA AND REAL MODELS

From an analysis of traditional account structures, McCarthy (1982) developed a generalized

E-R (entity-relationship) model of accounting phenomena called the Resources, Events, and Agents

(REA) accounting model. Agents are “persons and agencies who participate in the economic events

of the enterprise or who are responsible for subordinates’ participation” (McCarthy 1982, 563.)

Economic units are “a subset of economic agents. Units are inside participants: agents who work

for or are part of the enterprise being accounted for” (McCarthy 1982, 563.) Figure 2 illustrates the

REA model.

Denna et al. (1993) made the existence of the unit explicit by adding location, changing REA to

REAL. The rationale was that if information about location is important, then unless it is derivable, it

should be captured explicitly. Hollander et al. (1996) also note that location can be important for regu-

latory and other purposes. In addition, Hollander et al. (1996) give a location example where stores are

given a unique identifier and information about the store is kept in a store table. Further, Hollander et al.

(1996) differentiate between the internal and external agent. The resulting REAL model is illustrated in

figure 3. This format is used here because diamonds are not used to capture relationships in the REAL

or data warehouse literatures.

52 Journal of Information Systems, Spring 1999

FIGURE 2

REA Model

Economic

Resource

Stockflow

Economic

Event

Duality

Economic

Agent

Control

Economic

Unit

Responsibility

FIGURE 3

REAL Model

Source: McCarthy (1982)

Economic

Resource

Economic

Event

Location

External

Economic

Agent

Internal

Economic

Agent

O’Leary—REAL-D: A Schema for Data Warehouses 53

In addition, time period is an important part of the REA/REAL model. For example, when an event

occurred and how long it took are captured in an REAL database (Hollander et al. 1996). Because time

period is unique, Hollander et al. (1996) treat absolute time as an attribute. Subsequent research on

REA/REAL has yielded an illustrative accounting system (Gal and McCarthy 1986) and REA for deci-

sion support (Denna and McCarthy 1987). In each case, the REA schema was developed to capture

accounting and other transaction data in a manner that limited redundancy. When derivable, informa-

tion such as location is not an explicit part of the REA schema.

Because REA is a theoretical ideal, there will be a range of implementation compromises necessitated

by adoption of REA to various platforms. McCarthy and Rockwell (1989) noted examples of REA com-

promises that are a consequence of the implementation challenges: temporal aggregation of event histo-

ries, representation and use of a subset or super set of entities, and non-implementation of entity sets.

Because of these kinds of implementation compromises, the data warehouse may not be REA/

REAL. It may not be optimal to include all customer data or all financial data in a data warehouse

because it might not meet user needs (Lambert 1995). In addition, there can be extensions to the theo-

retical REA/REAL frameworks so that they meet different needs of data warehouse settings. Unique

demands of data warehouses establish additional functions beyond those in the REA/REAL data schema,

such as inclusion of a model of time periods in REAL-D.

III. DATA WAREHOUSES AND DATA MARTS

A data warehouse is “a single place located across a corporation’s networks where any user can get the

latest data, efficiently organized” (Radding 1995, 57). A data warehouse is a(n):

1. subject-oriented (e.g., customer, vendor, product, etc.),

2. integrated (integration shows up in many different ways—in consistent naming conventions, in

consistent measurement of variables, in consistent encoding structures, in consistent physical

attributes of data, and so forth),

3. time-variant (over some time horizon), and

4. nonvolatile (not subject to update problems)

collection of data in support of management decision making (Inmon 1995). A data warehouse under-

goes two processes: loading and access of data. Compared to transaction processing, there is no general

update of data. As a result, liberties can be taken to optimize the access of data at the physical level of

design (Inmon 1995).

Data marts are small data warehouses with specific business functions. For example, data marts

have been used to facilitate measuring the impact of marketing promotions. Because data marts are

developed for specific business purposes, system design, implementation, testing and installation are

less costly than for data warehouses. Where data warehouses can take years to develop, data marts can

be developed in a few months, at a much smaller cost.

Organizations use data warehouses for several objectives: to avoid slowing the performance of opera-

tional databases, to accumulate data over extended periods of time to allow analysis of trends, and to

answer particular queries, such as how many units were sold in a month (Bull and Richman 1995). How-

ever, the most important reason is that they can be used to improve profits (e.g., Public Accounting Report

1997). Using data warehouses, firms can micro-manage product lines across time period and location,

firms can create knowledge from data, and firms can perform a range of other activities. In addition, data

warehouses have helped firms identify their most profitable customers (Public Accounting Report 1997).

In spite of the large size of data warehouses,

users expect rapid response to queries. For example,

a large regional retailer expects decision support responses within two minutes (Baer 1996), while

Some reported sizes for data warehouses are Holiday Inn’s 165 gigabyte warehouse for two years of historical data (DePompa

1996), Fingerhut Companies’ 300 gigabyte database (Baer 1996), Chase Manhattan Bank’s 560 gigabyte data warehouse

(Baer 1996), MasterCard OnLine’s 1.2 terabyte database (Foley 1996), and Sears’ plan for a 2-terabyte data warehouse on 90

million households and 31 million Sears Card Holders (Foley 1996). Sears provides 121 weeks of daily sales history for each

item by location to allow analysis by both time period and location (Greenberg 1996).

54 Journal of Information Systems, Spring 1999

others do not want users to wait more than a few seconds. Consequently, the design of data warehouses

must facilitate rapid query response. Accordingly, data warehouse systems are designed to respond to

queries through multidimensional modeling and by using the star and snowflake schema.

IV. MULTIDIMENSIONAL MODELING: STAR AND SNOWFLAKE SCHEMA

Multidimensional modeling (MDM), developed for online analytic processing, is “a technique for

conceptualizing business models as a set of measures described by ordinary facets of business...based on

facts, dimensions hierarchies and scarcity...designed around numeric data, such as values, counts, weights

and occurrences” (Raden 1996a, 60). Data warehouses provide multidimensionality so that users can drill

down to more detail in their queries. Multidimensional databases meet the needs of users that need to see

data in a particular format, perform ad hoc analysis, and analyze the data using a range of techniques

(Sahin 1995).

The primary difference between relational and multidimensional databases is that multidimen-

sional databases store data in a format that facilitates users getting access to familiar data views and

provide users the ability to drill down into the data. In multidimensional databases there is a hypercube,

where, for each dimension on the hypercube, the user can drill down or aggregate. If the necessary data

were present, relational databases could support virtual access to data views, but “[f]or these types of

actions, a multidimensional database is both easier and faster to use than a relational database” (Sahin

1995, p. 2). A typical MDM problem statement is “What is the profitability by customer over time and

organization?” (Raden 1996a, 61). Rather than focussing on transaction processing, MDM aims at

answering managerial questions using the data. MDM capabilities are incorporated into relational data-

bases, such as Oracle (e.g., Radding 1995). Schema design facilitates MDM.

The star schema employs two types of tables, fact tables at the center of the star and dimension

tables at the points of the star. Fact tables capture event information and foreign key references to

FIGURE 4

Star Schema

Based on Raden (1996a, 61).

Product Dimension

Product Key

Product Description

Manufacturer

Brand, etc.

Market Dimension

Market Key

Market Description

Region

State

District

City

Customer

Fact Table

Market Key

Product Key

Period Key

Dollars

Units

Price

Time Dimension

Period Key

Period Description

Year

Quarter

Month

Day

Current Flag

Resolution

Sequence

O’Leary—REAL-D: A Schema for Data Warehouses 55

dimension tables containing detail information. Dimension tables are kept on each dimension that deci-

sion makers would like to either rollup or drill down. In terms of REA/REAL, the event table is at the

center of the star, and tables relating to the resources, agents, and locations are points of the star. The

event table includes the foreign key field for each of the points on the star. Data warehouses can provide

aggregated information or artifact-based information in star schema. For example, a data warehouse

FIGURE 5

Turning a Star Schema (figure 2) into a Snowflake Schema

Manufacturer

Manufacturer Key

Manufacture

Description

Size

Type, etc.

Product Dimension

Product Key

Product Description

Manufacturer

Brand, etc.

Brand

Brand Key

Brand Description

Brand Mgr.

Legacy Brand, etc.

...to fact (event) table

could facilitate analysis/aging of accounts receivable. Each receivable would be captured in the system

allowing analysis across dimensions of time period, customer and location (e.g., salesperson, office and

district). An example of the star schema, which appears throughout the data warehousing literature

(e.g., Peterson 1995; Meredith and Khader 1996) is shown in figure 4.

Based on Raden (1996a).

FIGURE 6a

Star Schema

Store Dimension

Store Key

Store Description

Store Manager

City

State

District ID

District Description

Region ID

Region Description

Region Manager

Fact Table

Store Key

Product Key

Period Key

Dollars

Units

Price

Time Dimension

Period Key

Period Description

Year

Quarter

Month

Day

Product Dimension

Product Key

etc.

Based on Raden (1996b, 43).

56 Journal of Information Systems, Spring 1999

When a database includes a large number of dimensions (categories) and dimensions are large

(high cardinality), Raden (1996a) recommends using the snowflake schema, which creates tables for

attributes within a dimension table. A snowflake is similar to a star except that foreign keys can be

embedded in the dimensions so that dimension tables have relationships with other dimension tables.

An example snowflake model extension to figure 4 appears in figure 5.

Another example of a star is given in figure 6a, and sample data for that file are provided in figure

6b. Data warehouses need information that explicitly captures all dimensions that users wish to summa-

rize. For example, quarter is derivable from date, but data warehouses would typically include quarter

so that user requests for quarter data could be answered rapidly.

Use of the star or snowflake schemas is aimed at limiting access and query problems in a data

warehouse environment. As Raden (1996a) noted, dimensional modeling for relational database design

exploits creation of long thin fact tables containing numeric information and relatively short and wide

dimension tables. With this approach, a number of queries can be resolved without reading the fact

table, thus avoiding time-consuming table scans and complex joins.

The star schema occurs often in commercial software. For example, one of Red Brick Warehouse’s

distinctive features is its ability to join star schema data rapidly in response to queries (Edelstein 1995).

Bontempo and Saracco (1997) discuss approaches to improve processing speed for star schema using

bitmaps. Because of the read nature of data warehousing, overhead can be reduced with a sufficient

amount of write cache (Raden and Peterson 1997). Star schema can exhibit unanticipated overhead. For

example, precalculated data generated in anticipation of particular queries for star and snowflake schema

add to the complexity of the data warehouse and result in more indexes (and associated reindexing)

(Watterson 1997). Aggregated tables and indexes to the fact and aggregate fact tables consume substan-

tially more space than raw data (Greenfield 1996).

V. STAR AND SNOWFLAKE MODELS AND REA/REAL

Schema Similarities and Differences

The star and snowflake models in data warehouses are structurally similar to REA/REAL mod-

els. Because data warehouses focus on anticipating queries on resources and agents, data warehouse

star and snowflake schemas include information on these dimensions, just as in REA/REAL. Re-

sources in REA/REAL are also resources in the star and snowflake models used for data warehousing

FIGURE 6b

Sample Star Schema Data

Some Dimension Tables

Store Key Description City State District Region

MM21 Mini Mall #21 01 05 01 01

MM22 Mini Mall #22 01 05 01 01

Product Key Product Description

aa2143 Wrist Band

Period Key Month Date Quarter Year

010198 01 01 01 1998

010298 01 02 01 1998

Event/Fact Table

Store Key Product Key Period Key Dollars Units Price

MM21 aa2143 010198 100 5 20

O’Leary—REAL-D: A Schema for Data Warehouses 57

and MDM. For example, in figure 4, the product dimension is a resource. Agents in the REA are

also agents in MDM.

Data warehouses require unique and different information than the REA/REAL schema. First, time

period is captured as a dimension rather than in a single attribute for time as in REA/REAL models

(McCarthy 1979, 1980, 1982; Denna et al. 1993; Hollander et al. 1996). Second, a wider range of

information about location is captured, e.g., store, city, region and district. Third, some dimensions are

not homogeneous as they are in REA/REAL models, in that agents are mixed with locations (e.g.,

figure 4). Fourth, the concern of the REA model with economic unit was subordinated to agent and was

one of control, whereas the primary concern of data warehouses is one of marketing information. The

remainder of this section addresses each of these differences.

Time Period as a Dimension

In figure 4, an entire dimension defines time periods. The existence of this data as a dimension does

not necessarily increase data entry demands. For example, some of data in the dimensions will be

generated automatically such as calendar conversions for day to week to month to quarter to year,

which lets rollups be automatic (Raden 1996a). In addition, special-case time ranges, including promo-

tion periods and seasons, can be built into the database. As a result, REA-based databases could form

the basis of replicated databases in data warehouses, where the replication automatically expands the

available data to a broader based schema as in figure 4.

Location as a Dimension

In data warehouses, location (e.g., market) is often a dimension. Although Denna et al. (1993) and

Hollander et al. (1996) suggest that if location can be inferred then information about it does not need to be

captured, the size of the data warehouse and the need for rapid query response argue for explicit rather

than inferred information to facilitate rollups, even if the information is redundant or derivable. Location

information can be generated automatically, in the same manner as time information. As a result, if store

numbers or register numbers are unique, they can be used to generate location information.

Homogeneity of Dimensions in REA and Data Warehouse Models

A schema table for a dimension (resource, event, agent, location) is defined as totally homoge-

neous if that table contains only information directly relating to that dimension (e.g., resources) and not

any other dimensions (e.g., agents). While the REA/REAL literature illustrates resources, agents, and

locations as totally homogeneous, under the data warehouse formulation the dimension tables are not

totally homogeneous according to the same criteria. For example, in figure 6a the dimension for store

key contains information about agents and locations. Why does the data warehouse version not main-

tain resource, agent, and location homogeneity? In these examples (e.g., Meredith and Khader 1996),

there is direct concern for being able to answer marketing queries directly (e.g., about the sales organi-

zation marketing representative, office, district and region). Therefore, the data warehouse design schema

concern is not with the homogeneity of agent or location information but instead with cumulating

nonhomogeneity, which, like time, cascades into definable categories. For example, the salesperson is

the lowest level in sales, all the sales personnel in a store have sales that accumulate to the store sales, all

the store sales accumulate to the district sales, all the district sales accumulate to all the region sales, etc.

Consequently, a dimension has cumulating nonhomogeneity if for some resource, agent, or location,

data are included for more than one dimension (such as sales personnel or store location).

Incorporating cumulating nonhomogeneity in the schema minimizes the number of physical joins

that must be made and provides a simpler schema (e.g., Raden 1996a). In the context of drill down for

location and time breakdowns for very large databases, minimizing physical joins can be a critical

design objective. The notion of cumulating nonhomogeneity can be extended to other dimensions re-

lated to other events such as purchasing.

On the one hand, employing cumulating nonhomogeneity could be viewed as an implementation

compromise because it intermixes information, e.g., agent and location information (McCarthy and

58 Journal of Information Systems, Spring 1999

Rockwell 1989). On the other hand, it could be argued that cumulating nonhomogeneity provides a

different perspective on the system design. Instead of focusing on the origin of the data, cumulating

nonhomogeneity focuses on the goal of providing answers to questions about organizational activities,

such as creating value. Development of similar models from different perspectives provides evidence

of the robustness of the REA/REAL approach. Cumulative nonhomogeneity in data warehouses typi-

cally results from mixing either external agents (figure 4) and locations or internal agents and locations

(figure 6a). The mixing that occurs depends on the type of questions the data warehouse is designed to

answer.

Control vs. Marketing Questions

In McCarthy (1982), economic unit is related to the event through the relationship control. Because

data warehouses support a range of queries related to creating value (e.g., Appleton 1996), there is

likely to be concern for additional types of relationships, e.g., marketing. Sales and customer informa-

tion is the most important data in many data warehouses. For example, Sears chose data warehousing

“to be the single most authoritative source for sales information” (Greenberg 1996, 66). Holiday Inn’s

data warehouse is used “to look at stay patterns of our customers...who stayed with us, when for how

long [and] how they paid” (DePompa 1996). Another use is to identify the most profitable customers

(Public Accounting Report 1997).

Although transaction-gathering schema and accounting information systems provide substantial

attention to control issues, data warehouses are concerned with creating value and generating profits.

As a result, because the transactions have already been through the control system, there is less interest

in control and more interest in analysis of sales data in order to improve profits and create value.

Economic

Resource

Dimension

Economic

Event

Fact Table

Location

Dimension

for Roll-up

External

Agent

Dimension

Internal

Agent

Dimension

FIGURE 7

REAL-D Model

Time Period

Dimension

for Roll-up

O’Leary—REAL-D: A Schema for Data Warehouses 59

FIGURE 8

Sample Star Schema

REAL-D

Resource Dimension

Product Key

Unit Price

Description

...

Sales Event Facts

Customer Key

Store Key

Salesperson

Product Key

Time Key

Quantity Sold

...

Internal Agent Dimension

Customer Key

Customer Name

...

Locating/Marketing Dimension

Store Key

Store Manager

City

...

Time Dimension

Time Key

Month

Year

...

External Agent Dimension

Salesperson Key

Salesperson

...

FIGURE 9

Sample Snowflake Schema

for REAL-D

Resource Dimension

Product Key

Unit Price

Description

Manufacturer

...

Sales Event Facts

Customer Key

Store Key

Salesperson

Product Key

Time Key

Quantity Sold...

Internal Agent Dimension

Customer Key

Customer Name

...

Location Dimension

Store Key

Store Manager

City

...

Time Dimension

Time Key

Month

Year...

External Agent Dimension

Salesperson Key

Salesperson Name

...

Month Key

Month Name...

Manufacturer Key

...

60 Journal of Information Systems, Spring 1999

Accordingly, designers focus on functional needs, changing the focus of data warehouses to market

information.

A Data Warehouse Model

REAL-D, the revised model, can be represented with a star or a snowflake schema. At the center of

the star is the event table, capturing the critical information about the event (the fact table). Surrounding

the event are the dimensions of resources, agents, location, and time period as they relate to the event,

resulting in a star schema. A general example appears in figure 7; a particular example, in figure 8. In

some situations, however, additional tables will need to be generated from some of the dimensions,

resulting in a snowflake schema. In the example in figure 9, additional tables are included to provide

information about time period and manufacturer. As in REA and REAL models, the particular process

being modeled influences which resources, events, agents, and locations are included and the number

of tables used to represent each. Additional information changes the star model into a snowflake model.

Control information can also be added with links from agents to economic unit, or agent information

can be linked to resource information.

Implementation

Data warehouse star schema systems can be implemented from REA/REAL databases by mapping

the original database into a star schema using SQL. Alternatively, some tools have been designed to help

developers generate a hypercube based on a relational database (DePompa 1997). At the center of the star

would be the event table, which holds the particular measures of the event. Then the points of the star

would be tables containing the resources, agent, and location dimensions. Further, the administrator would

add any necessary computed columns for aggregations or unique OLAP use. In addition, the database

administrator would also create the necessary timetable with the appropriate date hierarchy. From an

efficiency standpoint, it is often helpful to order the data by the most prevalent use, which minimizes

sorting to respond to queries (Petersen 1994). As part of the implementation, any compromises to the

original REA/REAL structure, such as nonrepresentation of any entity sets, would be generated.

VI. DISCUSSION

This article integrated REA/REAL models developed for capturing event information and data

warehouse requirements for decision making in an integrated schema called REAL-D. The resulting

schema incorporates explicit time period and location information into the event model of REA/REAL.

This research has expanded REA/REAL to model data warehouses, a setting not previously investi-

gated by REA/REAL researchers.

Dunn and McCarthy (1997, 46) suggested that “productive extensions in REA research could

include…use of REA to explicate instantiations.” New developments in data warehouses could be used to

generate additional extensions to the REA/REAL model. First, as noted above, the sales event is not the

only event likely to have different data warehouse dimensions. The purchase event has the same structure.

For example, firms purchase from a set of sales people that work for the same division of a firm that is part

of a larger organization. The purchases accumulate to purchases from a division, which accumulate to

purchases from a company. This is important information because a decision maker can use it to generate

purchase information for negotiating quantity discounts from sellers. Second, a behavioral study of data

warehouse designers could be run to determine whether designers trained in REAL-D outperform those

without training. Providing researchers a template for analyzing data warehouse design is likely to im-

prove their results. Third, a study could compare data warehouses for their similarity to REAL-D.

O’Leary—REAL-D: A Schema for Data Warehouses 61

APPENDIX

Some Useful URLs for Data Warehousing Schema

Content Uniform Resource Locator (URL)

CIO Data Warehousing Links http://www.cio.com/CIO/rc_dw.html

Data Warehousing Information http://pwp.starnetinc.com/larryg/index.html

Article List http://pwp.starnetinc.com/larryg/articles.html

White Article List http://pwp.starnetinc.com/larryg/whitepap.html

Data Warehousing Institute http://www.dw-institute.com/

Best of Database: Programming http://www.dbpd.com/vault/index.shtml

and Design

Association for Computing http://www.acm.org/sigmod/

Machinery’s Special Interest

Group on Management of Data

Stanford Data Warehousing http://www-db.stanford.edu/warehousing/

Publications publications.html

Content Uniform Resource Locator (URL)

Custom Net http://www.custom-net.com/dw.htm

Lycos Search for Data Warehouse http://query6.lycos.cs.cmu.edu/cgi-

bin/pursuit?query=data+wareh

Some Useful URLs for OLAP

Content Uniform Resource Locator (URL)

OLAP Resources http://www.sgroves.demon.co.uk/olaplnks.htm

Lycos Search on OLAP http://lycos11.lycos.cs.cmu.edu/cgi-bin/pursuit?query=olap

Maintenance and control of http://www.xs4all.nl/~fab/olapkeep.html

OLAP applications

Relational OLAP Systems http://www.cas.american.edu/~ghadsal/DW/OLAP/sld003.htm

OLAP and the Web http://www.dbmsmag.com/9701i08.html

OLAP, OLTP, MDDB http://www.datamation.com/PlugIn/issues/1996/april15/04beval1.html

REFERENCES

Appleton, E. 1996. Use your data warehouse to compete. Datamation (May 15): 34–38.

Baer, T. 1996. Relational technology in the land of the giants. Software Magazine (February): 61–64.

Bontempo, C., and C. Saracco. 1997. Accelerating indexed searching. Database Programming and Design

Online. <http://www.dbpd.com/vault/bontempo.htm>.

Bull, K., and D. Richman. 1995. Oracle’s single vendor solution. Information Week (July 10): 72.

Darling, C. 1996. How to integrate your data warehouse. Datamation (May 15) 40–52.

Denna, E., J. Cherrington, D. Andros, and A. Hollander. 1993. Event Driven Business Solutions. Homewood, IL:

Irwin.

———, and W. McCarthy. 1987. An events accounting foundation for DSS. In Decision Support Systems: Theory

and Application, edited by C. Holsapple, and A. Whinston, 239–263. Berlin, Germany: Springer Verlag.

DePompa, B. 1996. Stack that data. Information Week (January 29): 50–57.

———. 1997. How to sort through the OLAP maze. DW for Data Warehousing (May).

Dunn, C., and W. McCarthy. 1997. The REA accounting model: Intellectual heritage and prospects for progress.

Journal of Information Systems (Spring): 31–51.

62 Journal of Information Systems, Spring 1999

Edelstein, H. 1995. Technology analysis: Faster data warehouses: New tools provide high-performance querying

through advanced indexing. Information Week (December 4): <http://www.techweb.com/se/

directlink.cgi?IWK19951204S0045.htm>.

Foley, J. 1996. Data dilemma. Information Week (June 10): 14–16.

Gal, G., and W. McCarthy. 1986. Operation of a relational accounting system. Advances in Accounting: 83–112.

Greenberg, I. 1996. Data warehouse initiative helps Sears weather competition. Infoworld (July 29): 66.

Greenfield, L. 1996. Don’t let data warehousing gotchas getcha. Datamation (March 1): <http://www.datamation.com/

PlugIn/issues/1996/march1/03asoft1.html>.

Hollander, A., E. Denna, and J. Cherrington. 1996. Accounting, Information Technology, and Business Solu-

tions. Chicago, IL: Irwin.

Inmon, W. 1995. What is a Data Warehouse? <http://www.cait.wustl.edu/cait/papers/prism/vol1_no1/>.

Lambert, B. 1995. Break old habits to define data warehousing requirements. Data Management Review (December):

<http://www.data-warehouse.com/resource/articles/lamber11.htm>.

McCarthy, W. 1979. An entity view of accounting models. The Accounting Review (October): 667–685.

———. 1980. Construction and use of integrated accounting systems with entity relationship modeling. In Entity-

Relationship Approach to Systems Analysis and Design, edited by P. Chen, 625–650. New York, NY: North-

Holland.

———. 1982. The REA accounting model: A generalized framework for accounting systems in a shared data

environment. The Accounting Review (July): 554–578.

———, and S. Rockwell. 1989. The integrated use of first-order theories, reconstructive expertise and implemen-

tation heuristics in an accounting information system design tool. Proceedings of the Ninth International

Workshop on Expert Systems and their Applications, Avignon, France, EC2.

Meredith, M., and A. Khader. 1996. Divide and aggregate: Designing large data warehouses. Database Program-

ming and Design: The Online Edition (June): <http://www.dbpd.com/vault/archives.shtml>; <http://

www.dbpd.com/vault/96index.htm>.

Paul, L. 1997. Anatomy of a failure. CIO Magazine (November 15): <http://www.cio.com/archive/enterprise/

111597_data.html>.

Petersen. 1994.

Peterson, S. 1995. Stars: A pattern language for query optimized schema. <http://c2.com/ppr/stars.html>.

Public Accounting Report. 1997. KPMG jumps in data warehousing ring. Public Accounting Report (January): 1,

Radding, A. 1995. Building a better data warehouse. Infoworld (November 20): 1, 57–62.

Raden, N. 1996a. Modeling a data warehouse. Information Week (January 29): 60–66.

———. 1996b. Maximizing your warehouse. Information Week (March 18): 42–48.

———, and Peterson. 1997.

Sahin, K. 1995. Multidimensional database technology and data warehousing. Database Journal (December).

SAS. (date). Data Warehousing: A Blueprint for Success. Cary, NC: SAS Institute.

Watterson, K. 1997. Parallel tracks. Datamation (May): <http://www.datamation.com/PlugIn/issues/1997/may/

05data.html>.