JCSCR
Journal of Computer
Science & Research
Journal of Computer Science & Research (JCSCR) - ISSN 2227-328X
http://www.jcscr.com
Vol. 1, No. 6, Pages. 12-17, December 2012
© The Authors
12
A Data Warehouse Design for
A Typical University Information System
Youssef Bassil
LACSC – Lebanese Association for Computational Sciences
Registered under No. 957, 2011, Beirut, Lebanon
Abstract
Presently, large enterprises rely on database systems to manage their data and information. These databases are useful for
conducting daily business transactions. However, the tight competition in the marketplace has led to the concept of data
mining in which data are analyzed to derive effective business strategies and discover better ways in carrying out business.
In order to perform data mining, regular databases must be converted into what so called informational databases also
known as data warehouse. This paper presents a design model for building data warehouse for a typical university
information system. It is based on transforming an operational database into an informational warehouse useful for decision
makers to conduct data analysis, predication, and forecasting. The proposed model is based on four stages of data
migration: Data extraction, data cleansing, data transforming, and data indexing and loading. The complete system is
implemented under MS Access 2010 and is meant to serve as a repository of data for data mining operations.
Keywords
Data Warehouse, DBMS, Data Mining, Information System
1. Introduction
Nowadays, almost every enterprise uses a database to store its vital data and information [1]. For instance,
dynamic websites, accounting information systems, payroll systems, stock management systems all rely on
internal databases as a container to store and manage their data. The competition in the marketplace has led
business managers and directors to seek a new way to increase their profit and market power, and that by
improving their decision making processes. In this sense, the idea of data warehouse and data mining was born
[2]. In fact, data warehousing is the process of collecting data from operational functional databases,
transforming, and then archiving them into special data repository called data warehouse with the goal of
producing accurate and timely management information [3]; whereas, data mining is the process of discovering
trends and patterns from data warehouse, useful to carry out data analysis [4].
A typical university often comprises a lot of subsystems crucial for its internal processes and operations.
Examples of such subsystems include the student registration system, the payroll system, the accounting system,
the course management system, the staff system, and many others. In essence, all these systems are connected to
many underlying distributed databases that are employed for every day transactions and processes. However,
universities rarely employ systems for handling data analysis, forecasting, prediction, and decision making. This
paper proposes a data warehouse design for a typical university information system whose role is to help in and
support decision making. The proposed design transforms the existing operational databases into an information
database or data warehouse by cleaning and scrubbing the existing operational data. Besides, several columns
and structures are dropped as they are useless for data mining. Finally, all captured and cleaned data are loaded
and indexed in the warehouse making them ready for conducting data mining tasks.
2. Background
Operational Database: An operational database is a regular database meant to run the business on a current basis
and support everyday transactions and processes [5].
Informational Database: An informational database is a special type of database that is designed to support
decision making based on historical point-in-time and prediction data for complex queries and data mining
applications. A data warehouse is an example of informational database.
Data Warehouse: It is a subject-oriented, integrated, time-variant, non-updatable collection of data used in
support of management decision-making processes. It is subject-oriented as it studies a specific subject such as
sales and customers behavior. It is integrated as it defines consistent naming conventions, formats, and encoding
structures from multiple data sources. It is time-variant as it studies trends and changes over time. It is non-
updatable as it is read-only, i.e. cannot be updated by regular users. [6]