ITP-INF004 Data Warehousing Policy
Lake can include structured data from relational databases (rows and columns), semi-
structured data (CSV, logs, XML, JSON), unstructured data (emails, documents, PDFs),
and binary data (images, audio, video).
Data Mart: A subset of the enterprise data warehouse that is designed for a particular
line of business, such as sales, marketing, or finance.
Data Mining: The process of sifting through large amounts of data to identify
relationships within it. The term also refers to the technique by which a user applies
software tools to look for particular patterns or trends. This technique can help
anticipate future trends and behaviors, allowing businesses to make proactive,
knowledge-driven decisions. Data mining is often performed using Artificial
Intelligence (AI) or Machine Learning (ML) techniques.
Data Model: An abstract model that organizes elements of data and standardizes how
they relate to one another and to the properties of entities.
Data Warehouse: A storage architecture designed to hold data extracted from transaction
systems, operational data stores, and external sources. The warehouse then combines
that data in an aggregate, summary form suitable for enterprise-wide data analysis and
reporting for predefined business needs.
Data Warehousing: A process for building decision support systems and a knowledge-
based application environment in support of both everyday tactical decision making and
long-term business strategy. Data warehouses and data warehouse applications are
designed primarily to support the decision-making process by providing the decision
makers with access to accurate and consolidated information from a variety of sources.
Dimension Tables: Dimension Tables describe the entities being analyzed, typically as
hierarchical, categorical information such as time, departments, locations, and
products. Dimension Tables are sometimes called lookup or reference tables.
Extract, Transform and Load (ETL): Refers to the methods involved in accessing and
manipulating source data and loading it into a data warehouse.
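As an illustration only, the three ETL stages can be sketched in Python using the standard library's csv and sqlite3 modules. The source data, the table name sales, and the cleaning rules (skip rows with no amount, normalize region names to lower case) are hypothetical assumptions rather than requirements of this policy; an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source extract: a CSV export from a transaction system.
SOURCE_CSV = """order_id,order_date,region,amount
1001,2024-01-05,north, 120.50
1002,2024-01-05,SOUTH,80.00
1003,2024-01-06,north,
"""


def extract(text):
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with no amount, normalize regions, convert types."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # hypothetical data-quality rule: skip incomplete rows
        cleaned.append((
            int(row["order_id"]),
            row["order_date"],
            row["region"].strip().lower(),
            float(amount),
        ))
    return cleaned


def load(conn, rows):
    """Load: insert the transformed rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(order_id INTEGER PRIMARY KEY, order_date TEXT, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
# Query the loaded data: total amount per region.
print(conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall())
```

Keeping each stage in its own function makes it easier to test and audit the transformation rules separately from the source and target systems.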
Fact: A value or measurement that represents a fact about the managed entity or
system. Facts as reported by the reporting entity are said to be at the raw level.
Examples include sales, cost, and profit.
Fact Table: A table in a Star Schema that contains Facts. A Fact Table typically has
two types of columns: those that contain Facts (typically numeric values), and those that
are foreign keys to Dimension Tables. The primary key of a Fact Table is usually a
composite key made up of all its foreign keys.
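The relationship between Facts, Fact Tables, and Dimension Tables can be illustrated with a minimal star schema, sketched here in SQLite through Python's sqlite3 module. The table and column names (dim_date, dim_product, fact_sales) and the sample rows are hypothetical; a production warehouse would follow its own naming standards and platform.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
-- Dimension tables: descriptive, categorical attributes (lookup/reference tables).
CREATE TABLE dim_date (
    date_key  INTEGER PRIMARY KEY,
    full_date TEXT,
    month     TEXT,
    year      INTEGER
);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category     TEXT
);
-- Fact table: numeric facts plus foreign keys referencing each dimension.
-- Its primary key is the composite of those foreign keys.
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    units_sold  INTEGER,
    revenue     REAL,
    PRIMARY KEY (date_key, product_key)
);
INSERT INTO dim_date VALUES (20240105, '2024-01-05', '2024-01', 2024),
                            (20240206, '2024-02-06', '2024-02', 2024);
INSERT INTO dim_product VALUES (1, 'Laptop', 'Hardware'),
                               (2, 'Support plan', 'Services');
INSERT INTO fact_sales VALUES (20240105, 1, 3, 3000.0),
                              (20240105, 2, 5, 500.0),
                              (20240206, 1, 2, 2000.0);
""")

# Typical star-schema query: join facts to dimensions, then aggregate.
query = """
SELECT p.category, d.month, SUM(f.revenue) AS total_revenue
FROM fact_sales AS f
JOIN dim_product AS p ON f.product_key = p.product_key
JOIN dim_date    AS d ON f.date_key = d.date_key
GROUP BY p.category, d.month
ORDER BY p.category, d.month
"""
for row in conn.execute(query):
    print(row)
```

Because of the composite primary key, the database rejects a second fact row for the same date and product combination; whether that grain suits a given business process is a modeling decision.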
Online Analytical Processing (OLAP): A type of software used to perform rapid
multidimensional analysis on large volumes of data from a data warehouse or some
other centralized data store. This is accomplished by extracting data from multiple
relational data sets and reorganizing it into a multidimensional format that enables fast
processing.
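The reorganization that OLAP performs can be illustrated, in very simplified form, by pivoting flat relational rows into a cube keyed by dimension values and then rolling it up or slicing it. This pure-Python sketch is a conceptual toy under stated assumptions: the dimensions (region, product, quarter) and the sales figures are hypothetical, and real OLAP engines rely on specialized storage and indexing rather than a dictionary.

```python
from collections import defaultdict

# Hypothetical relational rows: (region, product, quarter, sales)
ROWS = [
    ("north", "laptop", "Q1", 100),
    ("north", "laptop", "Q2", 150),
    ("north", "phone", "Q1", 80),
    ("south", "laptop", "Q1", 60),
    ("south", "phone", "Q2", 90),
]


def build_cube(rows):
    """Reorganize flat rows into a cube: (region, product, quarter) -> total sales."""
    cube = defaultdict(int)
    for region, product, quarter, sales in rows:
        cube[(region, product, quarter)] += sales
    return dict(cube)


def roll_up(cube, keep):
    """Aggregate away every dimension whose index is not listed in `keep`."""
    result = defaultdict(int)
    for coords, value in cube.items():
        result[tuple(coords[i] for i in keep)] += value
    return dict(result)


def slice_cube(cube, dim, member):
    """Fix one dimension to a single member (an OLAP 'slice')."""
    return {coords: v for coords, v in cube.items() if coords[dim] == member}


cube = build_cube(ROWS)
print(roll_up(cube, keep=(0,)))              # total sales by region
print(slice_cube(cube, dim=2, member="Q1"))  # only the Q1 cells
```

Precomputing such aggregates is what lets OLAP tools answer multidimensional questions quickly instead of rescanning the relational source for each query.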