Databases
This week we explore databases and data warehouses.
Databases are often critical to the daily operation of an organization,
and are vital for monitoring and planning. The outline presented
here is intended to help point out important topics and terms covered in
the chapter. It is not intended to replace the lecture.
Objectives
- explain the differences between traditional file organization and the database approach
- enumerate the most important features and operations of the relational database model
- explain what common database terms mean
- list the operations involved in transferring data from a transactional database into a data warehouse
Terminology
- entity: any object about which information is collected
- character: considered the smallest unit of data in a database
(actually not accurate since numbers may be stored in a binary format)
- field: one piece of information about an entity; usually
consists of a group of characters or a number
- record: a collection of fields related to the same entity
- file: a collection of related records
- table: a collection of related records
- database: a collection of related files/tables
- DBMS: database management system
- DBA: database administrator
- query: a request for specific information
- SQL: structured query language
- schema: a blueprint showing the structure of a database
- data dictionary: a repository containing information about the data
being stored and its organization; this may include where the data
came from, tables that are related to it, field information (such as size
and type of field), index information (to sort and speed access), processes
that use the data, rules about how the data is added to and updated, and
validation and consistency rules
- metadata: another term for data dictionary
- data modeling: analyzing an organization's data and
identifying the relationships among the data
- ERD: entity relationship diagram; a graphical representation
of all entity relationships
- index: overhead kept by the database to make accessing records
faster and maintain sort orders; generally based on key fields
- OLAP: online analytical processing; often used with data warehouses
Basic database concepts
- traditional file approach to maintaining data: data is stored in regular files which
requires special programs to be written to access it, and rewritten whenever any change
is made to the structure of the file (such as adding a field); this approach often
leads to data redundancy and errors; providing access to several simultaneous users
is difficult and error prone
- database approach to maintaining data: databases have descriptions of the data they contain,
can do validity checking as data is added or changed, provide mechanisms for quickly
accessing the data, often provide security measures, usually support multiple simultaneous
users, reduce (or eliminate) the need for redundant data, and allow for changes to the
structure of the database often without requiring any changes to the programs that use
the database
- data redundancy: having the same data stored in multiple places; this is usually considered
bad practice since changes in one place must be immediately reflected in all the other
places for the data to be kept accurate
- data integrity: how accurate the data is
The relational database model
- relational databases are the predominant model in use today
- based on relational algebra
- fields are called attributes
- records are called tuples
- tables are called relations
- rules help keep ensure database integrity; process is called normalizing
- records in different tables within a relational database can be related via key
fields
- key field: a field which identifies a record
- composite key: a key made up from more than one field
- primary key: a key which is unique within a table
- foreign key: a key within a record that refers to a record in a different table
- join table: the table that results from joining two related tables together
- one-to-many relationship: when one record in a table may be related to multiple records
through one key field; one example is that one author may write multiple books
- many-to-many relationship: when the one-to-many relationship occurs in both directions;
one example would be that one author may write multiple books, and one book may be written
by multiple authors
- relational operation: creates a temporary table that is a subset of the original table(s)
- SQL: (structured query language) has become the de facto standard for accessing data within
relational databases
The object-oriented database model
- stores both data about an entity and procedures that manipulate that data
- encapsulation: refers to having both data and procedures stored together
- inheritance: refers to how objects created from simpler objects automatically
inherit all the data and procedures of the simpler object
Data warehousing
- data warehouse: a large database used to support management decision making
- data warehouses often store transaction details long term (while actual transactional
databases only store recent transaction details)
- data mining: looking for trends, patterns, or other information in the
data contained in databases
- steps in moving data from a regular database to a data warehouse
- extract from the current database the data that you want to put into the data warehouse
- transform the extracted data into the form you want to store it in
- check to make sure everything is accurate
- merge/load the data into the data warehouse
- perform additional checks to ensure accuracy
- data marts: subsets of a data warehouse tailored to meet particular needs within
an organization; removes some of the complexity of the larger data warehouse and
may help increase security by limiting access to sensitive information
- data redundancy is considered more acceptable in a data warehouse because that data
is under tighter controls, existing data doesn't usually get modified, and it is
sometimes very desirable for efficiency reasons