CIS 123 - Databases

Databases Overview

This week we explore databases and data warehouses. Databases are often critical to the daily operation of an organization, and are vital for monitoring and planning. The outline presented here is intended to help point out important topics and terms covered in the chapter. It is not intended to replace the lecture.

Objectives

  • Explain the differences between traditional file organization and the database approach
  • Enumerate the most important features and operations of the relational database model
  • Explain what common database terms mean
  • List the operations involved in transferring data from a transactional database into a data warehouse

Terminology

  • entity: any object about which information is collected
  • character: considered the smallest unit of data in a database (actually not accurate since numbers may be stored in a binary format)
  • field: one piece of information about an entity; usually consists of a group of characters or a number
  • record: a collection of fields related to the same entity
  • file: a collection of related records
  • table: a collection of related records
  • database: a collection of related files/tables
  • DBMS: database management system
  • DBA: database administrator
  • query: a request for specific information
  • SQL: structured query language
  • schema: a blueprint showing the structure of a database
  • data dictionary: a repository containing information about the data being stored and its organization; this may include where the data came from, tables that are related to it, field information (such as size and type of field), index information (to sort and speed access), processes that use the data, rules about how the data is added to and updated, and validation and consistency rules
  • metadata: another term for data dictionary
  • data modeling: analyzing an organization's data and identifying the relationships among the data
  • ERD: entity relationship diagram; a graphical representation of all entity relationships
  • index: overhead kept by the database to make accessing records faster and maintain sort orders; generally based on key fields
  • OLAP: online analytical processing; often used with data warehouses

Basic database concepts

  • traditional file approach to maintaining data: data is stored in regular files which requires special programs to be written to access it, and rewritten whenever any change is made to the structure of the file (such as adding a field); this approach often leads to data redundancy and errors; providing access to several simultaneous users is difficult and error prone
  • database approach to maintaining data: databases have descriptions of the data they contain, can do validity checking as data is added or changed, provide mechanisms for quickly accessing the data, often provide security measures, usually support multiple simultaneous users, reduce (or eliminate) the need for redundant data, and allow for changes to the structure of the database often without requiring any changes to the programs that use the database
  • data redundancy: having the same data stored in multiple places; this is usually considered bad practice since changes in one place must be immediately reflected in all the other places for the data to be kept accurate
  • data integrity: how accurate the data is

The relational database model

  • relational databases are the predominant model in use today
  • based on relational algebra
  • fields are called attributes
  • records are called tuples
  • tables are called relations
  • rules help keep ensure database integrity; process is called normalizing
  • records in different tables within a relational database can be related via key fields
  • key field: a field which identifies a record
  • composite key: a key made up from more than one field
  • primary key: a key which is unique within a table
  • foreign key: a key within a record that refers to a record in a different table
  • join table: the table that results from joining two related tables together
  • one-to-many relationship: when one record in a table may be related to multiple records through one key field; one example is that one author may write multiple books
  • many-to-many relationship: when the one-to-many relationship occurs in both directions; one example would be that one author may write multiple books, and one book may be written by multiple authors
  • relational operation: creates a temporary table that is a subset of the original table(s)
  • SQL: (structured query language) has become the de facto standard for accessing data within relational databases

The object-oriented database model

  • stores both data about an entity and procedures that manipulate that data
  • encapsulation: refers to having both data and procedures stored together
  • inheritance: refers to how objects created from simpler objects automatically inherit all the data and procedures of the simpler object

Data warehousing

  • data warehouse: a large database used to support management decision making
  • data warehouses often store transaction details long term (while actual transactional databases only store recent transaction details)
  • data mining: looking for trends, patterns, or other information in the data contained in databases
  • steps in moving data from a regular database to a data warehouse
    • extract from the current database the data that you want to put into the data warehouse
    • transform the extracted data into the form you want to store it in
    • check to make sure everything is accurate
    • merge/load the data into the data warehouse
    • perform additional checks to ensure accuracy
  • data marts: subsets of a data warehouse tailored to meet particular needs within an organization; removes some of the complexity of the larger data warehouse and may help increase security by limiting access to sensitive information
  • data redundancy is considered more acceptable in a data warehouse because that data is under tighter controls, existing data doesn't usually get modified, and it is sometimes very desirable for efficiency reasons