Lecture Notes #8

Transcription

Lecture Notes #8
Databases and Information Management
Reading:
Laudon & Laudon
chapter 5
Additional Reading:
Brien & Marakas
chapter 3-4
COMP 5131
1
Outline
‰ Database Approach to Data Management
‰ Database Management Systems
‰ Improving Business Performance and
Decision Making
‰
‰
‰
Data Warehouse
Data Marts
Business Intelligence
‰ Managing Data Resources
COMP 5131
2
Database Approach
¾ Database
„
„
Collection of related files containing records on people, places, or
things
Prior to digital databases, business used file cabinets with paper
files
¾ Entity
„
„
Generalized category representing person, place, thing on which
we store and maintain information
Example → SUPPLIER, PART
¾ Attributes
„
Specific characteristics of each entity,example
Š SUPPLIER name, address
Š PART description, unit price, supplier
COMP 5131
3
Relational Database
¾Organize Data into 2D Tables
„
„
Tables → Relations with columns and rows
One table for each entity
Š Example → CUSTOMER, SUPPLIER, PART, SALES
„
Fields (columns) store data representing an attribute
Rows store data for separate records
Key field: Uniquely identifies each record
„
Primary key:
„
„
Š One field in each table
Š Cannot be duplicated
Š Provides unique identifier for all information in any row
COMP 5131
4
Relational Database
¾ Relational Database Table
A relational database organizes data in the form of two-dimensional tables. Illustrated here is a table for the
entity SUPPLIER showing how it represents the entity and its attributes. Supplier_Number is the key field.
COMP 5131
5
Relational Database
¾ Part Table
Data for the entity PART have their own separate table. Part_Number is the primary key and Supplier_Number is the foreign key,
enabling users to find related information from the SUPPLIER table about the supplier for each part.
COMP 5131
6
Relational Database
¾Establishing Relationships
„
Entity-relationship diagram
Š used to clarify table relationships in a relational database
„
Relational database tables may have:
Š One-to-one relationship
Š One-to-many relationship
Š Many-to-many relationship
„
Requires creating a table (join table, Intersection relation) that
links the two tables to join information
¾A Simple Entity Relationship Diagram
„
Relationship between supplier and Part
COMP 5131
7
Relational Database
¾ Sample Order Report
The shaded areas show which data came from the SUPPLIER, LINE_ITEM, and ORDER tables. The database does not
maintain data on Extended Price or Order Total because they can be derived from other data in the tables.
COMP 5131
8
Relational Database
¾ Final Database Design with Sample Records
The final design of the database for suppliers, parts, and orders has four tables. The LINE_ITEM table is a join table that eliminates the
many-to-many relationship between ORDER and PART.
COMP 5131
9
Relational Database
¾ Entity-Relationship Diagram for the Database with four Tables
This diagram shows the relationship between the entities SUPPLIER, ART, LINE_ITEM, and ORDER.
COMP 5131
10
Relational Database
¾Normalization
„
Process of streamlining complex groups of data to
„
„
„
Minimize redundant data elements
Minimize awkward many-to-many relationships
Increase stability and flexibility
¾Referential Entity Rules
„
Used by relational databases to ensure that
relationships between coupled tables remain consistent
Š Example → When one table has a foreign key that points to
another table, you may not add a record to the table with foreign
key unless there is a corresponding record in the linked table
COMP 5131
11
Database Management Systems
¾DBMS
„
Specific type of software for creating, storing,
organizing, and accessing data from a database
„
Separates the logical and physical views of the data
„
Logical view → How end users view data
„
„
Physical view → How data are actually structured and
organized
Examples of DBMS → Microsoft Access, DB2, Oracle
Database, Microsoft SQL Server, MySQL (Open Source)
COMP 5131
12
Database Management Systems
¾HRD Database with Multiple Views
„
Combine tables to deliver data → Users
Š Requirement → Two tables share a common data element
COMP 5131
13
Database Management Systems
¾Operations of a Relational DBMS
„
Select
Š Creates a subset of all records meeting stated criteria
„
Join
Š Combines relational tables to present the ser with more
information than is available from individual tables
„
Project
Š Creates a subset consisting of columns in a table
Š Permits user to create new tables containing only desired
information
COMP 5131
14
Database Management Systems
¾ Three Basic Operations of a Relational DBMS
The select, project, and join operations enable data from two different tables to be combined and only selected attributes to be displayed.
COMP 5131
15
Database Management Systems
¾Capabilities of DBMS
„
Data Definition Capabilities
Š Specify Structure of Contents of Database
„
Data Directory
Š Automated or manual file storing definitions of data elements
and their characteristics
„
Query and Data Reporting
Š Data manipulation language
„
„
„
Structured query language (SQL)
Microsoft Access query-building tools
Report generation, example → Crystal Reports
COMP 5131
16
Database Management Systems
¾ Access Data Directory Features
COMP 5131
17
Database Management Systems
¾ Example of SQL Query
COMP 5131
18
Database Management Systems
¾ An Access Query
COMP 5131
19
Database Management Systems
¾ An Access Query
COMP 5131
20
Database Management Systems
¾ Object-Oriented Database
„
DBMS designed for structured data rows/columns
Š Not suitable for graphics-based or multimedia applications
Š Object-oriented Database
„
„
„
„
OODBMS →Stores data and procedures that act on those data as
objects to be retrieved and shared
Usage → Manage multimedia components, Java applets for Web
Relatively slow compared to relational DBMS
Hybrid Object-relational DBMS → Provide capabilities of both types
¾ Databases
„
„
Improves Performance, Better Decisions
Tools
Š
Š
Š
Š
COMP 5131
Data warehousing
Multidimensional data analysis
Data mining
Utilizing Web interfaces to databases
21
Using Database to Improve Performance
¾Data Warehouse
„
„
„
Database that stores current and historical data that
may be of interest to decision makers
Consolidates and standardizes data from many
systems, operational and transactional databases
Data can be accessed but not altered
¾Data Marts
„
„
„
Subset of data warehouses that is highly focused and
isolated for a specific population of users
Can be constructed more quickly at lower cost
Example – Company might develop Marketing and
Sales Data Mart to deal with customer information
COMP 5131
22
Using Database to Improve Performance
¾ Components of Data Warehouse
The data are combined with data from external sources and reorganized into a central database designed for management
reporting and analysis. The information directory provides users with information about the data available in the warehouse.
COMP 5131
23
Using Database to Improve Performance
¾ Business Intelligence
„
Tools for consolidating, analyzing, and providing access to large
amounts of data to improve decision making
Š Software for database reporting and querying
Š Tools for multidimensional data analysis (online analytical processing)
Š Data Mining
COMP 5131
24
Using Database to Improve Performance
¾Data Mining
„
Finds hidden patterns, relationships in large databases
and infers rules from them to predict future behavior
„
Types of Information
Š Associations → Occurrences linked to single event
„
Example → Chips with Coke for 65% but 85% when promotion for Coke
Š Sequences → Events linked over time
„
Example → House purchasing followed by new refrigerator 65% within 2
weeks, oven 45% within one month
Š Classifications → Patterns describing a group an item belongs to
„
Example → Characteristics of customers who are likely to leave, campaign
Š Clusters → Discovering as yet unclassified groupings
Š Forecasting → Uses series of values to forecast future values
COMP 5131
25
Using Database to Improve Performance
¾Data Mining
„
Applications for all functional areas of business
Š Government, Scientific Applications
„
Usage
Š Patterns in Customer Data → Identifying profitable customers
or for one-to-one marketing campaigns
Š Predictive Analysis → Using data mining techniques, historical
data, and assumptions about future conditions to predict
outcomes of events, such as the probability a customer will
respond to an offer or purchase a specific product
¾Privacy Concerns
„
„
Usage
Create detailed data image about each individual
COMP 5131
26
Case Study – DNA Databases
¾ Crime Fighting Weapon or Threat to Privacy?
¾ Questions
„
What are the benefits of DNA databases?
„
What problems do DNA databases pose?
„
Who should be included in a national DNA database?
Should it be limited to convicted felons? Explain your
answer.
„
Who should be able to use DNA databases?
COMP 5131
27
Using Database to Improve Performance
¾ Databases and the web
„
Information from Internal Databases → Customers
Š View Product Catalog, Place Order
„
„
Request from HTML Commands → SQL for DBMS Processing (database
server)
Software make this possible
Š Web server
Š Application servers or CGI
Š Database server
„
Advantage of using web to access internal databases
Š Much less training to employees
Š Few or no changes in internal databases
Š Savings over redesigning and rebuilding legacy systems
COMP 5131
28
Managing Data Resources
¾ Policies and Procedures for Data Management
„
Information Policy
Š Organization’s rules → Sharing, Disseminating, Acquiring, Classifying,
Inventorying information
Š Example → Right to change/view sensitive employee data
„
Data Administration
Š Database design and management group responsible for defining and
organizing the structure and content of the database, and maintaining
the database
Š Specific policies and procedures for data management
Š Responsibilities → Developing information policy, defining and
organizing structure and content of database, planning for data, data
directory development, Overseeing logical database design
COMP 5131
29
Managing Data Resources
¾ Ensuring Data Quality
„
Poor Data Quality
Š Major problem for successful customer management relationship
Š About 20% of US mail and packages are returned because of incorrect
names or addresses
„
Why Data Quality Problems?
Š Redundant and inconsistent data produced by multiple systems
Š Data input errors → Major data quality problems
„
Data Quality Audit
Š Structured survey of the accuracy and completeness of data
„
Data Cleansing
Š Detects and corrects incorrect, incomplete, improperly formatted, and
redundant data
Š Specialized data cleansing software → Automatically survey data files,
correct errors in the data, integrate data into company wide format
COMP 5131
30