Course Introduction Introduction to Databases

Course Introduction
Introduction to Databases
Instructor: Joe Bockhorst
University of Wisconsin - Milwaukee
First Reading Assignment
• Chapters 1 and 2 (today and part of Thursday)
• Chapter 13 and handout
“There's a prayer each night that I always pray:
Let the data guide me through every day”
Warren Zevon
Data is Ubiquitous
• Three classes of technological advances are changing our relationship with data:
• More storage space
– allows us to keep more data
• Faster processor (and memory) speeds
– allow us to access and process more data
• Different “sensors”
– allow us to access new kinds of data
http://en.wikipedia.org/wiki/Hard_disk
Microarrays – An Example of a New Sensing Technology
• The color of each spot represents the activity level of a gene under some experimental condition
• Tens of thousands of spots on a single chip
[Figure: A microarray]
Other Data Examples
• Airline flight management system
• Financial data
• Commercial store (e.g., WalMart) data
• Department of Motor Vehicles
• Surveillance video
• University student records
• Baseball results
• Web sites
• Medical records
• ...
Effective Data Management is Essential
• Organizations need their data to be an asset
• Given:
– the amount of data available to store, and
– the costs of managing data (hardware, software, labor)
• Ineffective policies can make an organization’s data a liability
Database Management System (DBMS)
• DBMS is:
– A collection of software programs
– General purpose
• DBMS enables users to:
– Define DB
– Construct DB
– Change (or update) DB
– Ask questions about the data in DB
– Share DB
• DBMS maintains the integrity of DB
Some RDBM Systems
Commercial Systems
• Oracle ($$$$)
• DB2 (IBM) ($$$)
• SQL Server (Microsoft) ($$)
Open Source Systems
• PostgreSQL
• MySQL
Source: International Data Corporation
Main Goals of this Course
• To understand how to use a DBMS
– How to create DB, data models, SQL,...
• To understand how a DBMS works
– Physical properties of disks and files, software to
manage reading and writing to disk, implementation
of algorithms to answer user queries,...
Databases are self-describing: the catalog describes the structure of the data stored in the DB
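As a rough illustration of what "self-describing" means, the sketch below uses Python's standard-library sqlite3 module. SQLite is an assumption made for convenience (it is not one of the systems named in these slides), and the MOVIE table is a hypothetical example. The snippet creates the table and then reads its description back from SQLite's own catalog table, sqlite_master.

```python
import sqlite3

# In-memory SQLite database; SQLite is only a stand-in for a full RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MOVIE (MID INTEGER PRIMARY KEY, Title TEXT, Rating TEXT)")

# sqlite_master is SQLite's catalog: one row per table, index, view, or trigger.
catalog_rows = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE type = 'table'"
).fetchall()

# PRAGMA table_info lists the columns the catalog records for one table;
# the second field of each row is the column name.
column_names = [row[1] for row in conn.execute("PRAGMA table_info(MOVIE)")]

print(catalog_rows)   # table names known to the catalog
print(column_names)   # column names of MOVIE
```

Other systems expose their catalogs differently, so the table and pragma names here should be read as SQLite-specific.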
Example: Internet Movie Database (IMDB)
Building a DB: Construct a Conceptual Model
• A conceptual model identifies entities and relationships
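[Figure: conceptual model diagram. Entities: movie (attributes: title, release date) and person (attributes: name, birthdate). Relationships: acts in (M:N, with attributes role and role type) and director of (1:N). Legend: entity, attribute, relationship.]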
Building a DB: Define DB Schema
• A schema describes the DB using a data model supported by the DBMS (e.g., the relational model)
• RDBMS – a DBMS that supports the relational model
MOVIE (MID, Title, Rating, Director)
PERSON (PID, Name, Bday)
ACTS_IN (MID, PID, Role, Rtype)
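One way such a schema might be declared is sketched below with SQL run through Python's sqlite3 module. The slides give only table and column names, so the column types, the primary keys, and the foreign-key references are all assumptions; in particular, the composite key on ACTS_IN assumes one role per person per movie.

```python
import sqlite3

# Types and key constraints are assumptions; the slides list only names.
SCHEMA = """
CREATE TABLE PERSON (
    PID  INTEGER PRIMARY KEY,
    Name TEXT,
    Bday TEXT
);
CREATE TABLE MOVIE (
    MID      INTEGER PRIMARY KEY,
    Title    TEXT,
    Rating   TEXT,
    Director INTEGER REFERENCES PERSON(PID)  -- "director of": one person, many movies
);
CREATE TABLE ACTS_IN (
    MID   INTEGER REFERENCES MOVIE(MID),
    PID   INTEGER REFERENCES PERSON(PID),
    Role  TEXT,
    Rtype TEXT,
    PRIMARY KEY (MID, PID)                   -- "acts in": many-to-many link table
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Confirm that the three tables now exist.
table_names = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(table_names)
```

The many-to-many relationship becomes its own table, while the one-to-many relationship is folded into MOVIE as the Director column.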
A Schema Diagram for “University” DB
(from the textbook)
tables
columns
Building a DB: Describe Physical Data Model
• PDM indicates how data is organized on disk
• Includes description of access paths or indexes
– Example: store “Movie” table with records ordered by MID and
construct an index on the “Title” attribute
[Figure: File of records of the MOVIE table, ordered by MID (1 The Big Lebowski, 2 Star Wars, ..., 270 The Big Chill), alongside an Index on Title column whose entries are ordered by title (The Big Chill, The Big Lebowski, ...).]
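The idea of a record file plus a separate access path can be sketched in a few lines of Python. This is a toy model under stated assumptions: a list stands in for the file on disk, a sorted list searched with bisect stands in for the index (real systems typically use tree structures such as B+-trees), and find_by_title is a hypothetical helper name.

```python
import bisect

# "File" of MOVIE records kept in MID order (only MID and Title are modeled).
movie_file = [
    (1, "The Big Lebowski"),
    (2, "Star Wars"),
    (270, "The Big Chill"),
]

# Index on Title: (title, position-in-file) pairs sorted by title.
title_index = sorted((title, pos) for pos, (_mid, title) in enumerate(movie_file))
index_keys = [title for title, _pos in title_index]

def find_by_title(title):
    """Binary-search the index, then fetch the record it points to."""
    i = bisect.bisect_left(index_keys, title)
    if i < len(index_keys) and index_keys[i] == title:
        return movie_file[title_index[i][1]]
    return None  # no movie with that title

print(find_by_title("The Big Chill"))
```

The file order serves lookups by MID, while the index gives a second, title-ordered path to the same records without duplicating them.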
Building a DB: Populate DB
• Set initial records of the DB

MOVIE
MID  Title             Rating  Director
1    The Big Lebowski  R       72
2    Star Wars         PG      29
...

ACTS_IN
MID  PID  Role      Rtype
1    1    The Dude  STAR
2    2    Han Solo  CO_STAR
...

PERSON
PID  Name           Bday
1    Jeff Bridges   12/4/49
2    Harrison Ford  7/13/42
...
Querying The Database
• Most RDBMSs allow users to query the database using SQL (Structured Query Language)
• Example: get cast of “The Big Lebowski”
SELECT Name, Role, Rtype
FROM PERSON, ACTS_IN
WHERE MID = 1 AND PERSON.PID = ACTS_IN.PID;
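To see the cast query run end to end, here is a small sketch using Python's sqlite3 module. SQLite, the column types, and the use of a "?" parameter for the MID are assumptions; the rows follow the example data for the PERSON and ACTS_IN tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PERSON  (PID INTEGER, Name TEXT, Bday TEXT);
CREATE TABLE ACTS_IN (MID INTEGER, PID INTEGER, Role TEXT, Rtype TEXT);
""")

# Populate the two tables with sample rows.
conn.executemany("INSERT INTO PERSON VALUES (?, ?, ?)", [
    (1, "Jeff Bridges", "12/4/49"),
    (2, "Harrison Ford", "7/13/42"),
])
conn.executemany("INSERT INTO ACTS_IN VALUES (?, ?, ?, ?)", [
    (1, 1, "The Dude", "STAR"),
    (2, 2, "Han Solo", "CO_STAR"),
])

# Cast of movie 1; the "?" placeholder passes the MID as a parameter.
cast = conn.execute(
    """SELECT Name, Role, Rtype
       FROM PERSON, ACTS_IN
       WHERE MID = ? AND PERSON.PID = ACTS_IN.PID""",
    (1,),
).fetchall()
print(cast)
```

The condition PERSON.PID = ACTS_IN.PID is what ties each acting credit to the right person; without it the query would pair every person with every credit.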
Building the Application Program
Implementing Queries
• “Relational Algebra” is a mathematical way to
describe operations on relational data
• SQL queries correspond to a sequence of relational algebra operations
– The previous query requires a join operation between PERSON and ACTS_IN
• Query optimization involves finding a good order in which to carry out the operations
• Operator implementation
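The points above can be made concrete with a toy version of three relational algebra operators in Python. This is a sketch under stated assumptions: relations are lists of dicts, the function names select, project, and join are illustrative, the join is a naive nested loop, and projection here does not remove duplicates as the formal operator would.

```python
def select(rows, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]

def project(rows, attrs):
    """Projection: keep only the named attributes (duplicates are not removed)."""
    return [{a: row[a] for a in attrs} for row in rows]

def join(left, right, attr):
    """Equi-join on a shared attribute, done as a simple nested loop."""
    return [{**lrow, **rrow} for lrow in left for rrow in right
            if lrow[attr] == rrow[attr]]

PERSON = [
    {"PID": 1, "Name": "Jeff Bridges"},
    {"PID": 2, "Name": "Harrison Ford"},
]
ACTS_IN = [
    {"MID": 1, "PID": 1, "Role": "The Dude", "Rtype": "STAR"},
    {"MID": 2, "PID": 2, "Role": "Han Solo", "Rtype": "CO_STAR"},
]
WANTED = ["Name", "Role", "Rtype"]

# Plan A: join everything first, then select MID = 1.
plan_a = project(select(join(PERSON, ACTS_IN, "PID"), lambda r: r["MID"] == 1), WANTED)

# Plan B: select first, so the join sees fewer ACTS_IN rows.
plan_b = project(join(PERSON, select(ACTS_IN, lambda r: r["MID"] == 1), "PID"), WANTED)

print(plan_a == plan_b, plan_a)
```

Both plans give the same answer, but Plan B joins against a smaller input; choosing between equivalent orderings like these is the kind of decision a query optimizer makes.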
Managing Physical Data Storage
• RDBMS maintains database (and meta-data) on
non-volatile storage (hard disks)
• Physical design impacts RDBMS performance
• Example: The time to answer a query such as “What is the MID of ‘The Big Lebowski’?” can be greatly reduced if an index on the Title column is maintained for the MOVIE table.
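One way to observe this effect is to ask a DBMS how it plans to answer the query before and after the index exists. The sketch below does so with SQLite's EXPLAIN QUERY PLAN statement through Python's sqlite3 module. SQLite, the index name idx_movie_title, and the helper plan are assumptions, and the exact wording of the plan text varies between SQLite versions, so only the presence of the index name is relied on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MOVIE (MID INTEGER PRIMARY KEY, Title TEXT)")
conn.executemany("INSERT INTO MOVIE VALUES (?, ?)", [
    (1, "The Big Lebowski"),
    (2, "Star Wars"),
    (270, "The Big Chill"),
])

QUERY = "SELECT MID FROM MOVIE WHERE Title = 'The Big Lebowski'"

def plan(sql):
    """Return SQLite's description of how it would execute the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[-1] for row in rows)  # last column is the readable detail

plan_before = plan(QUERY)  # no index yet, so the whole table must be examined
conn.execute("CREATE INDEX idx_movie_title ON MOVIE (Title)")
plan_after = plan(QUERY)   # the plan should now mention idx_movie_title

print(plan_before)
print(plan_after)
print(conn.execute(QUERY).fetchall())
```

With three rows the time difference is negligible; the point is that the query text is unchanged while the access path the system chooses is different.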
Maintaining Integrity of the Database
• Concurrent users
– Multiple users may attempt to update simultaneously
• Security
– Preventing unauthorized access
• System failures
– If lightning strikes during an update, the DB must be recoverable
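The failure case can be sketched with a transaction that is rolled back when something goes wrong partway through. The example uses Python's sqlite3 module, where a connection used as a context manager commits on success and rolls back if an exception escapes. It only simulates an error inside the program (the update_rating helper and its fail_midway flag are hypothetical); it does not model a real crash or power loss, which a DBMS typically handles with a recovery log.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MOVIE (MID INTEGER PRIMARY KEY, Title TEXT, Rating TEXT)")
conn.execute("INSERT INTO MOVIE VALUES (1, 'The Big Lebowski', 'R')")
conn.commit()

def update_rating(new_rating, fail_midway=False):
    """Update inside a transaction; an exception undoes the change."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE MOVIE SET Rating = ? WHERE MID = 1", (new_rating,))
            if fail_midway:
                raise RuntimeError("simulated failure during update")
    except RuntimeError:
        pass  # by this point the rollback has already happened

def current_rating():
    return conn.execute("SELECT Rating FROM MOVIE WHERE MID = 1").fetchone()[0]

update_rating("PG", fail_midway=True)
rating_after_failure = current_rating()  # unchanged: the update was rolled back
update_rating("PG")
rating_after_success = current_rating()  # changed: the update was committed
print(rating_after_failure, rating_after_success)
```

The update either happens completely or not at all, which is the property that keeps the database consistent when a failure interrupts it.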
Summary of Topics
• How to use a DBMS
– Conceptual modeling
– Logical modeling
– Querying the DB
– Building applications
• How a DBMS works
– Implementing queries
– Managing hardware
– Maintaining integrity
Control Abstraction
• Each layer need not know (or care) how other layers are implemented
[Figure: layers from top to bottom: User; Application Program; DBMS (Query Optimization, Relational Operators, Files and Access Methods, Buffer Management, Disk Space Management); DB]
Data Abstraction
Each layer need not know how other layers organize data
Why Use DBMS?
• Program-data independence
• Controlling redundancy
• Providing backup and recovery
• Efficient query processing
• Others: see Section 1.6
Why not to use a DBMS?
• Consider custom software if DBMS overhead
(cost, complexity, performance) is unnecessary
– Example: single user of fixed dataset
Schemas and Instances
• A schema describes a database
– RDBMSs typically store schemas in the catalog
• The actual data in the DB at a particular time is
the database state
– The current set of all instances in the DB
People who work with DBMSs
• Database Administrator (DBA)
– Maintains databases, DBMS and related software
– [avg salary* $76k]
• Application Programmers
– Software engineers (developers) who build software solutions that access a DBMS on behalf of end users
• End Users
– Example: bank teller uses “canned transactions”
• DBMS designers and implementers
– Example: Oracle developers
*source: payscale.com, 2007