Document 6520587

Why use database management systems (DMSs)
•  To provide a uniform, logical model for representing data
(relational data model)
•  To provide a powerful, uniform language for querying and
updating data (SQL)
•  To allow powerful optimisations for efficient query evaluation
(indexing, query transformation)
•  To ensure data integrity within single applications
(constraint checking, recovery)
•  To ensure data integrity across multiple concurrent
applications (concurrency control)
2503ICT WP
Examples of (relational) DMSs (RDMSs)
Open source or free systems
•  SQLite (www.sqlite.org)
•  MySQL (www.mysql.com)
•  PostgreSQL (www.postgres.org)
Commercial systems
•  JFile (Palm OS)
•  Microsoft Access (Windows)
•  FileMaker Pro (Windows, OS X)
•  Oracle
•  IBM DB2
•  Microsoft SQL Server
Relational database concepts
•  Relations defining entities and relationships
–  A relation or table is a set of rows or tuples
–  An entity is an object
–  A relationship is a relationship between entities (sic)
•  Attributes
–  An attribute is an atomic property of an entity or relationship
•  Domains
–  A domain is the set of possible values of an attribute
–  An attribute value may be NULL (unknown, inapplicable)
•  Keys
–  A key is an attribute or set of attributes that uniquely identifies a tuple.
–  No two tuples in a relation can have the same value for their key.
–  A table may have more than one key.
–  One key may be designated the primary key.
Relational database design
Suppose we wish to store information about student enrolments
in courses as follows:
Enrolments (CCode, CName, SNum, SName, SAddr, Year)
Here, we store all information about courses and students and
enrolments in a single table. For each year (the course is
offered) there is one tuple for each course-student pair. This
design has the following problems...
Relational database design (cont.)
Enrolments (CCode, CName, SNum, SName, SAddr, Year)
1.  We have to store the name and address of a student for
every course in which the student is enrolled, a waste of
space.
2. If we update the name or address of a student, we have to
update the information for every course in which the student
is enrolled, a waste of time.
3. If a student temporarily withdraws from all courses, the
system no longer knows the student's name and address, a
loss of information.
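Problems 2 and 3 can be reproduced in a few lines. The following is a minimal sketch using Python's built-in sqlite3 module with an in-memory database; the course codes, names and addresses are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE Enrolments (
           CCode TEXT, CName TEXT, SNum INTEGER,
           SName TEXT, SAddr TEXT, Year INTEGER)"""
)
# Hypothetical data: one student enrolled in two courses,
# so the name and address are stored twice (problem 1).
conn.executemany(
    "INSERT INTO Enrolments VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("C1", "Databases", 1001, "Ann Lee", "Nathan", 2010),
        ("C2", "Networks", 1001, "Ann Lee", "Nathan", 2010),
    ],
)

# Problem 2: updating the address in only one row leaves the
# same student with two conflicting addresses.
conn.execute(
    "UPDATE Enrolments SET SAddr = 'Logan' WHERE SNum = 1001 AND CCode = 'C1'"
)
addresses = sorted(
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT SAddr FROM Enrolments WHERE SNum = 1001"
    )
)

# Problem 3: withdrawing from all courses deletes every trace of the student.
conn.execute("DELETE FROM Enrolments WHERE SNum = 1001")
remaining = conn.execute(
    "SELECT COUNT(*) FROM Enrolments WHERE SNum = 1001"
).fetchone()[0]
```

After the partial update, addresses holds both 'Logan' and 'Nathan'; after the delete, no row mentions student 1001 at all.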
There must be a better way...
Relational database design (cont.)
Integrity constraints
•  Domain constraints
–  Each attribute can only take values from its specified domain.
•  Entity integrity constraints
–  No attribute in a key of a relation can be null.
•  Key constraints
–  No two tuples in a relation can have the same value for their keys.
•  Referential integrity constraints
–  If attribute A in relation R has the same domain as attribute B in
relation S, and B is a key (or component of a key) for S, then
whenever t1 is a tuple in R with t1[A] = x, there must exist a tuple
t2 in S with t2[B] = x.
–  I.e., you can't refer to something that doesn't exist.
–  The attribute R[A] that refers to the key S[B] is called a foreign key.
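A minimal sketch of key and referential integrity constraints being enforced, using Python's built-in sqlite3 module. The relation and attribute names R, S, A and B follow the definition above; the helper function violates is hypothetical. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite

conn.execute("CREATE TABLE S (B INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE R (A INTEGER NOT NULL REFERENCES S(B))")
conn.execute("INSERT INTO S VALUES (1)")
conn.execute("INSERT INTO R VALUES (1)")  # fine: S contains a tuple with B = 1


def violates(sql):
    """Return True if the statement is rejected with an integrity error."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# Key constraint: a second tuple in S with the same key value is rejected.
key_violation = violates("INSERT INTO S VALUES (1)")
# Referential integrity: R[A] = 2 refers to a tuple that does not exist in S.
fk_violation = violates("INSERT INTO R VALUES (2)")
```

Both statements are rejected, so key_violation and fk_violation are True.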
Relational database design (cont.)
•  Each relation should represent either
–  a single entity,
–  or a single relationship between two or more entities.
•  (Functional) dependencies should be identified
–  We say {A1,...,An} → B in R if, for all tuples t1 and t2 in R, if
t1[A1,...,An] = t2[A1,...,An], then t1[B] = t2[B].
•  Each relation should have a specified key.
–  If K = {A1,...,An} is a key for R, then K → B for all attributes B in R.
•  Every dependency {A1,...,An} → {B1,...,Bm} in R should satisfy either:
–  {B1,...,Bm} ⊆ {A1,...,An}, or
–  {A1,...,An} is a superkey for R, i.e., some subset of {A1,...,An} is a key
for R.
•  A relation that satisfies this condition is said to be in Boyce-Codd
Normal Form (BCNF). (Many other normal forms are possible.)
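The definition of a functional dependency can be checked directly against a set of tuples. The sketch below is illustrative only: the function fd_holds and the sample data are made up, and a check on one instance can refute a dependency but cannot prove that it holds for every possible instance.

```python
def fd_holds(rows, lhs, rhs):
    """Check whether lhs -> rhs holds in a relation given as a list of dicts.

    Two tuples that agree on all attributes in lhs must also agree on
    all attributes in rhs.
    """
    seen = {}
    for t in rows:
        left = tuple(t[a] for a in lhs)
        right = tuple(t[b] for b in rhs)
        # Remember the first right-hand value seen for each left-hand value.
        if seen.setdefault(left, right) != right:
            return False
    return True


# Hypothetical instance of part of the Enrolments relation.
enrolments = [
    {"CCode": "C1", "SNum": 1, "SName": "Ann", "Year": 2010},
    {"CCode": "C2", "SNum": 1, "SName": "Ann", "Year": 2010},
    {"CCode": "C1", "SNum": 2, "SName": "Bob", "Year": 2010},
]

snum_determines_name = fd_holds(enrolments, ["SNum"], ["SName"])
snum_determines_course = fd_holds(enrolments, ["SNum"], ["CCode"])
```

Here {SNum} → SName holds, but {SNum} does not determine CCode, so {SNum} is not a superkey. A dependency whose left-hand side is not a superkey is exactly what BCNF forbids.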
Example
Entities
•  Stock (ItemId, ItemName, Quantity, Price, Description)
–  ItemId (key) is int, ItemName is string, Quantity is int, Price is float,
Description is string.
•  Customer(CustId, CustName, Address, Email)
–  CustId (key) is int, CustName is string, Address is string
–  CustName should not be a key, as different customers could have the
same name.
Relationships
•  Order(OrderNo, ItemId, CustId, Date, Quantity)
–  OrderNo (key) is int, Date is date, Quantity is int
–  {ItemId,CustId} is not a key, as there could be two different orders for
the same item by the same customer on different dates.
–  ItemId and CustId are foreign keys that reference Stock[ItemId] and
Customer[CustId].
Example (cont.)
Note that this design does not have the redundancies and
update anomalies of the course-student database above.
Exercise: Redesign the course-student database in a similar
way to avoid the redundancies and update anomalies identified
previously.
SQL
History of SQL
•  Relational model first clearly proposed by Ted Codd in 1970.
•  First SQL-like languages designed and implemented at IBM
in mid-1970s.
•  First SQL standard emerged in mid-1980s.
•  Most recent SQL standard, SQL3, was defined in 1999.
•  This latest standard has many independent modules, which
are being gradually defined. It includes many object-oriented
features!
•  It is still not really a standard!
•  Most implementations provide a command-line interface.
•  In Web applications, an application programming interface
(native, PDO, ODBC, JDBC) is used.
Common SQLite types
•  null, an unknown or unrequired value
•  integer, an integer
•  real, a floating point number
•  double, a double precision floating point number (stored in the same way as real)
•  text, a character string (limited only by SQLite's maximum value size, far above 65,535 characters)
•  blob, a large binary value (e.g., an image)
Other types may be supported in particular implementations.
A column declared INTEGER PRIMARY KEY is given the next available
integer automatically when no value (or NULL) is inserted.
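A minimal sketch of these types and of automatic key generation, using Python's built-in sqlite3 module. The table and item names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# typeof() reports the storage class SQLite uses for each value.
types = conn.execute(
    "SELECT typeof(NULL), typeof(1), typeof(1.5), typeof('abc'), typeof(x'00')"
).fetchone()

# An INTEGER PRIMARY KEY column is filled in automatically.
conn.execute("CREATE TABLE Stock (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL)")
conn.execute("INSERT INTO Stock (Id, Name) VALUES (NULL, 'Widget')")  # NULL given
conn.execute("INSERT INTO Stock (Name) VALUES ('Gadget')")  # Id omitted
ids = [row[0] for row in conn.execute("SELECT Id FROM Stock ORDER BY Id")]
```

The storage classes come back as null, integer, real, text and blob, and the two rows receive the ids 1 and 2.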
Common MySQL types
•  int, an integer
•  int(4), an integer with display width 4 (the width does not restrict the range of values)
•  float, a floating point number
•  double, a double precision floating point number
•  decimal(8,2), a fixed-point number with 8 digits in total, 2 of them
after the decimal point (stored as a string in early MySQL versions)
•  char, a single character
•  char(12), a 12-character string
•  varchar(12), a string of at most 12 characters
•  text, a string of at most 65,535 bytes
•  date, datetime, time, timestamp
•  blob, a large binary value (e.g., an image)
Defining attributes in MySQL (and SQLite)
•  Every attribute must be given a type.
•  (Integer) attribute values may be generated automatically.
•  Attributes may be declared non-null.
•  Attributes may have a default value.
•  One or more attributes may be defined to form a key.
•  One or more attributes may be declared a foreign key that
refers to a key in another (or the same) relation.
•  Other integrity constraints may also be defined (and perhaps
ignored).
•  In MySQL, foreign key constraints are enforced only by the InnoDB
storage engine; MyISAM tables accept the declaration but ignore it.
In SQLite they are enforced only after PRAGMA foreign_keys = ON.
Defining tables in MySQL
Stock
CREATE TABLE Stock (
    Id INT(6) NOT NULL AUTO_INCREMENT,
    Name VARCHAR(20) DEFAULT '' NOT NULL,
    Quantity INT(4) DEFAULT 0 NOT NULL,
    Price DECIMAL(8,2) NOT NULL,
    Description TEXT,
    PRIMARY KEY (Id),
    KEY (Name)    -- KEY is a synonym for INDEX: a non-unique index
);
Defining tables in MySQL (cont.)
Customers
CREATE TABLE Customers (
    Id INT(8) NOT NULL AUTO_INCREMENT,
    Name VARCHAR(20) DEFAULT '' NOT NULL,
    Address VARCHAR(80),
    Email VARCHAR(30),
    PRIMARY KEY (Id),
    INDEX (Name)
);
Defining tables in MySQL (cont.)
Orders
CREATE TABLE Orders (
    Id INT(8) NOT NULL AUTO_INCREMENT,
    ItemId INT(6) NOT NULL,
    CustId INT(8) NOT NULL,
    OrderDate DATE,
    Quantity INT(6) DEFAULT 0,
    PRIMARY KEY (Id),
    FOREIGN KEY (ItemId)
        REFERENCES Stock(Id),
    FOREIGN KEY (CustId)
        REFERENCES Customers(Id)    -- the table is named Customers
);
Displaying database structure (in SQLite)
•  .databases
•  .tables
•  .schema tablename
•  .indices tablename
•  .show
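The dot-commands above belong to the sqlite3 command-line shell. From a program, similar information can be read from the sqlite_master table and from PRAGMA table_info. A minimal sketch using Python's built-in sqlite3 module; the index name CustomersName is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL)")
conn.execute("CREATE INDEX CustomersName ON Customers(Name)")

# Roughly what .tables shows: the names of all tables.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

# Roughly what .indices Customers shows: the indexes on one table.
indexes = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = 'Customers'"
    )
]

# Column names of a table; the second field of each row is the name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(Customers)")]
```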
Displaying database structure (in MySQL)
•  SHOW DATABASES
•  SHOW TABLES (in a selected database)
•  DESCRIBE tablename
•  SHOW COLUMNS FROM tablename
•  SHOW INDEX FROM tablename
•  SHOW STATUS
Inserting data in SQL
Single-tuple insertion:
INSERT INTO Stock(Id, Name, Quantity, Price, Description)
VALUES (NULL, "Marcel's Morsels", 1500, 1.25,
    "Delectable, delicious delicacies");
•  Attribute Id is given the next available integer value.
•  Hence all rows have different values for the key Id.
•  Name was declared with KEY (Name), which in MySQL creates a
non-unique index, so a second row with the same name is not
rejected. That check would require a UNIQUE index on Name.
Inserting data in SQL (cont.)
Multiple-tuple insertion:
INSERT INTO Stock(Id, Name, Quantity, Price, Description)
VALUES
    (NULL, "Marcel's Morsels", 1500, 1.25,
        "Delectable delicious delicacies"),
    (NULL, "Fred's Fries", 1000, 0.75,
        "Fred's Fabulous French Fries");
•  Attribute names can be provided for greater accuracy.
•  Multiple rows can be inserted in the same statement.
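When rows are inserted from a program, values containing apostrophes (such as "Marcel's Morsels") are safest passed as parameters rather than pasted into the SQL text. A minimal sketch using Python's built-in sqlite3 module; the Stock table here is a simplified SQLite version of the MySQL definition, which is an assumption for the sake of a runnable example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE Stock (
           Id INTEGER PRIMARY KEY,
           Name TEXT NOT NULL DEFAULT '',
           Quantity INTEGER NOT NULL DEFAULT 0,
           Price REAL NOT NULL,
           Description TEXT)"""
)

rows = [
    ("Marcel's Morsels", 1500, 1.25, "Delectable delicious delicacies"),
    ("Fred's Fries", 1000, 0.75, "Fred's Fabulous French Fries"),
]
# Id is omitted, so each row gets the next available integer.
# The ? placeholders let the driver handle quoting of the values.
conn.executemany(
    "INSERT INTO Stock (Name, Quantity, Price, Description) VALUES (?, ?, ?, ?)",
    rows,
)
stock = conn.execute("SELECT Id, Name FROM Stock ORDER BY Id").fetchall()
```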
Inserting data in SQL (cont.)
General tuple insertion:
INSERT INTO Stock(Id, Name, Quantity, Price,
Description)
SELECT *
FROM OldStock
WHERE Price < 100.0;
•  The result of any SQL query returning an answer (a table)
with the same attributes over the same domains as Stock
may be inserted into the table. Missing attributes have NULL
or default values. See below for descriptions of SQL
queries.
•  Bulk loading from TSV text data is also possible.
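A minimal sketch of inserting the result of a query into a table, using Python's built-in sqlite3 module. The OldStock rows are made up for illustration, and both tables are assumed to have the same five columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for table in ("Stock", "OldStock"):
    conn.execute(
        f"""CREATE TABLE {table} (
                Id INTEGER PRIMARY KEY, Name TEXT, Quantity INTEGER,
                Price REAL, Description TEXT)"""
    )

# Hypothetical old stock: one cheap item and one expensive item.
conn.executemany(
    "INSERT INTO OldStock VALUES (?, ?, ?, ?, ?)",
    [(1, "Cheap item", 5, 9.5, None), (2, "Dear item", 1, 250.0, None)],
)

# Copy every old item costing less than 100 into Stock.
conn.execute(
    """INSERT INTO Stock (Id, Name, Quantity, Price, Description)
       SELECT * FROM OldStock
       WHERE Price < 100.0"""
)
copied = conn.execute("SELECT Name, Price FROM Stock").fetchall()
```

Only the item priced below 100 is copied.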
Deleting and updating data in SQL
We need to know how to query tables first...
(Also for inserting, as on the previous slide.)
Querying data in SQL
All queries are extensions of the following basic form:
SELECT values
FROM tables
WHERE condition
For example,
SELECT Name, Email
FROM Customers
WHERE Address LIKE "%Logan%";
returns the name and email of every customer whose address
contains the string "Logan". (The more restrictive condition
Address = "Logan" could also be used.)
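The difference between the LIKE condition and the equality condition can be seen on a small table. A minimal sketch using Python's built-in sqlite3 module; the customers and addresses are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers "
    "(Id INTEGER PRIMARY KEY, Name TEXT, Address TEXT, Email TEXT)"
)
conn.executemany(
    "INSERT INTO Customers (Name, Address, Email) VALUES (?, ?, ?)",
    [
        ("Ann", "12 High St, Logan", "ann@example.com"),
        ("Bob", "Logan", "bob@example.com"),
        ("Cy", "3 River Rd, Nathan", "cy@example.com"),
    ],
)

# LIKE with % wildcards matches any address containing "Logan".
contains = conn.execute(
    "SELECT Name FROM Customers WHERE Address LIKE ? ORDER BY Name",
    ("%Logan%",),
).fetchall()

# Equality matches only an address that is exactly "Logan".
exact = conn.execute(
    "SELECT Name FROM Customers WHERE Address = ? ORDER BY Name",
    ("Logan",),
).fetchall()
```

The LIKE query returns Ann and Bob; the equality query returns only Bob.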
Example queries in SQL
1. Find all attributes of all customers whose address is in
Nathan.
SELECT * FROM Customers
WHERE Address LIKE "%Nathan%";
2. Find item id and customer id and quantity of all orders for
items whose price is between $10 and $20 (inclusive).
SELECT ord.ItemId, ord.CustId, ord.Quantity
FROM Orders ord, Stock item
WHERE ord.ItemId = item.Id
AND item.Price BETWEEN 10.0 AND 20.0;
Example queries in SQL (cont.)
3. Find names and addresses of customers who have ordered
items whose price is less than $10, ordered by customer name.
SELECT cust.Name, cust.Address
FROM Customers cust, Orders ord, Stock item
WHERE cust.Id = ord.CustId
AND ord.ItemId = item.Id
AND item.Price < 10.0
ORDER BY cust.Name;
We can simply write the attribute name (e.g., Address,
Price) when there is no ambiguity.
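Query 3 joins three tables by listing them in FROM and matching keys in WHERE. The same join can also be written with explicit JOIN ... ON clauses. A minimal sketch comparing the two forms with Python's built-in sqlite3 module; the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Customers (Id INTEGER PRIMARY KEY, Name TEXT, Address TEXT);
    CREATE TABLE Stock (Id INTEGER PRIMARY KEY, Name TEXT, Price REAL);
    CREATE TABLE Orders (Id INTEGER PRIMARY KEY, ItemId INTEGER,
                         CustId INTEGER, Quantity INTEGER);
    INSERT INTO Customers VALUES (1, 'Ann', 'Nathan'), (2, 'Bob', 'Logan');
    INSERT INTO Stock VALUES (1, 'Fries', 0.75), (2, 'Caviar', 95.0);
    -- Bob orders the cheap item, Ann the expensive one.
    INSERT INTO Orders VALUES (1, 1, 2, 10), (2, 2, 1, 1);
    """
)

implicit_join = conn.execute(
    """SELECT cust.Name, cust.Address
       FROM Customers cust, Orders ord, Stock item
       WHERE cust.Id = ord.CustId
         AND ord.ItemId = item.Id
         AND item.Price < 10.0
       ORDER BY cust.Name"""
).fetchall()

explicit_join = conn.execute(
    """SELECT cust.Name, cust.Address
       FROM Customers cust
       JOIN Orders ord ON cust.Id = ord.CustId
       JOIN Stock item ON ord.ItemId = item.Id
       WHERE item.Price < 10.0
       ORDER BY cust.Name"""
).fetchall()
```

Both forms return the same single row, for Bob.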
Example queries in SQL (cont.)
4. Find names of customers and total quantity of items ordered
by that customer, for customers who have ordered at least one
item, ordered by customer name.
SELECT cust.Name, SUM(ord.Quantity)
FROM Customers cust, Orders ord
WHERE cust.Id = ord.CustId
GROUP BY cust.Id
ORDER BY cust.Name;
Exercise: Modify this query to return the customer names and
total value of items ordered, ordered by customer name again.
The main aggregate functions for use with GROUP BY are
COUNT, SUM, AVG, MIN and MAX.
Example queries in SQL (cont.)
5. Find names of customers and total quantity of items ordered
by that customer, ordered by customer name, provided the total
quantity is at least 10.
SELECT cust.Name, SUM(ord.Quantity)
FROM Customers cust, Orders ord
WHERE cust.Id = ord.CustId
GROUP BY cust.Id
HAVING SUM(ord.Quantity) >= 10
ORDER BY cust.Name;
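HAVING filters groups after aggregation, whereas WHERE filters rows before grouping, so HAVING must come after GROUP BY and before ORDER BY. A minimal sketch using Python's built-in sqlite3 module; the customers and order quantities are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Customers (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Orders (Id INTEGER PRIMARY KEY, CustId INTEGER,
                         Quantity INTEGER);
    INSERT INTO Customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    -- Totals: Ann 6 + 5 = 11, Bob 3, Cy 10.
    INSERT INTO Orders (CustId, Quantity)
        VALUES (1, 6), (1, 5), (2, 3), (3, 10);
    """
)

totals = conn.execute(
    """SELECT cust.Name, SUM(ord.Quantity)
       FROM Customers cust, Orders ord
       WHERE cust.Id = ord.CustId
       GROUP BY cust.Id
       HAVING SUM(ord.Quantity) >= 10
       ORDER BY cust.Name"""
).fetchall()
```

Bob's total of 3 is below the threshold, so only Ann and Cy are returned.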
Example queries in SQL (cont.)
6. Find all distinct item ids of stock items ordered by customers
in Southport (on the Gold Coast).
SELECT DISTINCT ord.ItemId
FROM Customers cust, Orders ord
WHERE cust.Id = ord.CustId
AND cust.Address LIKE "%Southport%";
Without DISTINCT, we would have one row for each order of
the same item by (different) Southport customers, i.e., multiple
rows with the same item id.
Example queries in SQL (cont.)
7. Find names of the 10 customers who have ordered the
greatest total quantity of items, in descending order of total quantity.
SELECT cust.Name, SUM(ord.Quantity)
FROM Customers cust, Orders ord
WHERE cust.Id = ord.CustId
GROUP BY cust.Id
ORDER BY SUM(ord.Quantity) DESC
LIMIT 0, 10;
The LIMIT clause has parameters offset (from start of result)
and number (of rows to be included in result).
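A minimal sketch of the two LIMIT parameters, using Python's built-in sqlite3 module (SQLite accepts the same "LIMIT offset, number" form as MySQL). The data is made up, and a page size of 2 is used instead of 10 to keep the example small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Customers (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Orders (Id INTEGER PRIMARY KEY, CustId INTEGER,
                         Quantity INTEGER);
    INSERT INTO Customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    -- Totals: Ann 11, Bob 3, Cy 10.
    INSERT INTO Orders (CustId, Quantity)
        VALUES (1, 6), (1, 5), (2, 3), (3, 10);
    """
)

ranking = """SELECT cust.Name, SUM(ord.Quantity)
             FROM Customers cust, Orders ord
             WHERE cust.Id = ord.CustId
             GROUP BY cust.Id
             ORDER BY SUM(ord.Quantity) DESC """

# Offset 0, number 2: the two customers with the largest totals.
top_two = conn.execute(ranking + "LIMIT 0, 2").fetchall()
# Offset 2, number 2: the next "page" of the same ranking.
next_page = conn.execute(ranking + "LIMIT 2, 2").fetchall()
# Equivalent spelling with the OFFSET keyword.
top_two_offset = conn.execute(ranking + "LIMIT 2 OFFSET 0").fetchall()
```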
Querying data in SQL (cont.)
•  These queries just scratch the surface of what is possible.
•  Fortunately, only simple queries are used in most cases.
•  Hence, cheaper DMSs (SQLite, Access) have only
implemented a small part of the query language, sufficient to
answer the most commonly asked queries.
•  When testing your database and queries, a command-line
interface is useful.
•  In production, queries are sent to the database from the
server-side (PHP) program using the DMS's API.
Deleting and updating data in SQL
1. Delete the Orders table from the database.
DROP TABLE Orders;
2. Delete all tuples from the Orders table (and retain the now
empty table).
DELETE FROM Orders;
3. Delete all customers whose name is "John Smith".
DELETE FROM Customers
WHERE Name = "John Smith";
Deleting and updating data in SQL (cont.)
4. Double the quantity of all orders for customers whose id is
greater than 100.
UPDATE Orders
SET Quantity = 2*Quantity
WHERE CustId > 100;
Exercise: Double the quantity of all orders for customers whose
address is in Mt Gravatt.
5. Change the name and address of the customer with id 15.
UPDATE Customers
SET Name = "John", Address = "Logan"
WHERE Id = 15;
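A minimal sketch of updates and deletes issued from a program, using Python's built-in sqlite3 module. The order rows are made up for illustration; the cursor's rowcount attribute reports how many rows a statement changed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders (Id INTEGER PRIMARY KEY, CustId INTEGER, Quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO Orders (CustId, Quantity) VALUES (?, ?)",
    [(15, 2), (150, 3), (200, 4)],
)

# Example 4: double the quantity of all orders for customers with id > 100.
cursor = conn.execute(
    "UPDATE Orders SET Quantity = 2 * Quantity WHERE CustId > ?", (100,)
)
updated = cursor.rowcount
quantities = [
    row[0] for row in conn.execute("SELECT Quantity FROM Orders ORDER BY Id")
]

# A DELETE with a WHERE condition removes only the matching tuples.
conn.execute("DELETE FROM Orders WHERE CustId = ?", (15,))
left = conn.execute("SELECT COUNT(*) FROM Orders").fetchone()[0]
```

Two of the three orders are doubled, and one order is removed by the delete.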
Tuning SQL performance
This is a very complex issue. Some possible steps are:
•  Keep attributes small, to minimise space.
•  Use fixed-length attribute types where possible, not
variable-length types such as VARCHAR(12), TEXT and
BLOB.
•  Define indexes on attributes frequently used in WHERE
conditions.
•  Store the results of previous queries in materialised views to
avoid subsequent recomputation. (MySQL supports ordinary views
from version 5.0 but has no materialised views; a summary table
maintained by the application can play the same role.)
•  Reconsider the database design according to the frequently
asked queries. Perhaps combine two original tables into a
single larger table to avoid expensive WHERE conditions.
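The effect of an index on a WHERE condition can be observed with SQLite's EXPLAIN QUERY PLAN. A minimal sketch using Python's built-in sqlite3 module; the index name CustomersByName and the helper function plan are made up, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers (Id INTEGER PRIMARY KEY, Name TEXT, Address TEXT)"
)


def plan(sql):
    """Return the descriptive text of SQLite's plan for a query."""
    # The fourth column of each plan row is the human-readable detail.
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


query = "SELECT * FROM Customers WHERE Name = 'John Smith'"

plan_before = plan(query)  # without an index: a scan of the whole table
conn.execute("CREATE INDEX CustomersByName ON Customers(Name)")
plan_after = plan(query)  # with the index: a search using CustomersByName
```

Comparing plan_before and plan_after shows that the index is used only once it exists. Indexes speed up such lookups at the cost of extra space and slower inserts and updates.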