Document 6520587
Why use database management systems (DMSs)
• To provide a uniform, logical model for representing data (relational data model)
• To provide a powerful, uniform language for querying and updating data (SQL)
• To allow powerful optimisations for efficient query evaluation (indexing, query transformation)
• To ensure data integrity within single applications (constraint checking, recovery)
• To ensure data integrity across multiple concurrent applications (concurrency control)

2503ICT WP 1

Examples of (relational) DMSs (RDMSs)
Open source or free systems
• SQLite (www.sqlite.org)
• MySQL (www.mysql.com)
• PostgreSQL (www.postgres.org)
Commercial systems
• JFile (Palm OS)
• Microsoft Access (Windows)
• FileMaker Pro (Windows, OS X)
• Oracle
• IBM DB2
• Microsoft SQL Server

Relational database concepts
• Relations defining entities and relationships
  – A relation or table is a set of rows or tuples
  – An entity is an object
  – A relationship is an association between entities
• Attributes
  – An attribute is an atomic property of an entity or relationship
• Domains
  – A domain is the set of possible values of an attribute
  – An attribute value may be NULL (unknown, inapplicable)
• Keys
  – A key is an attribute or set of attributes that uniquely identifies a tuple.
  – No two tuples in a relation can have the same value for their key.
  – A table may have more than one key.
  – One key may be designated the primary key.

Relational database design
Suppose we wish to store information about student enrolments in courses as follows:
  Enrolments (CCode, CName, SNum, SName, SAddr, Year)
Here, we store all information about courses, students and enrolments in a single table. For each year in which a course is offered, there is one tuple for each course-student pair.
This design has the following problems...

Relational database design (cont.)
  Enrolments (CCode, CName, SNum, SName, SAddr, Year)
1. We have to store the name and address of a student for every course in which the student is enrolled, a waste of space.
2. If we update the name or address of a student, we have to update the information for every course in which the student is enrolled, a waste of time.
3. If a student temporarily withdraws from all courses, the system no longer knows the student's name and address, a loss of information.
There must be a better way...

Relational database design (cont.)
Integrity constraints
• Domain constraints
  – Each attribute can only take values from its specified domain.
• Entity integrity constraints
  – No attribute in a key of a relation can be null.
• Key constraints
  – No two tuples in a relation can have the same value for their keys.
• Referential integrity constraints
  – If attribute A in relation R has the same domain as attribute B in relation S, and B is a key (or component of a key) for S, then whenever t1 is a tuple in R with t1[A] = x, there must exist a tuple t2 in S with t2[B] = x.
  – I.e., you can't refer to something that doesn't exist.
  – The attribute R[A] that refers to the key S[B] is called a foreign key.

Relational database design (cont.)
• Each relation should represent either
  – a single entity,
  – or a single relationship between two or more entities.
• (Functional) dependencies should be identified
  – We say {A1,...,An} → B in R if, for all tuples t1 and t2 in R, if t1[A1,...,An] = t2[A1,...,An], then t1[B] = t2[B].
• Each relation should have a specified key.
  – If K = {A1,...,An} is a key for R, then K → B for all attributes B in R.
• If {A1,...,An} → {B1,...,Bm} is a dependency in R, then either:
  – {B1,...,Bm} ⊆ {A1,...,An}, or
  – {A1,...,An} is a superkey for R, i.e., some subset of {A1,...,An} is a key for R.
• A relation that satisfies this condition is said to be in Boyce-Codd Normal Form (BCNF). (Many other normal forms are possible.)
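The definition of a functional dependency can be checked mechanically against sample data. The following is a minimal Python sketch, not part of the original slides: the function names fd_holds and is_superkey and the three sample Enrolments rows are hypothetical, chosen only for illustration. A check on sample rows can show that a dependency is violated, but it cannot prove that the dependency holds for all future data; dependencies remain design decisions.

```python
from typing import Dict, Hashable, Sequence, Tuple

Row = Dict[str, Hashable]


def fd_holds(rows: Sequence[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Return True if lhs -> rhs is not violated by any pair of rows."""
    seen: Dict[Tuple, Tuple] = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[b] for b in rhs)
        # Two rows that agree on lhs must also agree on rhs.
        if seen.setdefault(left, right) != right:
            return False
    return True


def is_superkey(rows: Sequence[Row], attrs: Sequence[str]) -> bool:
    """Return True if attrs determines every attribute in the sample rows."""
    all_attrs = list(rows[0].keys()) if rows else []
    return fd_holds(rows, attrs, all_attrs)


# Hypothetical sample data for Enrolments(CCode, CName, SNum, SName, SAddr, Year).
ENROLMENTS = [
    {"CCode": "C1", "CName": "Course One", "SNum": 1,
     "SName": "Ann", "SAddr": "Nathan", "Year": 2008},
    {"CCode": "C1", "CName": "Course One", "SNum": 2,
     "SName": "Bob", "SAddr": "Logan", "Year": 2008},
    {"CCode": "C2", "CName": "Course Two", "SNum": 1,
     "SName": "Ann", "SAddr": "Nathan", "Year": 2008},
]

# SNum -> {SName, SAddr} holds, but SNum is not a superkey,
# so the single-table design is not in BCNF.
SNUM_DETERMINES_STUDENT = fd_holds(ENROLMENTS, ["SNum"], ["SName", "SAddr"])
SNUM_IS_SUPERKEY = is_superkey(ENROLMENTS, ["SNum"])
```

In this sample, SNum determines SName and SAddr while SNum alone does not identify a tuple, which is exactly the situation the BCNF condition rules out and the source of the redundancy listed for the Enrolments table.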

Example
Entities
• Stock(ItemId, ItemName, Quantity, Price, Description)
  – ItemId (key) is int, ItemName is string, Quantity is int, Price is float, Description is string.
• Customer(CustId, CustName, Address, Email)
  – CustId (key) is int, CustName is string, Address is string
  – CustName should not be a key, as different customers could have the same name.
Relationships
• Order(OrderNo, ItemId, CustId, Date, Quantity)
  – OrderNo (key) is int, Date is date, Quantity is int
  – {ItemId, CustId} is not a key, as there could be two different orders for the same item by the same customer on different dates.
  – ItemId and CustId are foreign keys that reference Stock[ItemId] and Customer[CustId].

Example (cont.)
Note that this design does not have the redundancies and update anomalies of the course-student database above.
Exercise
Redesign the course-student database in a similar way to avoid the redundancies and update anomalies identified previously.

SQL
History of SQL
• Relational model first clearly proposed by Ted Codd in 1970.
• First SQL-like languages designed and implemented at IBM in mid-1970s.
• First SQL standard emerged in mid-1980s.
• Most recent SQL standard, SQL3, was defined in 1999.
• This latest standard has many independent modules, which are being gradually defined. It includes many object-oriented features!
• It is still not really a standard!
• Most implementations provide a command-line interface.
• In Web applications, an application programming interface (native, PDO, ODBC, JDBC) is used.

Common SQLite types
• null, unknown or unrequired value
• integer, an integer
• real, a floating point number
• double, a double precision floating point number (stored as real in SQLite)
• text, a string (the 65,535-character limit applies to MySQL's TEXT, not to SQLite)
• blob, a large binary value (e.g., an image)
Other types may be supported in particular implementations.
A field declared INTEGER PRIMARY KEY is auto-incremented when no value (or NULL) is supplied.
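The Stock, Customer and Order design and the SQLite types above can be tried out with Python's standard sqlite3 module. This is a minimal sketch under stated assumptions rather than the course's own code: the helper names make_db and try_insert_order are hypothetical, the relation is named Orders because ORDER is a reserved word, the date column is named OrderDate and stored as text, and SQLite only enforces foreign keys after PRAGMA foreign_keys = ON has been issued on the connection.

```python
import sqlite3

# Schema sketch for the example design, using SQLite types.
SCHEMA = """
CREATE TABLE Stock (
    ItemId      INTEGER PRIMARY KEY,          -- auto-assigned when omitted
    ItemName    TEXT NOT NULL,
    Quantity    INTEGER NOT NULL DEFAULT 0,
    Price       REAL NOT NULL,
    Description TEXT
);
CREATE TABLE Customer (
    CustId   INTEGER PRIMARY KEY,
    CustName TEXT NOT NULL,                   -- deliberately not a key
    Address  TEXT,
    Email    TEXT
);
CREATE TABLE Orders (
    OrderNo   INTEGER PRIMARY KEY,
    ItemId    INTEGER NOT NULL REFERENCES Stock(ItemId),
    CustId    INTEGER NOT NULL REFERENCES Customer(CustId),
    OrderDate TEXT,
    Quantity  INTEGER NOT NULL DEFAULT 0
);
"""


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with foreign key checking enabled."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def try_insert_order(conn: sqlite3.Connection, item_id: int, cust_id: int) -> bool:
    """Insert an order; return False if an integrity constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO Orders (ItemId, CustId, OrderDate, Quantity) "
            "VALUES (?, ?, ?, ?)",
            (item_id, cust_id, "2008-03-01", 1),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

An order that refers to an existing stock item and customer is accepted, while an order for a non-existent ItemId is rejected, which is the "you can't refer to something that doesn't exist" rule of referential integrity.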

Common MySQL types
• int, an integer
• int(4), an integer with a display width of 4 digits
• float, a floating point number
• double, a double precision floating point number
• decimal(8,2), a fixed-point number with 8 digits in total, 2 of them after the decimal point (represented as a string in older versions!)
• char, a character
• char(12), a 12-character string
• varchar(12), a string of at most 12 characters
• text, a string of at most 65,535 characters
• date, datetime, time, timestamp
• blob, a large binary value (e.g., an image)

Defining attributes in MySQL (and SQLite)
• Every attribute must be given a type.
• (Integer) attribute values may be generated automatically.
• Attributes may be declared non-null.
• Attributes may have a default value.
• One or more attributes may be defined to form a key.
• One or more attributes may be declared a foreign key that refers to a key in another (or the same) relation.
• Other integrity constraints may also be defined (and perhaps ignored).
• Foreign key constraints are ignored in MySQL up to version 4 but are supported in version 5.

Defining tables in MySQL
Stock
  CREATE TABLE Stock (
    Id INT(6) NOT NULL AUTO_INCREMENT,
    Name VARCHAR(20) DEFAULT '' NOT NULL,
    Quantity INT(4) DEFAULT 0 NOT NULL,
    Price DECIMAL(8,2) NOT NULL,
    Description TEXT,
    PRIMARY KEY (Id),
    UNIQUE KEY (Name));

Defining tables in MySQL (cont.)
Customers
  CREATE TABLE Customers (
    Id INT(8) NOT NULL AUTO_INCREMENT,
    Name VARCHAR(20) DEFAULT '' NOT NULL,
    Address VARCHAR(80),
    Email VARCHAR(30),
    PRIMARY KEY (Id),
    INDEX (Name));

Defining tables in MySQL (cont.)
Orders
  CREATE TABLE Orders (
    Id INT(8) NOT NULL AUTO_INCREMENT,
    ItemId INT(6) NOT NULL,
    CustId INT(8) NOT NULL,
    OrderDate DATE,
    Quantity INT(6) DEFAULT 0,
    PRIMARY KEY (Id),
    FOREIGN KEY (ItemId) REFERENCES Stock(Id),
    FOREIGN KEY (CustId) REFERENCES Customers(Id));

Displaying database structure (in SQLite)
• .databases
• .tables
• .schema tablename
• .indices tablename
• .show

Displaying database structure (in MySQL)
• SHOW DATABASES
• SHOW TABLES (in a selected database)
• DESCRIBE tablename
• SHOW COLUMNS FROM tablename
• SHOW INDEX FROM tablename
• SHOW STATUS

Inserting data in SQL
Single-tuple insertion:
  INSERT INTO Stock(Id, Name, Quantity, Price, Description)
  VALUES (NULL, "Marcel's Morsels", 1500, 1.25, "Delectable, delicious delicacies");
• Attribute Id is given the next available integer value.
• Hence all rows have different values for the key Id.
• The system checks that no row with the same value for the other key, Name, already exists.

Inserting data in SQL (cont.)
Multiple-tuple insertion:
  INSERT INTO Stock(Id, Name, Quantity, Price, Description)
  VALUES (NULL, "Marcel's Morsels", 1500, 1.25, "Delectable delicious delicacies"),
         (NULL, "Fred's Fries", 1000, 0.75, "Fred's Fabulous French Fries");
• Attribute names can be provided for greater accuracy.
• Multiple rows can be inserted in the same statement.

Inserting data in SQL (cont.)
General tuple insertion:
  INSERT INTO Stock(Id, Name, Quantity, Price, Description)
  SELECT * FROM OldStock WHERE Price < 100.0;
• The result of any SQL query returning an answer (a table) with the same attributes over the same domains as Stock may be inserted into the table. Missing attributes have NULL or default values. See below for descriptions of SQL queries.
• Bulk loading from TSV text data is also possible.

Deleting and updating data in SQL
We need to know how to query tables first... (Also for inserting, as on the previous slide.)
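The three forms of insertion (single tuple, multiple tuples, and the result of a query) can be exercised against SQLite from Python. This is a hedged sketch, not the course's code: the function name build_stock_db, the table contents other than the two rows quoted on the slides, and the choice to declare Name as UNIQUE are assumptions made for illustration. Values are passed as ? parameters, and the Id column is left out of the INSERT ... SELECT so that SQLite assigns fresh key values.

```python
import sqlite3

# Assumed SQLite versions of Stock and OldStock; Name is UNIQUE so that
# it behaves as the "other key" described for the Stock table.
DDL = """
CREATE TABLE Stock (
    Id INTEGER PRIMARY KEY,
    Name TEXT NOT NULL UNIQUE,
    Quantity INTEGER NOT NULL DEFAULT 0,
    Price REAL NOT NULL,
    Description TEXT
);
CREATE TABLE OldStock (
    Id INTEGER PRIMARY KEY,
    Name TEXT NOT NULL,
    Quantity INTEGER NOT NULL DEFAULT 0,
    Price REAL NOT NULL,
    Description TEXT
);
"""

INSERT_STOCK = (
    "INSERT INTO Stock (Name, Quantity, Price, Description) VALUES (?, ?, ?, ?)"
)


def build_stock_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)

    # Single-tuple insertion: a NULL Id asks for the next available integer.
    conn.execute(
        "INSERT INTO Stock (Id, Name, Quantity, Price, Description) "
        "VALUES (NULL, ?, ?, ?, ?)",
        ("Marcel's Morsels", 1500, 1.25, "Delectable, delicious delicacies"),
    )

    # Multiple-tuple insertion: one statement, several parameter tuples.
    conn.executemany(
        INSERT_STOCK,
        [
            ("Fred's Fries", 1000, 0.75, "Fred's Fabulous French Fries"),
            ("Hypothetical Item", 5, 250.0, None),  # invented sample row
        ],
    )

    # General insertion: copy the cheap rows of OldStock into Stock.
    conn.executemany(
        "INSERT INTO OldStock (Name, Quantity, Price, Description) "
        "VALUES (?, ?, ?, ?)",
        [("Old Cheap", 10, 9.5, None), ("Old Dear", 1, 500.0, None)],
    )
    conn.execute(
        "INSERT INTO Stock (Name, Quantity, Price, Description) "
        "SELECT Name, Quantity, Price, Description FROM OldStock "
        "WHERE Price < 100.0"
    )
    conn.commit()
    return conn
```

Inserting a second row with an existing Name raises sqlite3.IntegrityError, which is how the uniqueness check on the other key shows up in application code.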

Querying data in SQL
All queries are extensions of the following basic form:
  SELECT values FROM tables WHERE condition
For example,
  SELECT Name, Email
  FROM Customers
  WHERE Address LIKE "%Logan%";
returns the name and email of every customer whose address contains the string "Logan". (The more restrictive condition Address = "Logan" could also be used.)

Example queries in SQL
1. Find all attributes of all customers whose address is in Nathan.
  SELECT * FROM Customers WHERE Address LIKE "%Nathan%";
2. Find item id, customer id and quantity of all orders for items whose price is between $10 and $20.
  SELECT ord.ItemId, ord.CustId, ord.Quantity
  FROM Orders ord, Stock item
  WHERE ord.ItemId = item.Id AND item.Price BETWEEN 10.0 AND 20.0;

Example queries in SQL (cont.)
3. Find names and addresses of customers who have ordered items whose price is less than $10, ordered by customer name.
  SELECT cust.Name, cust.Address
  FROM Customers cust, Orders ord, Stock item
  WHERE cust.Id = ord.CustId AND ord.ItemId = item.Id AND item.Price < 10.0
  ORDER BY cust.Name;
We can simply write the attribute name (e.g., Address, Price) when there is no ambiguity.

Example queries in SQL (cont.)
4. Find names of customers and total quantity of items ordered by that customer, for customers who have ordered at least one item, ordered by customer name.
  SELECT cust.Name, SUM(ord.Quantity)
  FROM Customers cust, Orders ord
  WHERE cust.Id = ord.CustId
  GROUP BY cust.Id
  ORDER BY cust.Name;
Exercise
Modify this query to return the customer names and total value of items ordered, ordered by customer name again.
The main aggregate functions for use with GROUP BY are COUNT, SUM, AVG, MIN and MAX.

Example queries in SQL (cont.)
5. Find names of customers and total quantity of items ordered by that customer, ordered by customer name, provided the total quantity is at least 10.
  SELECT cust.Name, SUM(ord.Quantity)
  FROM Customers cust, Orders ord
  WHERE cust.Id = ord.CustId
  GROUP BY cust.Id
  HAVING SUM(ord.Quantity) >= 10
  ORDER BY cust.Name;

Example queries in SQL (cont.)
6. Find all distinct item ids of stock items ordered by customers in Southport (on the Gold Coast).
  SELECT DISTINCT ord.ItemId
  FROM Customers cust, Orders ord
  WHERE cust.Id = ord.CustId AND cust.Address LIKE "%Southport%";
Without DISTINCT, we would have one row for each order of the same item by (different) Southport customers, i.e., multiple rows with the same item id.

Example queries in SQL (cont.)
7. Find names of the 10 customers who have ordered the greatest total quantity of items, in descending order of total quantity.
  SELECT cust.Name, SUM(ord.Quantity)
  FROM Customers cust, Orders ord
  WHERE cust.Id = ord.CustId
  GROUP BY cust.Id
  ORDER BY SUM(ord.Quantity) DESC
  LIMIT 0, 10;
The LIMIT clause has parameters offset (from start of result) and number (of rows to be included in result).

Querying data in SQL (cont.)
• These queries just scratch the surface of what is possible.
• Fortunately, only simple queries are used in most cases.
• Hence, cheaper DMSs (SQLite, Access) have only implemented a small part of the query language, sufficient to answer the most commonly asked queries.
• When testing your database and queries, a command-line interface is useful.
• In production, queries are sent to the database from the server-side (PHP) program using the DMS's API.

Deleting and updating data in SQL
1. Delete the Orders table from the database.
  DROP TABLE Orders;
2. Delete all tuples from the Orders table (and retain the now empty table).
  DELETE FROM Orders;
3. Delete all customers whose name is "John Smith".
  DELETE FROM Customers WHERE Name = "John Smith";

Deleting and updating data in SQL (cont.)
4. Double the quantity of all orders for customers whose id is greater than 100.
  UPDATE Orders SET Quantity = 2*Quantity WHERE CustId > 100;
Exercise
Double the quantity of all orders for customers whose address is in Mt Gravatt.
5. Change the name and address of the customer with id 15.
  UPDATE Customers SET Name = "John", Address = "Logan" WHERE Id = 15;

Tuning SQL performance
This is a very complex issue. Some possible steps are:
• Keep attributes small, to minimise space.
• Use fixed-length attribute types where possible, rather than variable-length types such as VARCHAR(12), TEXT and BLOB.
• Define indexes on attributes frequently used in WHERE conditions.
• Store the results of previous queries in materialised views to avoid subsequent recomputation. (MySQL supports ordinary views from version 5.0, but not materialised views.)
• Reconsider the database design according to the frequently asked queries. Perhaps combine two original tables into a single larger table to avoid expensive join conditions in WHERE clauses.
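The advice to index attributes used in WHERE conditions can be checked by asking the DMS how it plans to evaluate a query. The sketch below uses SQLite's EXPLAIN QUERY PLAN from Python; the function names query_plan and demo and the index name idx_orders_custid are hypothetical, and the exact wording of the plan text differs between SQLite versions, so the code only looks for the index name in it.

```python
import sqlite3
from typing import List, Sequence, Tuple


def query_plan(conn: sqlite3.Connection, sql: str,
               params: Sequence = ()) -> List[str]:
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


def demo() -> Tuple[List[str], List[str]]:
    """Compare the plan for one query before and after adding an index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Orders ("
        " Id INTEGER PRIMARY KEY,"
        " ItemId INTEGER NOT NULL,"
        " CustId INTEGER NOT NULL,"
        " OrderDate TEXT,"
        " Quantity INTEGER DEFAULT 0)"
    )
    sql = "SELECT Id, Quantity FROM Orders WHERE CustId = ?"

    before = query_plan(conn, sql, (15,))   # no index yet: full table scan
    conn.execute("CREATE INDEX idx_orders_custid ON Orders (CustId)")
    after = query_plan(conn, sql, (15,))    # plan can now use the index
    return before, after
```

Comparing the two plans shows the scan of the whole Orders table being replaced by a search through the index. MySQL offers a similar EXPLAIN statement, with different output.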