Physical Design
Physical Design - RDBMS
LIS458 | Benoit | Spring 2011
Monday, March 14, 2011

Today
✤ Main points about the transition from conceptual and logical to physical aspects of RDBMS
✤ Main points about what to look for
✤ Supplemental points about the mechanics of RDBMS systems
✤ DCL, DML
✤ Students’ demo of candidate queries on their tables
✤ The other readings, hands-on practice with phpMyAdmin and the terminal window, and especially Forta’s text come into play today!

Example (Microsoft), believe it or not
✤ SQL Server 2000
✤ The I/O subsystem (storage engine) is a key component of any relational database. A successful database implementation usually requires careful planning at the early stages of your project. The storage engine of a relational database requires much of this planning, which includes determining:
  ✤ What type of disk hardware to use, such as RAID (redundant array of independent disks) devices. ...
  ✤ How to place your data onto the disks ...
  ✤ Which index design to use to improve query performance in accessing data. ...
  ✤ How to set all configuration parameters appropriately for the database to perform well. …
[http://msdn.microsoft.com/en-us/library/aa178575(v=sql.80).aspx]

Main points
✤ After needs assessment, ER design, schema refinement, and definition of views, we have addressed most of the conceptual and logical (or external) schema issues related to our data needs
✤ Now we must consider
  ✤ what to index on
  ✤ how to cluster data
  ✤ what will impact optimizations for storage and retrieval
  ✤ when to break the rules of data decomposition
  ✤ what we should do to the conceptual schema if the queries we create are unwieldy or inefficient. Can we undo the data?
✤ Should we consider optimizations in the indices, such as hash, B+ tree, or others?
✤ How might we cluster and join our data?
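
The "what to index on" question can be explored by asking the engine how it plans to run a query. The following is a minimal sketch, not the course's setup: it assumes Python's built-in sqlite3 module as a stand-in for MySQL, and the table books, its columns, and the index name idx_books_author are hypothetical. It compares the query plan before and after an index is created.

```python
import sqlite3

def plan(con, sql):
    """Return the engine's query-plan text for a statement."""
    rows = con.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[-1] for row in rows)  # last column holds the detail text

con = sqlite3.connect(":memory:")  # throwaway in-memory database
con.execute("CREATE TABLE books (callNo TEXT PRIMARY KEY, author TEXT, title TEXT)")
con.executemany(
    "INSERT INTO books VALUES (?, ?, ?)",
    [("QA76.%d" % i, "Author %d" % (i % 50), "Title %d" % i) for i in range(1000)],
)

query = "SELECT * FROM books WHERE author = 'Author 7'"
before = plan(con, query)   # no index on author yet: the whole table is scanned
con.execute("CREATE INDEX idx_books_author ON books (author)")
after = plan(con, query)    # the planner can now search via the new index

print("before:", before)
print("after: ", after)
```

MySQL offers the analogous EXPLAIN SELECT … statement; the exact wording of the plan differs by engine and version, so treat the printed text as illustrative.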

A model of the design process
[Diagram; labels only: conceptual requirements for Application 1 and Application 2; external models for each application; conceptual model; logical model; internal model; physical model]

Review - Conceptual Modeling
✤ Usually the result of systems analysis
✤ Conceptual level: a top-down (or bottom-up) process that translates the “business information requirements” into an operational database
✤ Info requirements are tightly coupled with business function requirements
✤ Objective is to define and model the things of significance that the business needs to know and the relationships between them
✤ Ignores specifics of hardware and software
✤ Higher-level look at the “database”

Review - Data modeling (logical)
✤ Objective is to map the information requirements reflected in the Entity-Relationship Model (and its helper tools, relational schema or variants such as UML) into a relational database design
✤ Necessarily software specific because data types vary by implementation
✤ Should be independent of the hardware
  ✤ Not always the case in commercial systems!
✤ So by now your understanding of the data needs and the components of RDBMS should be in place so we can
  ✤ transform entities into tables
  ✤ transform attributes into columns
  ✤ transform domains into data types and constraints.

Physical Modeling
✤ Creates the physical relational database, tables, etc.: implements the database design in machine form
✤ Hardware and software dependent
✤ Introduces file structure and memory requirements into the database designers’ world! [Design decisions affect the physical storage and retrieval of data - from the machine’s p.o.v. and from the user’s]
✤ International standard for communicating with the database system - Structured Query Language (SQL)
  ✤ Data Control Language (DCL) [sometimes glossed as Data Creation Language]
  ✤ Data Definition Language (DDL)
  ✤ Data Manipulation Language (DML)

Physical design
✤ To be efficient - optimize performance of databases
✤ Implements the requirements of users and data gathered during the design phase
✤ Especially in large systems, most DB designers determine the physical file storage requirements
✤ The normalized relations + size estimates for them
✤ Descriptions of the attributes (e.g., varchar(25))
✤ When, how, and how often data are manipulated: entered, retrieved, deleted, updated
✤ Expectations of the data: speed of retrieval, security, backup, recovery, retention, integrity
✤ Descriptions of the technologies used to implement the DB

Physical design
✤ We are, then, affected by
  ✤ Systems performance - storage formats
    ✤ While storage is cheap, it isn’t free: field sizes add up, so don’t waste space
    ✤ Need to store strings correctly and be able to manipulate data appropriately (numbers and strings)
  ✤ Physical record composition
  ✤ Data arrangement
  ✤ Indices
  ✤ Query optimization and performance tuning

Remember...
✤ The power of RDBMS is in its making links to and among data
✤ Tables are related to each other through columns sharing identical data (the keys)
✤ Each table is based on set theory - ultimately each element in the set must be unique
✤ Relational databases usually manipulate a set of data at a time rather than a single record at a time.
✤ Rows of data are called tuples; each is uniquely identified and describes a phenomenon of interest to the organization
✤ Rows consist of columns or attributes that describe the phenomenon

Terminology
Table (relation); row (tuple); column (attribute)

ID  | Name      | Phone    | clientID
201 | Snowflake | 555-1212 | 12
202 | Crumpet   | 617-3038 | 14
203 | Fishlips  | 555-9383 | 2

Terminology
✤ Primary key (PK) - a column or set of columns that uniquely identifies each row in a table
  ✤ A PK of multiple columns is a composite primary key
  ✤ No part of the PK can be null
  ✤ Auto-generated data are great for this.
✤ Foreign key (FK) - a column or combination of columns in one table that refers to a primary key in the same or another table
  ✤ A FK must match an existing primary key value
  ✤ If a FK is part of a primary key, the FK cannot be null

Terminology
✤ ERD and/or UML should be accurate representations of the info needs and the organization’s activities
✤ Effective means for collecting and documenting an organization’s info needs
✤ Facilitate communicating ideas to the users
✤ Facilitate development of the physical design because relations are clarified
✤ ERDs are often part of other, larger projects
✤ Goal is to decompose, by understanding the work flow processes, into 1:M or 1:1 relationships
✤ Goal is 3NF: all attributes of the entity depend on the primary key
✤ If you can get to the row via the key, you can get all the data

Data types (e.g., MS Access)
✤ Numeric (1, 2, 4, 8 bytes; fixed or float)
✤ Text (255 characters max)
✤ Memo (64000 max)
✤ Date/Time (8 bytes)
✤ Currency (8 bytes; 15 digits + 4 decimals)
✤ Autonumber (4 bytes)
✤ Yes/No (1 bit)
✤ Hyperlinks (64000 characters max)
✤ Byte (0-255)
✤ Integer (-32,768 to 32,767)
✤ Long Integer, String, Double, etc.
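
The byte widths in the data type list fix the range of values each integer type can hold. As a small illustrative sketch (Python is used only for the arithmetic; the function names signed_range and unsigned_range are hypothetical), the two's-complement range follows directly from the number of bits. A 2-byte signed integer, for example, spans -32,768 to 32,767.

```python
def signed_range(num_bytes):
    """Smallest and largest values of a two's-complement integer of this width."""
    bits = 8 * num_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

def unsigned_range(num_bytes):
    """Smallest and largest values of an unsigned integer of this width."""
    return 0, 2 ** (8 * num_bytes) - 1

for width in (1, 2, 4, 8):
    low, high = signed_range(width)
    print(f"{width} byte(s): {low:,} to {high:,}")

print("unsigned 1 byte:", unsigned_range(1))
```

Choosing the narrowest width that safely covers the expected values is one concrete way to act on "storage is cheap, but it isn’t free".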

Data Types (MySQL)

Physical Requirement Issues
✤ A physical record is a group of fields stored in adjacent memory locations and retrieved together
✤ Combination of fixed-length and variable-length fields
✤ How are data stored?
  ✤ On disks
  ✤ In a buffer
  ✤ With other data representing the relational data on the disk
✤ Processor cache, disk speed, RAM, and other storage devices all add to the time it takes to retrieve data

By the way ...
• Null values cause trouble because different versions of SQL treat nulls differently; null values also affect sums, counts, etc.
• Use “is null” or “is not null”:
    SELECT * FROM myTable WHERE fname IS NULL;

By the way...
✤ Recall we need to be aware of nulls! What would happen to our data and model if there were null values?
✤ When is there an advantage, if any, to fixed-length versus variable-length fields?
✤ Should the RDBMS assign values to sequences (e.g., auto number)?
✤ 3NF vs. BCNF
✤ All decisions about data are affected by the work flow and the organization’s info needs
  ✤ including the decision to break the rules and “denormalize” the data

Records on the disk ...
✤ Consist of variable-length and fixed-length fields - but how do we know where the data are?
✤ Draw on the board how fixed-length fields, variable-length fields, and record references appear
✤ Header: base address (B); location of data is Address = B + length 1 + length 2…
✤ Record header: pointers to the schema, to the length of the record, and to the timestamp … then the data
✤ Fixed fields first; then a pointer from the header to the variable-length fields - see MARC
✤ “Reference fields” are pointers to other chunks of data located on the disk. They represent the 1:M and M:N relationships.
✤ Physical space on disk: a record may require more than one block of disk space, so it needs a pointer to the next block’s location on the disk

BLOBs; Disk Space
✤ Binary large objects - images, sounds, etc.
✤ RDBMS tries to store these items in contiguous blocks
✤ Note: the commands affect disk space - the data are shifted or new blocks are required
✤ The host, disk, cylinder number, track number, block within the track, and offset within the block must be stored, too.
✤ Consequently, there are many techniques: we don’t have to manipulate them but it’s useful to know about them.

Access Methods comparison

Factor                              | Sequential              | Indexed                              | Hashed
Storage space                       | No wasted space         | No waste, but needs index data       | More space needed for add/delete
Sequential retrieval on primary key | Very fast               | Moderately fast                      | Impractical
Random retrieval                    | Impractical             | Moderately fast                      | Very fast
Multikey retrieval                  | OK, but needs full scan | Very fast with multiple indices      | Not possible
Deleting                            | Can waste space         | OK, if dynamic                       | Very easy
Adding                              | Requires rewriting      | OK, if dynamic                       | Very easy
Updating                            | Usually rewrite         | Easy, but requires index maintenance | Very easy

In short...
1. What storage and media are used?
2. How big is the database? How will it grow over time?
3. What are the required access speeds?
4. Should data be partitioned somehow?
5. Should the data be stored centrally or distributed? On what servers?
6. Who is responsible for maintaining the physical data (the computers) and the data (indices and other needs)?
7. Who controls quality assurance and quality control on updates and additions to the data and the programs?
8. How does your documentation look? Would someone else be able to follow your analysis, documentation, data design, etc.?

And ...
1. What programs (applications) can reach your data?
2. Is the integrity of the data (referential, null values, domain and range) addressed?
   1. Where will the quality control be?
      1. Data control on insertion (e.g., web forms and JavaScript, or on the server in the program?)
      2. Formatting data on output (have you checked for nulls, numbers, strings, etc.?)
3. What are the permissions (grant rights) on your database and tables?

Data Retrieval
✤ SELECT statements
  ✤ Used to retrieve data from the RDBMS in an ad-hoc manner
  ✤ Data are almost always returned in a table (rows of data described by columns). Programming languages have optimized ways of getting data out of these tables:
    ✤ Java uses “ResultSet”
    ✤ PHP uses $
✤ There’s a logic to how commands are structured - but it’s always best to check the documentation of your version of SQL!

SELECT syntax

SELECT           | is followed by a list of at least one column
DISTINCT         | suppresses duplicates
*                | selects all columns
column           | selects the named column(s)
alias            | gives selected columns a heading
FROM table       | specifies the source table(s)
WHERE condition  | restricts rows using column names, expressions, constants, and comparisons
ORDER BY         | specifies the display order
ASC              | in ascending order (default)
DESC             | in descending order

Select example
    SELECT bookCol.callNo, bookCol.loanedTo, userGroups.userID
    FROM bookCol, userGroups
    WHERE userGroups.userID = '100'
      AND bookCol.loanedTo = userGroups.userID

Some options on rows
✤ On individual rows:
  ✤ LOWER, UPPER, CONCAT, SUBSTRING, LENGTH (on strings)
  ✤ ROUND, TRUNC, MOD (on numbers)
  ✤ MONTHS_BETWEEN, ADD_MONTHS, NEXT_DAY, LAST_DAY, ROUND, TRUNC (on dates)
  ✤ TO_CHAR, TO_DATE, and others (conversion functions)
✤ On multiple rows (GROUP BY, HAVING clause):
  ✤ AVG
  ✤ COUNT
  ✤ MIN, MAX
  ✤ SUM, STDDEV, VARIANCE

DDL
✤ CREATE, ALTER, DROP, RENAME, TRUNCATE
✤ CREATE VIEW theData AS
    SELECT …
    FROM …
    WHERE ...
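
The SELECT example and the CREATE VIEW pattern above can be tried end to end. This is a minimal sketch under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the column types and the sample rows are invented for illustration, and the name column on userGroups is hypothetical. Only the table names, the join, and the view name theData come from the slides.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # throwaway in-memory database
con.executescript("""
    CREATE TABLE userGroups (userID TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE bookCol (
        callNo   TEXT PRIMARY KEY,
        loanedTo TEXT REFERENCES userGroups(userID)  -- NULL means not on loan
    );
    INSERT INTO userGroups VALUES ('100', 'Snowflake'), ('101', 'Crumpet');
    INSERT INTO bookCol VALUES
        ('Z699.A1', '100'), ('QA76.9', '100'), ('Z665.B2', '101'), ('PN171', NULL);
""")

# The join from the "Select example" slide, with ORDER BY added for a stable order.
loans = con.execute("""
    SELECT bookCol.callNo, bookCol.loanedTo, userGroups.userID
    FROM bookCol, userGroups
    WHERE userGroups.userID = '100'
      AND bookCol.loanedTo = userGroups.userID
    ORDER BY bookCol.callNo
""").fetchall()

# DDL: a view stores the query, not the data, so it always reflects current rows.
con.execute("""
    CREATE VIEW theData AS
    SELECT bookCol.callNo, userGroups.name
    FROM bookCol, userGroups
    WHERE bookCol.loanedTo = userGroups.userID
""")

# A multiple-row (aggregate) function applied to the view.
counts = con.execute(
    "SELECT name, COUNT(*) FROM theData GROUP BY name ORDER BY name"
).fetchall()

print(loans)
print(counts)
```

The row whose loanedTo is NULL never appears in the join, because NULL does not compare equal to any userID. This is one of the ways nulls quietly change results.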

DCL
✤ Usually expanded as Data Control Language; some say Data Creation Language
✤ GRANT, REVOKE
✤ Transaction control:
  ✤ COMMIT, ROLLBACK, SAVEPOINT

DML
✤ INSERT, UPDATE, DELETE
✤ INSERT INTO table [(column [, column…])] VALUES (value [, value…]);
✤ UPDATE table SET columnName = value WHERE condition;
✤ DELETE FROM table WHERE condition;

OORDBMS
✤ Increasingly popular - much more work for the programmer and designer

Documentation
✤ See the MySQL homepage
✤ Worth finding a text and websites whose examples you can understand and apply.
✤ Practice commands using the terminal window or phpMyAdmin and save the commands that work (with comments) in a text file for your own use. This is extremely useful for documentation and for remembering what to do on your next project!
✤ Example: Practice-DB-SQL.pdf
✤ http://web.simmons.edu/~benoit/LIS458/Practice-DB-SQL.pdf

Students...
✤ Who wants to volunteer to issue commands on what they’ve created?
✤ Examples: using Perl, PHP, and Java as part of a web-enabled RDBMS.
✤ First Java, then Perl, then PHP. Note: the purpose is not to master writing the code but to see the parallels: connecting to the SQL server, creating a bridge to send and receive data, creating an object to capture the data, and then extracting the data row by row or element by element and wrapping it in HTML (or XML) to be returned to the user.
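
The DML statements and the COMMIT and ROLLBACK commands listed earlier can be seen working together in a few lines. This is a hedged sketch that assumes Python's built-in sqlite3 module in place of MySQL; the users table and its rows are hypothetical. SQLite has no GRANT or REVOKE, so the permission side of DCL is not shown.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # throwaway in-memory database
con.execute("CREATE TABLE users (idno TEXT PRIMARY KEY, first_name TEXT)")

# INSERT and UPDATE, then COMMIT to make the changes permanent.
con.execute("INSERT INTO users (idno, first_name) VALUES ('100', 'Fifi')")
con.execute("INSERT INTO users (idno, first_name) VALUES ('101', 'Scott')")
con.execute("UPDATE users SET first_name = 'Fiona' WHERE idno = '100'")
con.commit()

# DELETE inside a new transaction, then ROLLBACK: the row comes back.
con.execute("DELETE FROM users WHERE idno = '101'")
con.rollback()

rows = con.execute("SELECT idno, first_name FROM users ORDER BY idno").fetchall()
print(rows)
```

Until a transaction is committed, its changes can still be undone, which is why a DELETE without a WHERE clause is best rehearsed inside a transaction first.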

Example using a Java program

    import java.io.*;
    import java.sql.*;
    import javax.servlet.*;
    import javax.servlet.http.*;
    import javax.sql.*;
    import java.util.*;
    import java.math.*;

    public void doPost(HttpServletRequest req, HttpServletResponse res)
            throws ServletException, IOException {
        …
        theServer = req.getServerName();    // host name the request was sent to
        id = req.getParameter("idno");      // value of the form field "idno"
        …
        res.setContentType("text/html");
        sos = res.getOutputStream();        // stream used to write the HTML reply

    Connection cu = null;
    Statement su = null;
    ResultSet ru = null;
    String driver = "org.gjt.mm.mysql.Driver";    // LINUX
    // driver = "com.mysql.jdbc.Driver";          // MAC
    try {
        Class.forName(driver).newInstance();      // load the JDBC driver
        cu = DriverManager.getConnection(
            "jdbc:mysql://" + theServer + ":3306/myDB", "gb", "cat");
        // myDB = database; gb = db user name; cat = password
        su = cu.createStatement();
        su.executeUpdate("DELETE FROM users WHERE idno='" + id + "'");

Example of retrieving data

    try {
        Class.forName(driver).newInstance();
        con = DriverManager.getConnection(
            "jdbc:mysql://" + theServer + ":3306/" + dbName, dbUser, dbPassword);
        stmt = con.createStatement();
        rs = stmt.executeQuery(reviewQuery);
        while (rs.next()) {                       // one pass per returned row
            sos.println("Welcome, " + rs.getString("first_name"));
        }
    } catch (Exception e) {
        if (e instanceof SQLException) {
            SQLException sqlex = (SQLException) e;
            sos.println("SQL state = " + sqlex.getSQLState());
            sos.println("<br/>Error message = " + e.getMessage() + sqlex.getErrorCode());
            if (sqlex.getErrorCode() == 1045) {
                sos.println("<hr>Sorry, the connection has been refused by the database server.");
            }
        }
    } finally {
        if (con != null) {
            try {
                con.close();
                stmt.close();
                rs.close();
            } catch (Exception ee) {
            }
        }
    }

    public boolean checkID(String id, String password, String tableName,
            String idfield, String dbUser, String dbPassword,
            String dbName, String theServer) {
        // Build the lookup query from the table name, the id column, and the credentials.
        String checkID = "SELECT * FROM " + tableName + " WHERE " + idfield + "='" + id
                + "' AND password='" + password + "'";
        boolean returnValue = false;
        Connection con = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            Class.forName(driver).newInstance();
            con = DriverManager.getConnection(
                "jdbc:mysql://" + theServer + ":3306/" + dbName, dbUser, dbPassword);
            stmt = con.createStatement();
            rs = stmt.executeQuery(checkID);
            if (rs.next()) {                      // at least one matching row
                returnValue = true;
            } else {
                returnValue = false;
            }
        } catch (Exception e) {
            if (e instanceof SQLException) {
                SQLException sqlex = (SQLException) e;
                errorStatus = "SQL state = " + sqlex.getSQLState() + " Error message = " + sqlex.getMessage();
                errorStatus += "<br/>Database status: state code " + sqlex.getSQLState();
                if (sqlex.getErrorCode() == 1045) {
                    errorStatus += "<hr>Sorry, the connection has been refused by the database server.";
                }
            } else {
                errorStatus += "<blockquote>Error message " + e.getMessage() + "</blockquote>";
            }
        } finally {
            if (con != null) {
                try {
                    con.close();
                    rs.close();
                    stmt.close();
                } catch (Exception ee) {
                }
            }
        }
        return returnValue;
    }

Perl example
✤ Near the top of your script add this code

    use Mysql;
    $dbhost = "localhost";
    $dbname = "mydatabase";
    $dbuser = "perlscripts";
    $dbpass = "YWE6YWNQ";
    $db = Mysql->connect($dbhost, $dbname, $dbuser, $dbpass);
    $qry = $db->query(qq~SELECT * FROM employees WHERE id < 100~);
    while (@emps = $qry->fetchrow) {
        print qq~ $emps[0], $emps[1], $emps[2] <br> ~;
    }

The code above, when translated into English, says: "Connect to the server, select all columns from the table named employees where id is less than 100, then while the data is placed into an array called emps using the fetchrow method, print columns 1, 2 and 3, then a line break."
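
The same connect, query, fetch-row, print pattern that the Perl walk-through describes looks much the same in any host language. As a hedged, self-contained sketch (Python's built-in sqlite3 module stands in for the MySQL server, and the employees rows and the helper name rows_as_html are invented for illustration):

```python
import sqlite3
from html import escape

# Connect (an in-memory database replaces the real server) and load sample rows.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
con.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(7, "Snowflake", "Cataloging"), (42, "Crumpet", "Reference"), (150, "Fishlips", "Archives")],
)

def rows_as_html(connection):
    """Run the query, walk the result row by row, and wrap each row in HTML."""
    cursor = connection.execute("SELECT * FROM employees WHERE id < 100 ORDER BY id")
    lines = []
    for row in cursor:  # each row arrives as a tuple of column values
        cells = ", ".join(escape(str(value)) for value in row[:3])
        lines.append(cells + " <br>")
    return "\n".join(lines)

page = rows_as_html(con)
print(page)
```

The cursor plays the role of Java's ResultSet or Perl's statement handle: an object that captures the returned table and hands it back one row at a time. Escaping each value before placing it in HTML is an addition here, not something the slide code does.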

PHP Examples

    <?php
    $username = "pee_wee";
    $password = "let_me_in";
    $hostname = "localhost";
    // connect to the server, then choose the database to work with
    $dbh = mysql_connect($hostname, $username, $password)
        or die("Unable to connect to MySQL");
    print "Connected to MySQL<br>";
    $selected = mysql_select_db("first_test", $dbh)
        or die("Could not select first_test");
    // you're going to do lots more here soon
    mysql_close($dbh);
    ?>

See also 488 notes on PHP and UsingPHPandMySQL.txt

Perl and DBI example (1 of 4)

    #!/usr/local/rbin/perl
    use DBI;
    print <<END;
    Content-type: text/html

    <html><head>
    <title>Example of Perl calling MySQL</title>
    </head><body bgcolor="white">
    END

Perl and DBI example (2 of 4)

    # database information
    $db="FIFISDATABASE";
    $host="gslis.simmons.edu";
    $userid="SCOTT";
    $passwd="TIGER";
    $connectionInfo="dbi:mysql:$db;$host";
    # make connection to database
    $dbh = DBI->connect($connectionInfo,$userid,$passwd);

Perl and DBI example (3 of 4)

    # prepare and execute query
    $query = "SELECT * FROM people WHERE age > 30 ORDER BY name";
    $sth = $dbh->prepare($query);
    $sth->execute();
    # assign fields to variables
    $sth->bind_columns(\$id, \$name, \$age);
    # output name list to the browser
    print "Names in the people database:<p>\n";
    print "<table>\n";
    while ($sth->fetch()) {
        print "<tr><td>$name<td>$age\n";
    }

Perl and DBI example (4 of 4)

    print "</table>\n";
    print "</body>\n";
    print "</html>\n";
    $sth->finish();
    # disconnect from database
    $dbh->disconnect;

By the way, 3
A few things from practice: query blocks
✤ SELECT DISTINCT * FROM cats WHERE cats.name IN (SELECT pets.name FROM pets)
  is the same as
  SELECT DISTINCT cats.* FROM cats, pets WHERE cats.name = pets.name
✤ BUT note the nested select statement: this is often a solution but isn’t considered a best practice.
  ✤ SELECT * FROM cats WHERE cats.name IN (SELECT DISTINCT pets.name FROM pets)
✤ COUNT() may not always work correctly.
✤ Some folk recommend avoiding DISTINCT if duplicates are acceptable or if the answer set contains a key
✤ Minimize the use of GROUP BY and HAVING; e.g., instead of
  ✤ SELECT MIN(pets.age) FROM pets GROUP BY pets.idno HAVING pets.idno = '100'
  filter with WHERE:
  ✤ SELECT MIN(age) FROM pets WHERE pets.idno = '100';

Next steps...
✤ Ensure your documentation is on your website
✤ Finalize your SQL statements that create the data views you want
✤ Post the statements on your page (after you’ve tested ‘em, of course!)
✤ Next class we take your practiced statements, create “Prepared Statement” objects out of them, and integrate them into a web-enabled RDBMS.
✤ Comments on your work will be emailed individually shortly.
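
The move to “Prepared Statement” objects matters because the Java examples earlier build their SQL by concatenating user input into the query string. The sketch below illustrates the difference under stated assumptions: Python's built-in sqlite3 module stands in for MySQL and JDBC, the users table and both function names are hypothetical, and the plain-text password column is for demonstration only.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # throwaway in-memory database
con.execute("CREATE TABLE users (idno TEXT PRIMARY KEY, password TEXT)")
con.execute("INSERT INTO users VALUES ('100', 'cat')")

def check_id_unsafe(connection, idno, password):
    """Concatenates input into the SQL text, as the slide code does."""
    sql = ("SELECT * FROM users WHERE idno='" + idno +
           "' AND password='" + password + "'")
    return connection.execute(sql).fetchone() is not None

def check_id(connection, idno, password):
    """Uses ? placeholders; the values are bound as data, never parsed as SQL."""
    sql = "SELECT * FROM users WHERE idno = ? AND password = ?"
    return connection.execute(sql, (idno, password)).fetchone() is not None

attack = "' OR '1'='1"  # a "password" that rewrites the WHERE clause

print(check_id_unsafe(con, "100", attack))  # the injected OR makes the test pass
print(check_id(con, "100", attack))         # treated as a literal, wrong password
print(check_id(con, "100", "cat"))          # the genuine password still works
```

In JDBC the corresponding tool is PreparedStatement with ? placeholders and setString calls; the placeholder idea is the same even though the API differs.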