Databases: Why? DATABASE DESIGN I - 1DL300 hour
Transcription
Databases: Why? DATABASE DESIGN I - 1DL300 hour
Databases: Why? DATABASE DESIGN I - 1DL300 • An introductury course on database systems http://www.it.uu.se/edu/course/homepage/dbastekn/vt12 • You are using a database every day hour • Database Management Systems (DBMS) provide … Erik Zeitler … efficient, reliable, convenient, and safe multi-user storage of and access to massive amounts of persistent data. Uppsala Database Laboratory Department of Information Technology, Uppsala University, Uppsala, Sweden Erik Zeitler - UDBL - IT - UU 2012-01-16 Extremely prevalent – Web sites, banking, telecom, sensors, retail, science and engineering, … Spring 2012 1 Erik Zeitler - UDBL - IT - UU 2012-01-16 Databases: What? • Massive • • Persistent Safe • Multi-user • Convenient – Terabytes Databases: How? per day • • – Physical data independence – High-level declarative query language • Reliable • values Data definition language (DDL) – Set up schema – 1000’s of queries and/or updates per second • Data manipulation language (DML)/Query Language (QL) – Query and modify – 99.99999% Erik Zeitler - UDBL - IT - UU Schema vs data types variables – Concurrency control Efficient Data model – Set of records, graph, XML – Hardware software power users • 2 2012-01-16 3 Erik Zeitler - UDBL - IT - UU 2012-01-16 4 Key DB people • 1DL300 in a nutshell 13 LECTURES DBMS implementer – Builds system (works at Oracle, IBM, Microsoft, Uppsala University) • • Course intro – Overview of DB technology – DB terminology DB designer – Establish schema • ER-modeling • Relational model and relational algebra • ER-to-relational mapping and Normalization • SQL • Transactions, Concurrency control • Recovery techniques • Intro to storage and index structures 1DL400 • DB application developer – Programs that operate on DB • 1DL300 DB Administrator (DBA) – Operations & Maintenance – Tuning & Optimization Erik Zeitler - UDBL - IT - UU 2012-01-16 5 Erik Zeitler - UDBL - IT - UU 3 ASSIGNMENTS To be done in pairs: 1. Normalization 2. DB Design and ER modeling 3. SQL in RDBMS Queries and views JDBC API access to RDBMS 1 FINAL EXAM • Skrivsalen Mon March 5 • No books allowed 2012-01-16 6 Friendly reminders from the Student Office Personnel 1. Not admitted? Lecturers Assistants • Erik Zeitler (main teacher) • Mikael Lax 1. Swedish students: Go to www.antagning.se 1DL300 sen anmälan 2. Master students: Go to your study counsellor 3. Exchange students: Ulrika Jaresund 2. Make sure you are registered (registration deadline January 29) • phone 471 7345 • room 1306 – room 1320 • Silvia Stefanova 1. 2. 3. 4. • Lars Melander – room 1319 • phone 471 1051 • room 1316 • Anne Peters • Minpeng Zhu – room 2005 Studentportalen (all) Registered before, want to register again? Student Office [email protected] (re-register) Master students: Study counsellor Exchange students: Ulrika Jaresund 3. Want to quit the course? • phone 471 3155 • room 1310 – Inform the Student Office! ([email protected]) • If less than 3 weeks have passed since the course started, the course registration will be removed. After 3 weeks a "course intermission" will be reported to UPPDOK instead. 4. Don’t forget to sign up for the final exam [email protected] Erik Zeitler - UDBL - IT - UU 2012-01-16 7 Erik Zeitler - UDBL - IT - UU 2012-01-16 8 The database market (Computer Sweden May 24, 2002) Introduction to Database Terminology Elmasri/Navathe chs 1-2 Padron-McCarthy/Risch ch 1 Erik Zeitler Department of Information Technology Uppsala University, Uppsala, Sweden Erik Zeitler - UDBL - IT - UU 2012-01-16 9 Erik Zeitler - UDBL - IT - UU 2012-01-16 10 Outline of a database system DBMS deployment plans (Gartner 2008) Erik Zeitler - UDBL - IT - UU 2012-01-16 11 Erik Zeitler - UDBL - IT - UU 2012-01-16 12 Database? Database management system? • A database (DB) is a more or less well-organized collection of related data. • A database management system (DBMS) is one (or several) program that provides functionality for users to develop, use, and maintain a database. • Thus, a DBMS is a general software system for defining, populating (constructing), manipulating and sharing databases for different types of applications. • Also supports protection (system and security) and maintenance to evolve the system. • The information in a database . . . – represents information within some sub-area of “reality” (i.e. objects, characteristics and relationships between objects) – is logically connected – has been organized for a specific group of users and applications Erik Zeitler - UDBL - IT - UU 2012-01-16 13 Erik Zeitler - UDBL - IT - UU Database system? • 2012-01-16 14 Database vs files A database system consists of . . . – the physical database (instance) – a database management system – one or several database languages (means for communicating with the database) – one or several application program(s) • A database system makes a simple and efficient manipulation of large data sets possible. • The term DB can refer to both the content and to the system (the answer to this ambiguity is governed by the context). Erik Zeitler - UDBL - IT - UU 2012-01-16 • DB in comparison to conventional file management: – – – – – – 15 data model – data abstraction meta-data – in catalog program-data and program-operation independence multiple views of data sharing data – multiuser transactions high-level language for managing data in the database Erik Zeitler - UDBL - IT - UU 2012-01-16 16 Advantages of using a database approach • • • • • • • • • • • Data model? Efficient search and access of large data sets Controlling redundancy and inconsistency Access control Persistent storage Indexes and query processing Backup and recovery Multiple user interfaces Complex relationships Integrity constraints Active behaviour Enforcing standards, reducing application development time, flexibility to evolve system, up-to-date info Erik Zeitler - UDBL - IT - UU 2012-01-16 • Every DB has a data model – “hides” the physical representation of data • A data model is a formalism that defines – a notation for describing data on an abstract level – a set of operations to manipulate data represented using the data model • Data models are used for data abstraction – Enabling definition and manipulation of data on an abstract level. 17 Erik Zeitler - UDBL - IT - UU Data models - examples • • 18 Meta-data, i.e. “data about data” Examples of representational (implementation) data models within the database field are: – – – – – 2012-01-16 • Hierarchical (IMS) Network (IDMS) Relational (ORACLE, DB2, SQL Server, InterBase, Mimer) Object-oriented (ObjectStore, Objectivity, Versant, Poet) Object-relational (Informix, Odapter, DB2) Information about which information that exists and about how/where data is stored – – – – – names and data types of data items names and sizes of files storage details of each file mapping information among schemas constraints cf. variables in a program Conceptual data model • – ER-model - Entity-Relationship model – (not an implementation model since there are no operations defined for the notation) Erik Zeitler - UDBL - IT - UU 2012-01-16 19 Meta-data is stored in a system catalog (alt. term data dictionary). Erik Zeitler - UDBL - IT - UU 2012-01-16 20 Schema and instance • Data independence To be able to separate data in the database and its description the terms • Reduces the connection between: – database instance and – database schema are used. – the actual organization of data and – how the users/application programs process data (or “sees” data.) • The schema is created when a database is defined. A database schema is not changed frequently. • Why? • The data in the database constitute an instance. Every change of data creates a new instance of the database. Erik Zeitler - UDBL - IT - UU 2012-01-16 21 – Data should be able to change without requiring a corresponding alteration of the application programs. – Different applications/users need different “views” of the same data. Erik Zeitler - UDBL - IT - UU 2012-01-16 22 Three-schema architecture Data independence - how? By introducing a multi-level architecture where each level represents one abstraction level End Endusers users • The three-schema architecture – Introduced in 1971 – a.k.a. ANSI/SPARC architecture for databases – CODASYL Data Base Task Group. Conceptual level • It consists of 3 levels: – Internal level – Conceptual level – External level • Internal level Each level introduces one abstraction layer and has a schema that describes how representations should be mapped to the next lower abstraction level. Erik Zeitler - UDBL - IT - UU 2012-01-16 view 2 External level view 1 logical physical … … … view n Conceptual schema Internal schema Database Databaseinstance instance 23 Erik Zeitler - UDBL - IT - UU 2012-01-16 24 Internal, conceptual and external schemas Logical database design • 1. Make an ER diagram 2. Translate that ER diagram to a relational DB schema ename EMPLOYEE WORKS_IN dname – Abstraction level: files, index files etc. – Is usually defined through the data definition language (DDL) of the DBMS. salary 1 N MANAGES sname 1 1 DEPARTMENT SUPPLIER 1 N dno CARRIES SUPPLIES N iname M ITEM ino date • • Schemas for the entity types in the example above EMPLOYEE(ENAME, SALARY, DEPT) saddr DEPARTMENT(DNO, DNAME, MGR) SUPPLIER(SNAME, SADDR) ITEM(INO, INAME, DNO) price ORDER(ONO, DATE, CUST) CUSTOMER(CNAME, CADDR, BALANCE) N ORDER INCLUDE quantity N PLACED_BY CUSTOMER cname balance caddr 1 Erik Zeitler - UDBL - IT - UU Conceptual schema: an abstract description of the physical database. – Constitute one, for all users, common basic model of the logical content of the database. – This abstraction level corresponds to “the real world”: object, characteristics, relationships between objects etc. – The schema is created in the DDL according to a specific data model. • M ono Internal schema: describes storage structures and access paths for the physical database. • Schemas for relationship types (M:N) SUPPLIES(SNAME, INO, PRICE) INCLUDE(ONO, INO, QUANTITY) External schema (or views): a (restricted) view over the conceptual schema – A typical DB has several users with varying needs, demands, access privileges etc. and external schemas describes different views of the conceptual database with respect to what the different user groups would like to/are allowed to se. – Some DBMS’s have a specific language for view definitions (else the DDL is used). 2012-01-16 25 Erik Zeitler - UDBL - IT - UU 2012-01-16 26 An example database (Elmasri/Navathe Figure 1.2) Views - example (Elmasri/Navathe fig 1.4) Erik Zeitler - UDBL - IT - UU 2012-01-16 27 Erik Zeitler - UDBL - IT - UU 2012-01-16 28 Possible data independence in the three-schema architecture Database languages 1. Logical data independence A generic term for a class of languages used for defining, communicating with or manipulating a database. • In conventional programming languages, declarations and program sentences is implemented in one language. • A database language includes several different languages. – The possibility to change the conceptual schema without influencing the external schemas (views). • e.g. add another field to a conceptual schema. 2. Physical data independence – – – – – The possibility to change the internal schema without influencing the conceptual schema.. • the effects of a physical reorganization of the database, such as adding an access path, is eliminated. view1 view2 … … In the DDL the database administrator define internal + conceptual schema • DB users and application programs use DML to → database is designed. Subsequent schema modifications are also made in DDL. viewn – retrieve, add, remove, or alter the data in the database – the term Query Language is usually used as synonym to DML. Internal schema 2012-01-16 29 Erik Zeitler - UDBL - IT - UU 2012-01-16 30 Physical two-tier client-server architecture Classification criteria for DBMSs • • … Conceptual schema Erik Zeitler - UDBL - IT - UU Storage Definition Language (SDL) - internal schema Data Definition Language (DDL) - conceptual schema View Definition Language (VDL) - external schema Data Manipulation Language (DML) Type of data model – hierarchical, network, relational, object-oriented, object-relational • Centralized vs. distributed DBMSs – Homogeneous vs. heterogeneous DDBMSs – Multidatabase systems • Single-user vs. multi-user systems • General-purpose vs. special-purpose DBMSs – specific applications such as airline reservation and phone directory systems. • Cost Erik Zeitler - UDBL - IT - UU 2012-01-16 31 Erik Zeitler - UDBL - IT - UU 2012-01-16 32 Logical three-tier client/server architecture DBMS architecture Erik Zeitler - UDBL - IT - UU 2012-01-16 33 Erik Zeitler - UDBL - IT - UU 2012-01-16 34