Relational Database Systems 2
3. Indexing and Access Paths
Christoph Lofi, Benjamin Köhncke
Institut für Informationssysteme, Technische Universität Braunschweig
http://www.ifis.cs.tu-bs.de

2 Storage
• Each DBMS uses several types of storage
  – Storage hierarchy
  – Predominant storage type is the hard disk
• Hard disks
  – Slow random access
  – Fast sequential transfer
  – Unreliable and failure-prone

3 Indexing and Access Paths
3.1 Introduction to access paths
3.2 Files and blocks
3.3 Indexing
  – Single-level indexes
  – Multi-level indexes
  – Hash indexes
3.4 Physical Tuning

3.1 Introduction to Access Paths
• Databases persistently store data
• Major problem: efficiency of data access
  – Obviously depends on the hardware used
• But also depends to a large degree
  – On the allocation of data on the disks
  – On intelligent buffer management
  – On creating access paths and indexing data
• Physical tuning of the database is a main task of database administrators

3.1 Introduction to Access Paths
• For persistent storage, databases are mapped into a number of files
  – Located in specially protected parts of the file system (tablespaces, etc.)
  – Actually maintained by the operating system
• Each file is partitioned into fixed-length blocks (pages)
  – Smallest unit transferred from/to storage
  – Block size is specified on DB creation
  – Is a multiple of the OS block size
  – A block usually contains several data records
SKS 10.5

3.1 Sectors and Blocks
• Hard disk sectors are abstracted by the file system to blocks
• DBMS abstracts FS blocks to DBMS blocks
[Figure: DISK, FS Blocks 1 to 6, DBMS Blocks 1 to 6]

3.1 Buffer Management
• For data access, complete blocks (pages) are always transferred from disk into main memory
  – The part of main memory holding copies of blocks is referred to as the DB buffer and is managed by the buffer manager
  – Optimized (pre-)fetching of blocks greatly improves performance
• If a data record is requested by the DB
  – If the block is buffered, return its main memory address
  – Else allocate space in the buffer, fetch the block from disk, and return its main memory address
SKS 10.5

3.1 Buffer Management
• When new blocks are fetched, currently buffered blocks generally have to be evicted
  – If blocks have been modified, they must be written back to disk (which is not always possible)
  – Writing of blocks depends on the recovery strategy:
    • Blocks that cannot yet be written are called pinned blocks
    • Before checkpoints there might be a forced output of blocks
• Several buffer replacement strategies can be applied
SKS 10.5

3.1 Replacement Strategies
• Least recently used (LRU): the 'oldest' block is replaced
  – Usually used in OS buffer management
  – Simple,
yet effective strategy
  – Does not use semantics of the data
• Toss immediate: after the DB has finished processing a block, the block is immediately replaced
  – Example: block nested-loop join between tables R, S
    • Take one block from R and compare against all blocks in S
    • The block from R can be thrown out, but no block from S
SKS 10.5

3.1 Replacement Strategies
• Expected reuse: statistics about query frequencies assess how useful a buffered block is
  – The block with least expected usefulness is evicted
  – Example: index blocks are more often addressed than data blocks
• In any case, whatever the strategy: once a block has been modified and has not yet been written to disk, it cannot be evicted!
  – Note: recovery subsystem has to agree before writing a block to disk
SKS 10.5

3.2 Blocks (Pages)
• Overhead and payload (e.g., ORACLE data blocks)
  – Header contains general block information like block address, type of segment (data, index,…)
  – Table directory contains tables that have rows in the block
  – Row directory contains information about the actual rows in the block (IDs, addresses,…)
  – Row data is the actual data

3.2 Extents and Segments
• An extent is a logical collection of blocks (usually adjacent)
  – Fixed size; once more data space is needed for rows, a new extent is allocated
  – Remember: reading adjacent blocks improves access time
• Segments are collections of logically connected extents
  – For instance tables, index segments, or rollback segments

3.2 Tablespaces
• A tablespace is the logical storage space needed for the data in a table
  – A grouping of multiple files allocated on disk
    • Example:
Data1.ora, system.ora, test.dbf
  – Good practice to have one tablespace for tables, and a different one for indexes
CREATE TABLESPACE user_data
  DATAFILE 'udata.ora' SIZE 10M
  EXTENT MANAGEMENT LOCAL
  SEGMENT SPACE MANAGEMENT AUTO;

3.2 File Organization
• Records of different relations should be stored in individual files
  – Storing related records in the same block minimizes disk accesses
• Multitable clustering file organization may store records of different tables in the same block
  – Good e.g. for some frequently occurring joins

3.2 File Organization
• Files reserve space in the file system
  – File size can be specified or change dynamically
  – Default values strongly differ (e.g., DB2 allocates 100 MB)
[Figure: files related to table "content_type_lecture": schema file, data file with data blocks, index file with index blocks]

3.2 File Organization
• Organization of records in the file
  – Heap: a record can be placed anywhere in the file where there is space
  – Sequential: store records in sequential order, based on the value of the search key of each record
  – Hashing: a hash function is computed on some attribute of each record.
The result specifies in which block of the file the record should be placed

3.2 File Organization
• Data records have to be written in a file such that the entire record can be accessed with minimal disk accesses
  – Fixed-length records (easy to implement)
  – Variable-length records (storage space efficient)
TYPE deposit = RECORD
  name : CHAR(22);
  accNo : CHAR(10);
  balance : REAL;
END
  – If each char is 1 byte and a real is 8 bytes, a record takes 22 + 10 + 8 = 40 bytes
record  Name   AccNo   Balance
0       Burg   864442  654,55
1       Myers  967531  56,45
2       Smith  145288  457,75
SKS 10.6

3.2 File Organization
• Fixed-length records
  – 40 bytes for the first record, 40 bytes for the second,…
  – Problem: if the block size is not a multiple of 40, records may cross block boundaries
  – Problem: deleting a record can be done either by marking it as deleted, or by replacing it with some other record of the file
    • Keep deleted items in place? Slows down reading!
    • Move up all other items? Inefficient!
    • Fill the space with the next inserted item? Changes the sequence!
    • Pointer lists? Waste space in each record!
SKS 10.6

3.2 File Organization
• Variable-length records
  – Necessary for multi-table files or records that allow variable-length attributes
  – Problem: how to know when a record ends?
    • Introduce 'end-of-record' symbols or store the record length at the beginning of the record. But what if a record is updated?
    • Slotted page structure: the header contains the number of entries, the end of free space, and pointers to the location/size of entries. Records are moved to use up space: no fragmentation!
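The slotted page idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the class name `SlottedPage` and its methods are hypothetical, and the slot directory is kept as a Python list instead of being serialized into the page header as a real DBMS would do.

```python
class SlottedPage:
    """Toy slotted page: a slot directory plus records packed at the page end."""

    def __init__(self, size=4096):
        self.data = bytearray(size)
        self.slots = []        # slot id -> (offset, length), or None if deleted
        self.free_end = size   # end of free space; records grow downward from here

    def insert(self, record: bytes) -> int:
        # Simplification: the space a serialized header would need is ignored.
        if len(record) > self.free_end:
            raise ValueError("page full")
        self.free_end -= len(record)
        self.data[self.free_end:self.free_end + len(record)] = record
        self.slots.append((self.free_end, len(record)))
        return len(self.slots) - 1   # the slot id stays stable for the record

    def read(self, slot_id: int) -> bytes:
        slot = self.slots[slot_id]
        if slot is None:
            raise KeyError("record was deleted")
        offset, length = slot
        return bytes(self.data[offset:offset + length])

    def delete(self, slot_id: int) -> None:
        offset, length = self.slots[slot_id]
        self.slots[slot_id] = None
        # Move all records stored below the deleted one up by `length` bytes,
        # so the free space stays contiguous (no fragmentation).
        self.data[self.free_end + length:offset + length] = self.data[self.free_end:offset]
        for i, slot in enumerate(self.slots):
            if slot is not None and slot[0] < offset:
                self.slots[i] = (slot[0] + length, slot[1])
        self.free_end += length
```

Because other structures only refer to a record by its slot id, records can be moved inside the page without invalidating those references; only the slot directory entry changes.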
SKS 10.6

3.2 File Organization
• Typical database records like in our banking example easily fit into one block
  – Name, account number, balance,…
  – Usually multiple records per block
• Unspanned record organization fills blocks only with complete records
• Spanned organization uses pointers to divide records across blocks
[Figure: records 1 to 5 laid out in blocks i and i+1, unspanned vs. spanned with pointer p]
EN 4.4

3.2 Non-Standard Blocks
• Necessary for large objects: binary large objects (blobs) and character large objects (clobs)
• Large objects are not interpreted in databases
  – Text documents, images, audio and video data
  – Need to be stored in a contiguous sequence of bytes
  – If an object is bigger than a block, contiguous pages of the buffer pool have to be allocated for storage
  – Sometimes it is preferable to disallow direct access and only allow access through a file-system-like API, which allows for fragmentation

3.3 Indexing
• Indexes help to locate records in a DB file
  – Creation of indexes is part of the physical tuning task of database administrators
  – Indexes often influence the actual location of storage for a record
    • Example: sequential storage, storage via a hash function
• If the location is determined by the index
  – Not all attributes can be directly indexed (but secondary access paths may be used)

3.3 Basic Concepts
• When items have to be found quickly, indexing mechanisms are used to speed up access
  – Alphabetical author catalog in libraries
  – Lexicographic ordering
  – …

3.3 Basic Concepts
• Ordered by indexing field (search key)
  – Attribute(s) used to look up records in a file
• An index file
consists of index entries
  – Records of the form (search key, record location(s))
• Two basic kinds of queries
  – Exact matches
    • Locating the record(s) with a certain value
  – Range queries
    • Locating all records in a certain value range

3.3 Basic Concepts
• Two basic kinds of indexes
  – Ordered indexes: search keys are stored in sorted order
    • Single-level ordered indexes
    • Multi-level ordered indexes
  – Hash indexes: search keys are distributed uniformly across blocks using some hash function
    • Hash function distributes records uniformly over blocks
    • Hashed value of the search key decides the storage block

3.3 Single-Level Ordered Indexes
• Index links indexing fields to logical file/block locations
  – Dense indexes index all items, sparse indexes only selected ones
  – Much smaller file than the data file
    • Hence, a binary search on the index file requires fewer block accesses than a binary search on the data file
  – Works exactly like a book's index
    • Only a few pages at the beginning/end, alphabetically ordered, references to the respective page(s)

3.3 Single-Level Ordered Indexes
• Primary indexes
  – Order data by some usually unique attribute as indexing field (primary key), store database records in this order
  – Index record contains a pointer to the respective storage place (block address)
• To save entries, there is usually only a single index entry for each block (block anchor)
  – But sparse indexes do not directly show whether some record is in the DB
[Figure: index entries AccNo 1, AccNo 4, AccNo 7 pointing into the data file]
  1, Adams, $887,00 | 2, Bertram, $19,99 | 3, Behaim, $167,00
  4, Cesar, $1866,00 | 5, Miller, $179,99 | 6, Naders, $682,56
  7, Ruth, $8642,78 | 8, Smith, $675,99 | 9, Tarrens, $99,00
EN 14.1
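A look-up through a sparse primary index with block anchors can be sketched as follows. The records are the ones from the figure; that each block holds exactly three records is an assumption derived from the anchors 1, 4, 7, and the function name `lookup` is purely illustrative.

```python
from bisect import bisect_right

# Data file: blocks of records stored in primary key (AccNo) order.
blocks = [
    [(1, "Adams", 887.00), (2, "Bertram", 19.99), (3, "Behaim", 167.00)],
    [(4, "Cesar", 1866.00), (5, "Miller", 179.99), (6, "Naders", 682.56)],
    [(7, "Ruth", 8642.78), (8, "Smith", 675.99), (9, "Tarrens", 99.00)],
]

# Sparse index: one entry per block, holding the key of the block's first record.
anchors = [block[0][0] for block in blocks]   # [1, 4, 7]

def lookup(acc_no):
    """Return the record with the given AccNo, or None if it does not exist."""
    # Binary search for the last anchor that is <= acc_no.
    i = bisect_right(anchors, acc_no) - 1
    if i < 0:
        return None
    # The index alone cannot tell whether the record exists:
    # the candidate block has to be fetched and scanned.
    for record in blocks[i]:
        if record[0] == acc_no:
            return record
    return None
```

The index holds three entries for nine records, and every look-up touches exactly one data block, even for keys that turn out not to exist.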
3.3 Single-Level Ordered Indexes
  – Advantages
    • Number of blocks needed for storing the index is small compared to the data
      – Not all records are indexed (non-dense)
      – Index entries are smaller than data records
    • Can often be kept in buffer
  – Disadvantages
    • Insertions and deletions need to move data in storage and to update the index entries affected by the shifts

3.3 Single-Level Ordered Indexes
• Clustering indexes store data records in the order of a non-unique indexing field
  – One entry in the index per distinct value of the indexing field
  – Search keys are linked to the block address of the block containing the search key
  – Is a sparse index
    • But existence of records with a certain key can be assessed by index look-up

3.3 Single-Level Ordered Indexes
• Still problems with insertion/deletion
  – Often one or more complete blocks are reserved for each single search key to allow for block anchors
  – Blocks do not have to be adjacent, but can use block pointers if more space is needed
• Structurally very similar to a hash index
[Figure: clustering index entries Adams, Miller, Tarrens over the data file]
  1, Adams, $887,00 | 2, Adams, $19,99
  4, Miller, $1866,00 | 5, Miller, $179,99 | 6, Miller, $682,56 | 7, Miller, $8642,78
  8, Tarrens, $856,78

3.3 Single-Level Ordered Indexes
• Secondary indexes point to locations of records regarding a non-ordering attribute
  – Indexing does not affect the storage order
  – There can be multiple secondary indexes for the same DB file
• Secondary indexes are usually dense
  – Objects with same or adjacent values are usually not adjacent on disk
  – If the indexing field has unique values (secondary key), all records definitely have to be indexed
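A dense secondary index can be sketched as a sorted list of (search key, record location) entries over a file that is stored in a different order. The record values follow the lecture's example data; the grouping into blocks of three records and the names `data_file`, `index`, and `find` are assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right

# Data file in primary key order; a record location is (block number, slot number).
data_file = [
    [(1, "Cesar"), (2, "Tarrens"), (3, "Miller")],
    [(4, "Orwell"), (5, "Miller"), (6, "Tarrens")],
    [(7, "Post"), (8, "Adams"), (9, "Snyder")],
]

# Dense secondary index on the name: one entry per record, sorted by name.
index = sorted(
    (name, (block_no, slot_no))
    for block_no, block in enumerate(data_file)
    for slot_no, (_, name) in enumerate(block)
)
keys = [key for key, _ in index]

def find(name):
    """Return all records with the given name via the secondary index."""
    lo, hi = bisect_left(keys, name), bisect_right(keys, name)
    return [data_file[block_no][slot_no] for _, (block_no, slot_no) in index[lo:hi]]
```

Note that the two 'Miller' entries point into different blocks: records that are neighbours in the index are generally not neighbours on disk.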
3.3 Single-Level Ordered Indexes
• If the indexing field is not unique, there are several possibilities to create a secondary index
  – Create a dense index by including duplicate search keys (one for each record)
  – Use variable-length index entries, where each search key is assigned a list of pointers
  – Keep fixed-length index entries, but point to a block containing (multiple) pointers to the actual records
    • Introduces a level of indirection, but allows for a sparse index
    • Usually used in practice

3.3 Single-Level Ordered Indexes
• Dense secondary index or sparse with indirection
[Figure: the same data file indexed on the name twice: by a dense index (Adams, Cesar, Miller, Miller, Orwell, Post, Snyder, Tarrens, Tarrens) and by an index with one entry per name (Adams, Cesar, Miller, Orwell, Post, Snyder, Tarrens) pointing to blocks of record pointers p]
  1, Cesar, $887,00 | 2, Tarrens, $19,99 | 3, Miller, $19,99
  4, Orwell, $1866,00 | 5, Miller, $179,99 | 6, Tarrens, $682,56
  7, Post, $8642,78 | 8, Adams, $856,78 | 9, Snyder, $856,78

3.3 Single-Level Ordered Indexes
• Characteristics of secondary indexes
  – Speeds up retrieval, because if it does not exist, the entire file would have to be scanned linearly
    • Without a primary index, files can still be scanned in a binary search fashion
  – Uses more search time and space, because it is dense
  – Secondary indexes provide a logical ordering
    • Accessing records in that order might not be the most efficient way regarding block accesses
    • Each record access may fetch a new block into the buffer

3.3 Single-Level Ordered Indexes
• Improving secondary indexes for range queries
  – Same block may
be (unnecessarily) accessed multiple times
• Simple example: Cesar-Orwell (buffer holds only 1 block)
  – Cesar: fetch first block
  – Miller: evict first block, fetch second block
  – Miller: evict second block, fetch first block
  – Orwell: evict first block, fetch second block
[Figure: dense secondary index on the name (Adams, Cesar, Miller, Miller, Orwell, Post, Snyder, Tarrens, Tarrens) over the data file with records 1 to 9]

3.3 Single-Level Ordered Indexes
• Improving secondary indexes for range queries
  – Remember: working on blocks in main memory is cheap, fetching blocks from disk is expensive!
• Improved example: Cesar-Orwell
  – Fetch 'Cesar' and scan whole block
  – Output '1' & '3'
  – Mark block as completed and evict
  – Fetch 'Miller' and scan whole block
  – Output '4' & '5'
  – Mark block as completed and evict
  – Skip second 'Miller'
  – Skip 'Orwell'
  – Done
[Figure: the same index and data file as in the simple example]

3.3 Single-Level Ordered Indexes
• Short summary
  – Primary and clustering indexes affect storage order, secondary indexes don't
  – Primary and clustering indexes may be sparse, secondary indexes are usually dense
    • Primary and clustering indexes can use block anchors
  – Index sizes
    • Primary index: number of blocks
    • Clustering index: number of distinct search keys
    • Secondary index: up to number of records in table

3.3 Multi-Level Ordered Indexes
• Example: What if a primary index does not fit in main memory?
  – Bad look-up efficiency!
• Simple multi-level solution
  – Treat the primary index on disk as a sequential file and construct a sparse index on it
    • Outer index: sparse index of the primary index
    • Inner index: primary index file
  – If even the outer index is too large to fit in main memory, yet another level of index can be created, and so on

3.3 Multi-Level Ordered Indexes
• Multi-level indexes can further improve search speed
  – In single-level indexes a binary search can be applied to locate pointers to blocks
    • Efficiency of log_2(N) block accesses (N = number of index blocks)
  – Multi-level indexes allow for higher search efficiency
    • Fan-out of the index should be higher than 2
    • Efficiency of log_fan-out(N)
EN 14.2

3.3 Multi-Level Ordered Indexes
• Concept of multi-level indexes can be applied to primary, clustering and secondary indexes as long as values are distinct
• Example: 2-level primary index
  – Level 1 is the primary index
  – Level 2 is a primary index of level 1
  – Efficiency of log_fan-out(N)
    • Fan-out: number of 1st level blocks per 2nd level block
[Figure: data file with records 1 to 12; level 1 index with entries 1, 3, 5, 7, 9, 11; level 2 index with entries 1, 7]
EN 14.2

3.3 Multi-Level Ordered Indexes
• There is a huge number of different tree structures for multi-level indexing
  – Classical structures: B-tree, B*-tree, etc.
  – Multidimensional structures: R-trees, Quad-trees, etc.
  – Lots of literature

3.3 Hash Indexes
• Index structure based on hashing
  – Idealized access speed: O(1)
• Basic idea:
  – Fixed-size, directly addressable index space [0..M] containing M+1 buckets
    • Buckets contain links to data
    • Single link in internal hashing (i.e. in memory)
    • Multiple links for external hashing (i.e. on hard disk)
  – Hash function h: ℤ → [0..M]
    • Maps any number to [0..M]
    • Should be uniform (i.e. same probability for any x ∈ [0..M])
      – Especially, should also be surjective (i.e. use the full range of [0..M])
    • Needs to be deterministic (i.e. same input → same output)
    • Should be simple and fast
EN 13.8.1

3.3 Hash Indexes
• Basic idea
  – Convert the value which is to be indexed to a numeric representation
  – Hash the value
  – Store the block anchor of the value in the bucket with the computed hash index
• Example: M=8 (buckets 0 to 8)
  – h(Snyder) = 1: bucket 1 links to 9, Snyder, $856.78
  – h(Miller) = 4: bucket 4 links to 5, Miller, $179.99
  – h(Cesar) = 7: bucket 7 links to 1, Cesar, $887.00
EN 13.8.1

3.3 Hash Indexes
• Problem: collision
  – Hash functions are not injective, so collisions may happen
    • Block pointers for full blocks (overflow list)
  – Probability of collision increases with load factor
    • Deteriorates to linear behavior in worst case
[Figure: M=8; h(Rogers) = h(Cesar) = h(Snyder) = h(Smith) = 1 and h(Miller) = 4; records 3, Rogers, $1866.58; 1, Cesar, $887.00; 9, Snyder, $856.78 and 7, Smith, $0.29 all hash to bucket 1, record 5, Miller, $179.99 hashes to bucket 4]
EN 13.8.1

3.3 Hash Indexes
• Problem: static address range
  – Addressable range is fixed to [0..M]
  – What happens if address range is exhausted?
    • E.g., load factor too high, too many collisions occur
  – Possible solutions
    • Rehashing:
      – Create a new, larger hash map and rehash all values
      – For example, Java's HashMap follows that policy
    • Extendible hashing:
      – Uses a dynamic directory to double and halve the number of buckets
    • Linear hashing:
      – Buckets are split into two one after another and are individually rehashed with an additional hash function
      – Not in this lecture
EN 13.8.3

3.3 Hash Indexes
• Extendible hashing
  – Hash values are non-negative integers in [0..2^n - 1]
    • Numbers can be represented in binary (n bits)
    • Uses a so-called trie structure
    • Choose large n
  – Create a directory of depth d with 2^d entries for the first d bits of values
    • d is the global depth of the directory
    • Directory cells link to buckets containing links to data with hash values starting with the given bit pattern
  – Adjacent cells may link to the same bucket depending on the local depth of the bucket
EN 13.8.3

3.3 Hash Indexes
• Extendible hashing (bucket size 3)
[Figure: directory with cells 000, 001, 010, 011, 100, 101, 110, 111; global depth = 3]
  – 0_0010, 0_0100, 0_1010: local depth = 1, all hash values starting with 0
  – 10_000, 10_101, 10_110: local depth = 2, all hash values starting with 10
  – 110_000, 110_010, 110_100: local depth = 3, all hash values starting with 110
  – 111_001, 111_010, 111_111: local depth = 3, all hash values starting with 111
EN 13.8.3

3.3 Hash Indexes
• Insert 010000
[Figure: directory with cells 000 to 111; global depth = 3]
  – 0_0010, 0_0100, 0_1010: local depth = 1, all hash values starting with 0 (split this bucket, add 010000)
  – 10_000, 10_101, 10_110: local depth = 2, all hash values starting with 10
  – 110_000, 110_010, 110_100: local depth = 3, all hash values starting with 110
  – 111_001, 111_010, 111_111: local depth = 3, all hash values starting
with 111
EN 13.8.3

3.3 Hash Indexes
[Figure: state after the split; directory with cells 000 to 111; global depth = 3]
  – 00_010, 00_100: local depth = 2, all hash values starting with 00
  – 01_000, 01_010: local depth = 2, all hash values starting with 01
  – 10_000, 10_101, 10_110: local depth = 2, all hash values starting with 10
  – 110_000, 110_010, 110_100: local depth = 3, all hash values starting with 110
  – 111_001, 111_010, 111_111: local depth = 3, all hash values starting with 111
EN 13.8.3

3.3 Hash Indexes
• Extendible hashing
  – Allows buckets to be split and coalesced dynamically as the database grows and shrinks
    • If a bucket is full and its local depth is lower than the global depth:
      – More than one cell links to this bucket
      – Bucket is split into two with increased local depth and values are distributed accordingly
      – Links in cells are adjusted accordingly
    • If the local depth already equals the global depth:
      – Global depth is increased by one and thus the size of the directory is doubled
      – Each cell is replaced by two cells, both of which contain the same pointer as the original cell
EN 13.8.3

3.3 Hash Indexes
• Summary hash indexes: may perform in O(1)
  – But: management overhead can be quite large for growing data collections
    • Overflow lists have small management overhead, but may decrease access performance to O(n)
    • Rehashing is very expensive for each growth stage
    • Extendible hashing has a smaller overhead and is especially suitable for external storage hashing
EN 13.8

3.3 Indexing
• Why not simply index every attribute?
  – Physical index on primary key, logical indexes on every other attribute
    • Results in good read efficiency
• But… whenever a DB file is modified, every index on the file has to be updated
  – Updating indexes imposes overhead on database modification (insert, delete, update)
    • Especially for multi-level indexes, all levels may have to be updated

3.4 Physical Tuning
• One of the tasks of the database administrator
• Black magic…
  – Get DB performance statistics
  – Try something that seems sensible
  – Get new performance statistics
  – If it did not improve, reset it to the previous configuration
  – Try something different until satisfied...

3.4 Physical Tuning
• Black magic today needs advanced tools to operate correctly
• For database tuning, you need
  – Snapshot monitors
    • Continuously collect cumulative statistics for the DB within a given time span
  – Event monitors
    • Hook into certain events and analyze them

3.4 Diagnostic Database Tools
• Snapshot monitors
  – Monitor collects cumulative statistics
  – Internal counters set at global or session levels
  – Collected statistics can be evaluated afterwards for system tuning on a general level
  – Data collection is explicitly enabled or disabled

3.4 Diagnostic Database Tools
• Useful for collecting point-in-time information regarding overall DB/DBMS behavior
  – Locking: lists all locks and types held by all applications
  – Bufferpools: cumulative stats for memory, physical and logical I/Os, synch/asynch
  – Sorts: provides detailed statistics regarding sort-heap usage, overflows, # active etc.
  – Tablespaces: detailed I/O and activity statistics for a given tablespace
  – UOW: displays status of an application's Unit of Work at a point in time
  – Dynamic SQL: shows contents of the package cache and related stats at a point in time

3.4 Diagnostic Database Tools
• Snapshot monitors
  – Continuous overhead is in the 3%-5% range
    • But no I/O, since counters are usually RAM resident, not written to disk
  – Quick list of useful metrics to identify particular problems:
    • Bufferpool hit ratios
    • Sort overflows/problems
    • Effectiveness of prefetch
    • Need for indexing
    • Transaction logging issues
    • Single query/application resource domination, etc.

3.4 Diagnostic Database Tools
• Snapshot monitor: IBM DB2 Performance Expert
  http://www.redbooks.ibm.com/abstracts/sg246470.html

3.4 Diagnostic Database Tools
• Event monitors
  – Collect information regarding events or chains of events
    • No continuously collected data as with snapshots
  – Must be explicitly created and activated via DBMS commands or APIs
  – Are the best way to effectively diagnose and resolve transaction problems (e.g., deadlock issues)
  – Output may be directed to a database table and the results then analyzed using SQL

3.4 Diagnostic Database Tools
• Event monitors can be used to
  – Identify and rank the most frequently executed or most time-consuming SQL
  – Track the sequence of statements leading up to a lock timeout or deadlock
  – Track connections to the database
  – Break down overall SQL statement execution time into sub-operations such as sort time and user or system CPU time
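Analyzing event monitor output with SQL can be sketched as follows. Everything here is a stand-in: the table `stmt_events` and its columns are hypothetical and do not reflect the schema any real event monitor writes, and an in-memory SQLite database replaces the monitored DBMS so that the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical event table: one row per executed statement.
conn.execute("CREATE TABLE stmt_events (stmt TEXT, elapsed_ms REAL)")
conn.executemany(
    "INSERT INTO stmt_events VALUES (?, ?)",
    [
        ("SELECT * FROM account WHERE acc_no = ?", 2.0),
        ("SELECT * FROM account WHERE acc_no = ?", 3.0),
        ("SELECT name FROM account WHERE balance > ?", 40.0),
        ("UPDATE account SET balance = ? WHERE acc_no = ?", 4.0),
    ],
)

# Rank statements by the total time they consumed.
ranking = conn.execute(
    """
    SELECT stmt, COUNT(*) AS executions, SUM(elapsed_ms) AS total_ms
    FROM stmt_events
    GROUP BY stmt
    ORDER BY total_ms DESC
    """
).fetchall()

for stmt, executions, total_ms in ranking:
    print(f"{total_ms:8.1f} ms  {executions:3d}x  {stmt}")
```

Ordering by `executions` instead of `total_ms` yields the most frequently executed statements; both rankings are typical starting points for deciding where an index might help.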
3.4 Diagnostic Database Tools
• Event monitor: DBI Brother Panther
  http://www.dbisoftware.com

3.4 The Black Art of Physical Tuning
• Physical DB tuning is a complicated and opaque task
• Usually trial and error
  – Measure some hopefully meaningful metrics of your DB usage statistics
  – Adjust something
    • Mostly indexes and tablespace properties
  – Measure again
    • If the result is better: Great! Continue tuning!
    • If the result is worse: Bad! Undo everything you did and try something else.
• But what to measure and what to do?
  – Use best practices or try yourself!

3.4 Physical Tuning
• What to measure?
• First, what kind of database do you have?
  – OLTP (Online Transaction Processing): database with many small, concurrent read/write queries
    • Optimize for fast read/write accesses
    • Keep an eye on "real-time" response times
    • Optimize for concurrency
  – Warehouse (OLAP, Online Analytical Processing): uses larger and more complex queries, primarily for retrieving data
    • Optimize for read accesses

3.4 Tuning Metrics
• There is a huge number of knobs and screws to turn
  – DB planning and tuning is for professionals
  – Special certificates by each database vendor
    • IBM DB2: IBM Certified Database Associate, IBM Certified Database Administrator, IBM Certified Application Developer, IBM Certified Advanced Database Administrator
  http://education.oracle.com
  http://www-03.ibm.com/certify/certs/dm_index.shtml
  http://www.microsoft.com/learning/mcp/mcdba/

3.4 Tuning Metrics
• Scott Hayes, President & CEO of Database-Brothers Inc (DBI)
• If I only had five minutes to assess the status of a database, I would
look at…
  – Average Result Set Size
  – Index Read Efficiency
  – Synchronous Read Percentage
  – Average Rows Read per DB Transaction per Table
  – Average Read Time

3.4 Tuning Metrics
• ARSS: Average Result Set Size
  – ARSS = RowsSelected / NumSelectQueries
  – Rule of thumb:
    • ARSS ≤ 10 indicates an OLTP database
      – If you think you have OLTP and ARSS >> 10, there is something wrong…
    • ARSS > 10 indicates a warehouse database (OLAP)

3.4 Tuning Metrics
• IREF: Index Read Efficiency
  – "How many rows have to be read to retrieve one row?"
    • I.e., how much overhead is within the where predicates?
  – IREF = RowsRead / RowsReturned
    • Well-designed indexes may evaluate a where clause without reading additional rows!
  – Rule of thumb:
    • IREF should be less than 10 for OLTP databases
    • May be higher for warehouses (>100)
  – What to do?
    • Introduce more efficient indexes!
    • Use simpler queries!

3.4 Tuning Metrics
• SRP: Synchronous Read Percentage
  – Synchronous read means the DBMS reads just the index file and the required data page (good)
  – Asynchronous read means the DBMS employs prefetching to scan indexes or data (bad)
  – Rule of thumb:
    • >90% for OLTP (>50% for warehouses): Good
    • 80%-90% (25%-50%): Ok
    • 50%-80% (<25%): Bad
  – What to do?
    • Improve index structures!

3.4 Tuning Metrics
• TBRRTX: Average Rows Read per DB Transaction per Table
  – TBRRTX(table) = RowsRead / (AttemptedCommits + AttemptedRollbacks)
  – Rule of thumb:
    • <10 for OLTP: Good
    • 10-100: Ok
    • >100: Bad
    • >1000: Horrific
  – What to do?
    • Improve index structures!
    • Improve queries!

3.4 Tuning Metrics
• ART: Average Read Time
  – The average time per read
  – ART = ReadTime / (NumDataRead + NumIndexRead)
  – If you think you did everything within your application and indexes, optimize read time
    • Adjust size of tablespace containers
    • Distribute containers across disks
      – Avoid the OS paging device
    • Move highly accessed tablespaces to faster devices

3 Indexes
• Buffer management is very important
  – Holds DB blocks in primary memory
    • DB blocks are made up of several FS blocks
    • Find good strategies to have requested DB blocks available when needed
  – Each block holds some meta data and row data
• Indexes drastically speed up queries
  – Fewer blocks need to be scanned
  – Primary index
    • On primary key attribute, usually influences row storage order
  – Secondary index
    • On any attribute, does not influence storage order

Next Lecture
• Tree-Index Structures
  – Binary Tree
    • Naïve Binary Tree
    • Balancing Tree
  – B-Trees
    • B
    • B+
    • B*
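As a closing sketch, the tuning metrics of Section 3.4 can be computed directly from monitor counters. The function and parameter names below are illustrative assumptions, not counter names of any specific DBMS; the formulas and the TBRRTX thresholds are the ones given in the rules of thumb.

```python
def arss(rows_selected, num_select_queries):
    """Average Result Set Size."""
    return rows_selected / num_select_queries

def iref(rows_read, rows_returned):
    """Index Read Efficiency: rows read per row returned."""
    return rows_read / rows_returned

def tbrrtx(rows_read, attempted_commits, attempted_rollbacks):
    """Average rows read per DB transaction for one table."""
    return rows_read / (attempted_commits + attempted_rollbacks)

def art(read_time, num_data_reads, num_index_reads):
    """Average Read Time per read (same time unit as read_time)."""
    return read_time / (num_data_reads + num_index_reads)

def rate_tbrrtx(value):
    """Apply the rule of thumb for OLTP databases."""
    if value < 10:
        return "Good"
    if value <= 100:
        return "Ok"
    if value <= 1000:
        return "Bad"
    return "Horrific"

# Example with made-up counter values:
value = tbrrtx(rows_read=5000, attempted_commits=45, attempted_rollbacks=5)
print(value, rate_tbrrtx(value))
```

Collecting the raw counters before and after a tuning step and comparing the derived metrics is one way to follow the "measure, adjust, measure again" loop.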