Query Processing - Department of Computer Science and Engineering
Transcription
Query Processing - Department of Computer Science and Engineering
CSL 451 Introduction to Database Systems Query Processing Department of Computer Science and Engineering Indian Institute of Technology Ropar Narayanan (CK) Chatapuram Krishnan! Summary • Measures of Query Cost – time to transfer one block – time for one seek • Selection Operation – linear search/file scan – index scan - equality • primary index (A2, A3) • secondary index (A4) – index scan – comparison • primary index (A5) • secondary index (A6) – conjunctive selection (A7, A8, A9) – disjunctive selection (A10) 01/04/15! • Sorting – external sort-merge • Join – – – – nested-loop join block nested-loop join indexed nested-loop join merge-join • sort-merge join • hybrid merge-join – hash-join • hybrid hash-join Department of Computer Science and Engineering ! Indian Institute of Technology Ropar! 2! 12.1 Assume that only one tuple fits in a block and memory holds at most 3 blocks. Show the runs created on each pass of the sort-merge algorithm, when applied to sort the following tuples on the first attribute (Kangaroo, 17) (wallaby, 21) (emu, 1) (wombat, 13) (platypus, 3) (lion, 8) (warthog, 4) (zebra, 11) (meerkat, 6) (hyena, 9) (hornbill, 2) (baboon, 12) 01/04/15! Department of Computer Science and Engineering ! Indian Institute of Technology Ropar! 3! 12.2 Consider the bank database, where the primary keys are underlined, and the following SQL query: select T.branch_name from branch T, branch S where T.assets > S.assets and S.branch_city=‘Brooklyn’ Write an (efficient) relational algebra expression that is equivalent to the query. branch (branch_name, branch_city, assets) 01/04/15! Department of Computer Science and Engineering ! Indian Institute of Technology Ropar! 4! 12.6.a Consider the bank database. Suppose that a B+-tree index on branch_city is available on relation branch, and that no other index is available. List different ways to handle the following selections that involve negation σ~(branch_city<‘Brooklyn’)(branch) branch (branch_name, branch_city, assets) 01/04/15! Department of Computer Science and Engineering ! Indian Institute of Technology Ropar! 5! 12.6.b Consider the bank database. Suppose that a B+-tree index on branch_city is available on relation branch, and that no other index is available. List different ways to handle the following selections that involve negation σ~(branch_city=‘Brooklyn’)(branch) branch (branch_name, branch_city, assets) 01/04/15! Department of Computer Science and Engineering ! Indian Institute of Technology Ropar! 6! 12.6.c Consider the bank database. Suppose that a B+-tree index on branch_city is available on relation branch, and that no other index is available. List different ways to handle the following selections that involve negation σ~(branch_city<‘Brooklyn’∨assets<5000)(branch) branch (branch_name, branch_city, assets) 01/04/15! Department of Computer Science and Engineering ! Indian Institute of Technology Ropar! 7! 12.9 What is the effect on the cost of merging runs if the number of buffer blocks per run is increased, while keeping overall memory available for buffering runs fixed? 01/04/15! Department of Computer Science and Engineering ! Indian Institute of Technology Ropar! 8! 12.10.a Suppose you need to sort a relation of 40GB, with 4KB blocks, using a memory size of 40MB. Suppose the cost of a seek is 5ms, while the disk transfer rate is 40MB/s. Find the cost of sorting the relation, in seconds, with bb=1 and with bb=100. number of block transfers – number of seeks - 01/04/15! Department of Computer Science and Engineering ! Indian Institute of Technology Ropar! 9! 12.10.b Suppose you need to sort a relation of 40GB, with 4KB blocks, using a memory size of 40MB. Suppose the cost of a seek is 5ms, while the disk transfer rate is 40MB/s. In each case, how many merge passes are required? 01/04/15! Department of Computer Science and Engineering ! Indian Institute of Technology Ropar! 10! 12.10.c Suppose you need to sort a relation of 40GB, with 4KB blocks, using a memory size of 40MB. Suppose the cost of a seek is 5ms, while the disk transfer rate is 40MB/s. Suppose a flash storage device is used instead of a disk, and it has a seek time of 1mcs, and a transfer rate of 40MB/s. Recompute the cost of sorting the relation in seconds, with bb=1 and bb=100 in this setting. 01/04/15! Department of Computer Science and Engineering ! Indian Institute of Technology Ropar! 11! 12.3.a Let r1(A, B, C) and r2(C, D, E) have the following properties: r1 has 20,000 tuples, r2 has 45,000 tuples, 25 tuples of r1 fit on one block, and 30 tuples of r2 fit on one block. Estimate the number of block transfers and seeks required, using nested-loop join. To compute the theta join r θ s for each tuple tr in r do begin for each tuple ts in s do begin test pair (tr,ts) to see if they satisfy the join condition θ if they do, add tr • ts to the result. end end 01/04/15! Department of Computer Science and Engineering ! Indian Institute of Technology Ropar! 12! 12.3.b Let r1(A, B, C) and r2(C, D, E) have the following properties: r1 has 20,000 tuples, r2 has 45,000 tuples, 25 tuples of r1 fit on one block, and 30 tuples of r2 fit on one block. Estimate the number of block transfers and seeks required, using block nested-loop join. 01/04/15! Department of Computer Science and Engineering ! Indian Institute of Technology Ropar! 13! 12.3.c Let r1(A, B, C) and r2(C, D, E) have the following properties: r1 has 20,000 tuples, r2 has 45,000 tuples, 25 tuples of r1 fit on one block, and 30 tuples of r2 fit on one block. Estimate the number of block transfers and seeks required, using merge join. 01/04/15! Department of Computer Science and Engineering ! Indian Institute of Technology Ropar! 14! 12.3.d Let r1(A, B, C) and r2(C, D, E) have the following properties: r1 has 20,000 tuples, r2 has 45,000 tuples, 25 tuples of r1 fit on one block, and 30 tuples of r2 fit on one block. Estimate the number of block transfers and seeks required, using hash join. 01/04/15! Department of Computer Science and Engineering ! Indian Institute of Technology Ropar! 15! 12.11. Describe how to implement the following operation using sorting semijoin (⋉θ): r ⋉θ s is defined as ΠR (r ⋈θ s)where R is the set of attributes in the schema of r; that it selects those tuples in ri in r for which there is a tuple sj in s such that ri and sj satisfy predicate θ. 01/04/15! Department of Computer Science and Engineering ! Indian Institute of Technology Ropar! 16! 12.13 Design a variant of hybrid merge-join algorithm for the case where both relations are not physically sorted, but both have a sorted secondary index on the join attributes. 01/04/15! Department of Computer Science and Engineering ! Indian Institute of Technology Ropar! 17!