Document

Transcription

Document
CSL 451 Introduction to Database Systems
Quiz 8, 8-4-2015
Name:
Roll Number:
1. Define/Explain the following
a. Hybrid-merge join (2 points)
- Algorithm to perform join between two relations, which merges the sorted relation with
the leaf entries of the secondary B+-tree index of the unsorted relation.
b. Hash join (2 points)
- Algorithm to perform join between two relations, which first partitions the relations
using a hash function defined on the join attributes. The corresponding pairs of
partitions of the two relations are then joined using a index nested-loop join using a
separate hash index.
c. Equivalence of two relational algebra expressions (1 point)
Two relational algebra expressions are equivalent if, on every legal database instance,
the two expressions generate the same set of tuples.
2. Give an instance of a relation that shows the expressions
not equivalent. (2 points)
R(A,B)
(a1, b1)
!(𝑅
− 𝑆) and
!(𝑅)
−
!(𝑆)
are
S(A,B)
(a1, b2)
!(𝑅
− 𝑆) = !(𝑎1, 𝑏1) = 𝑎1
(𝑅)
− ! 𝑆 = 𝑎1 − 𝑎1 = 𝜙
!
3. Let relations r and s have the following properties: r has 15,000 tuples and s has 1000 tuples.
10 tuples of r fit in one memory block, while 20 tuples of s fit in one block. The main memory
has a total of 100 blocks. Ir, a primary B+-tree index on relation r, occupies 25 blocks and Is, a
primary B+-tree index on s, occupies 15 blocks. Let the index attribute of both the relations be
the only attributes that participate in a theta join. An index nested-loop join is used to perform
the join. Let us also make the assumption that the index of the inner relation is completely
loaded into the memory, one block in the memory is used to store a block of the inner relation
and the remaining blocks of the memory are equally divided to store blocks of the outer
relation and join output. Estimate the cost of the join operation (in terms of number of disk
access) between r and s using index nested-loop join. Based on the cost estimate, which of
the relations should be used as the outer relation while performing the join? (8 points)
case 1: Let r and s be the outer and inner relations respectively for the index nested-loop join
15 blocks of the primary B+-tree index of relation s have to be copied to the main memory.
1 block in the main memory is used to store one block of relation s
42 blocks are available for storing blocks of relation r
42 blocks are available for storing the output of the join
There are 15,000/10 = 1,500 blocks of relation r. The blocks are continuous and therefore
require 1500/42 number of disk access to copy them into the main memory.
For each tuple of relation r loaded in the main memory, perform index lookup for relation s. In
the worst case we need 15 disk accesses to load the index of s into the main memory and one
additional access to the tuple of relation s that satisfies the join condition.
So total number of disk accesses – 1500/42 +15+15000*1 = 15051
case 2: Let s and r be the outer and inner relations respectively for the index nested-loop join.
25 blocks of primary B+-tree index of relation r have to be copied to the main memory
1 block in the main memory is used to store one block of relation r
37 blocks are available for storing blocks of relation s
37 blocks are available for storing the output of the join
There are 1000/20 = 50 blocks of relation s. The blocks are continuous and therefore require
50/37 number of disk access to copy them into the main memory.
For each tuple in relation s loaded in the main memory, perform index lookup for relation r. In
the worst case we need 25 disk accesses to load the index of r into the main memory and one
additional access to the tuple of relation r that satisfies the join condition.
So the total number of disk accesses – 50/37 +25+1000*1=1027
Relation s should be used as the outer relation in the index nested-loop join algorithm.