Document 6518698

Overview
Data Structures for Databases II
Advanced SQL
Lena Strömbäck

[Figure: the real world is captured in a model and stored in databases; queries and answers pass through the DBMS (processing of queries and updates, access to stored data) to the physical database.]
What is this about?
- How to make more efficient kinds of indexes

What do you need to learn?
- Multilevel indexing
- Indexes on multiple keys and in MySQL
- PL/SQL and triggers

Part II
- Advanced SQL for the project
Example
- Assume an ordered data file with 1,000,000 records of 1,000 bytes each and a block size of 4,096 bytes. Assume an index record size of 32 bytes.
- On average, how many block accesses are needed to find a single record when searching on the key field
  a) using no index? Four records fit in a block, so the data file occupies 1,000,000 / 4 = 250,000 blocks.
  b) using a primary index?
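The two questions can be worked through numerically. The sketch below assumes the usual textbook estimate of ⌈log2 b⌉ block accesses for a binary search over b blocks, and one index entry per data block for the primary index; the slide states neither, so both are assumptions, and the function names are made up for illustration.

```python
import math

def blocks_needed(n_entries, entry_size, block_size):
    """Blocks needed when entries do not span block boundaries."""
    blocking_factor = block_size // entry_size  # whole entries per block
    return math.ceil(n_entries / blocking_factor)

def binary_search_accesses(n_blocks):
    """Assumed cost model: ceil(log2(b)) block accesses for b blocks."""
    return math.ceil(math.log2(n_blocks))

# a) No index: binary search directly on the ordered data file.
data_blocks = blocks_needed(1_000_000, 1000, 4096)
no_index_accesses = binary_search_accesses(data_blocks)

# b) Primary index: one 32-byte entry per data block (assumption),
#    binary search on the index, then one access to the data block.
index_blocks = blocks_needed(data_blocks, 32, 4096)
primary_index_accesses = binary_search_accesses(index_blocks) + 1

print(data_blocks, no_index_accesses, index_blocks, primary_index_accesses)
```

Under these assumptions the primary index reduces the search from 18 to 12 block accesses.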
Multilevel Indexes
- "Index on the index"
- Reduce the search space of the index by building indexes on the index, each fitting in fewer blocks, until the top-level index fits in one block.
- The reduction at each level is determined by the blocking factor of the index.
- This blocking factor is called the fan-out (fo).

Multilevel Index Example
- Assume an ordered data file with 1,000,000 records of 1,000 bytes each and a block size of 4,096 bytes. Assume an index record size of 32 bytes.
- On average, how many block accesses are needed to find a single record when searching on the key field
  a) using a multilevel index?
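A minimal sketch of the multilevel calculation, assuming the first index level has one entry per data block (250,000 entries) and that every level uses the same 32-byte index records. The function name is hypothetical.

```python
import math

def multilevel_index_levels(first_level_entries, fan_out):
    """Return the number of blocks at each index level, bottom to top."""
    levels = []
    entries = first_level_entries
    while True:
        blocks = math.ceil(entries / fan_out)
        levels.append(blocks)
        if blocks == 1:        # the top level fits in a single block
            break
        entries = blocks       # next level indexes the blocks of this one
    return levels

fan_out = 4096 // 32                       # index entries per block
levels = multilevel_index_levels(250_000, fan_out)
accesses = len(levels) + 1                 # one block per level + data block

print(fan_out, levels, accesses)
```

With a fan-out of 128 the index needs three levels, so a lookup costs one access per level plus one for the data block.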
Problems with Multilevel Indexes
- Problems when inserting and deleting data
  - All levels are based on physically ordered files.
- Use an overflow file and re-create the index during file reorganisation.
- Use a dynamic multilevel index structure.

Search Tree
- A search tree is a tree that is used to guide the search for a record.
- An ordinary search tree of order p consists of nodes that have at most p-1 values and p pointers:
  <P1, K1, P2, K2, …, Pq-1, Kq-1, Pq>
where q ≤ p and each Pi is a pointer to a child node (or a null pointer).

Search Trees
1. Within each node, K1 < K2 < … < Kq-1.
2. For all values X in the subtree pointed at by Pi:
   - if i = 1: X < K1
   - if 1 < i < q: Ki-1 < X < Ki
   - if i = q: Kq-1 < X

Search Tree: Example, order p=3
[Figure: example search tree of order p = 3]
B-Tree
- B-tree = balanced tree
  - all leaves are on the same level
  - all nodes except the root and leaves have at most p pointers and at least ⌈p/2⌉ pointers

B-Tree: Example, order p=3
[Figure: example B-tree of order p = 3]
B-tree: Order
One node must fit in one block:
$p \cdot P_{block} + (p - 1) \cdot (P_{record} + K) \le B \;\Rightarrow\; p \le \frac{B + P_{record} + K}{P_{block} + P_{record} + K}$
where
  p        order, number of block pointer entries in a node
  Pblock   size of a block pointer
  Precord  size of a record pointer
  K        size of a search key field

B-tree: Number of entries
Given: B = 4096 bytes, Precord = 16 bytes, Pblock = 8 bytes, K = 64 bytes, fill percentage = 69%
→ p ≤ 47, so a node holds about 0.69 · 47 ≈ 33 pointers and 33 - 1 = 32 entries.

Level    Nodes    Pointers            Entries
Root     1        33                  32
Level 1  33       33 · 33 = 1,089     33 · 32 = 1,056
Level 2  1,089    33^3 = 35,937       33^2 · 32 = 34,848
Level 3  35,937   33^4 = 1,185,921    33^3 · 32 = 1,149,984

The number of entries held at level 3 of the B-tree: 1,149,984 (1,185,920 summed over all four levels).
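The order and capacity figures can be checked with a few lines of arithmetic. This is only a sketch: the function names are hypothetical, and rounding 0.69 · 47 up to 33 follows the slide's own choice.

```python
import math

def btree_order(B, P_block, P_record, K):
    """Largest p with p*P_block + (p-1)*(P_record+K) <= B."""
    return (B + P_record + K) // (P_block + P_record + K)

def btree_capacity(fan_out, levels_below_root):
    """Rows of (level, nodes, pointers, entries) for a uniformly filled tree."""
    rows = []
    nodes = 1
    for level in range(levels_below_root + 1):
        rows.append((level, nodes, nodes * fan_out, nodes * (fan_out - 1)))
        nodes *= fan_out       # every pointer leads to one node below
    return rows

p = btree_order(B=4096, P_block=8, P_record=16, K=64)
fan_out = math.ceil(0.69 * p)  # 32.43 rounded up, as on the slide
rows = btree_capacity(fan_out, levels_below_root=3)
total_entries = sum(entries for _, _, _, entries in rows)

for row in rows:
    print(row)
print(p, fan_out, total_entries)
```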
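B+-tree
- A variation of the B-tree
- Data pointers are stored only in leaf nodes.
- The leaf nodes are usually linked to provide ordered access.
- Most common dynamic multilevel index implementation

B+-Tree: Example, order p=3, pleaf=2
Order of insertion: 8, 5, 1, 7, 3, 12, 9, 6
[Figure: the resulting B+-tree. Root: (5). Internal nodes: (3) and (7, 8). Linked leaves: (1, 3), (5), (6, 7), (8), (9, 12). The leaf entries point into a data file that lists the keys 8, 5, 1, 7, 3, 12, 9, 6 and the names Andersson, Hagberg, French, Silver, Daniels, Young, Zhing, Baker.]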
B+-trees: Internal nodes
1. Each internal node is of the form
   <P1, K1, P2, K2, …, Pq-1, Kq-1, Pq>
2. Within each internal node, K1 < K2 < … < Kq-1.
3. For all search field values X in the subtree pointed at by Pi, we have
   X ≤ K1 for i = 1
   Ki-1 < X ≤ Ki for 1 < i < q
   Kq-1 < X for i = q
4. Each internal node has at most p tree pointers.
5. Each internal node, except the root, has at least ⌈p/2⌉ tree pointers. The root node has at least two tree pointers if it is an internal node.
6. An internal node with q pointers (q ≤ p) has q-1 search field values.

[Figure: internal node layout P1, K1, …, Ki-1, Pi, Ki, …, Kq-1, Pq, with subtrees holding X ≤ K1, Ki-1 < X ≤ Ki and Kq-1 < X]

B+-trees: Leaf nodes
1. Each leaf node is of the form
   <<K1, P1>, <K2, P2>, …, <Kq-1, Pq-1>, Pnext>
2. Within each leaf node, K1 < K2 < … < Kq-1.
3. Each entry contains a pointer to the record whose search field value corresponds to the entry.
4. Each leaf node has at least ⌈p/2⌉ values.
5. All leaf nodes are at the same level.

[Figure: leaf node layout K1, Pr1, …, Ki, Pri, …, Kq-1, Prq-1, Pnext]

B+-tree Order
One internal node must fit in one block:
$p \cdot P_{block} + (p - 1) \cdot K \le B \;\Rightarrow\; p \le \frac{B + K}{P_{block} + K}$
One leaf node must fit in one block:
$p_{leaf} \cdot (P_{record} + K) + P_{block} \le B \;\Rightarrow\; p_{leaf} \le \frac{B - P_{block}}{P_{record} + K}$
where
  B        block size
  p        order, number of pointer entries in an internal node
  pleaf    number of record pointer entries in a leaf node
  Pblock   size of a block pointer
  K        size of a search key field
  Precord  size of a record pointer
Given: B = 4096 bytes, Precord = 16 bytes, Pblock = 8 bytes, K = 64 bytes, fill percentage = 70%
→ p ≤ 57; pleaf ≤ 51

B+-Tree Search
Search: 8
[Figure: searching for key 8 in the example B+-tree with root (5), internal nodes (3) and (7, 8), and leaves (1, 3), (5), (6, 7), (8), (9, 12)]
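The search rule for internal nodes (follow Pi when Ki-1 < X ≤ Ki) can be sketched on the example tree from the slides. The tuple-based node representation and the helper names below are assumptions made for illustration, not part of the lecture material.

```python
import bisect

def leaf(*keys):
    """Hypothetical leaf node: just a sorted list of keys."""
    return ("leaf", list(keys))

def internal(keys, children):
    """Hypothetical internal node: q-1 keys and q children."""
    return ("internal", list(keys), children)

# Example tree (order p=3, pleaf=2) after inserting 8, 5, 1, 7, 3, 12, 9, 6.
tree = internal([5], [
    internal([3], [leaf(1, 3), leaf(5)]),
    internal([7, 8], [leaf(6, 7), leaf(8), leaf(9, 12)]),
])

def search(node, x):
    """Return (found, keys of every node visited from root to leaf)."""
    path = []
    while node[0] == "internal":
        _, keys, children = node
        path.append(keys)
        # bisect_left gives the first i with x <= keys[i], i.e. the
        # child whose range is K(i-1) < x <= K(i); past the end means Pq.
        node = children[bisect.bisect_left(keys, x)]
    path.append(node[1])
    return x in node[1], path

print(search(tree, 8))
print(search(tree, 4))
```

Each lookup visits exactly one node per level, which is why the height of the tree determines the search cost.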
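B+-trees: Number of entries
With p = 57 and pleaf = 51, a fill percentage of 70% gives about 0.7 · 57 ≈ 40 pointers per internal node.

Level       Nodes    Pointers            Entries
Root        1        40                  40 - 1 = 39
Level 1     40       40 · 40 = 1,600     40 · 39 = 1,560
Level 2     1,600    40^3 = 64,000       40^2 · 39 = 62,400
Leaf level  64,000   Record pointers: 64,000 · 0.7 · 51 = 2,284,800

For comparison, the number of entries held in the 3-level B-tree: 1,185,920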
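The B+-tree order and capacity figures can be reproduced with the short sketch below. The function name is hypothetical, and integer arithmetic is used for the 70% fill to avoid floating-point rounding.

```python
def bplus_orders(B, P_block, P_record, K):
    """Largest p and p_leaf such that one node fits in one block."""
    p = (B + K) // (P_block + K)                 # internal nodes
    p_leaf = (B - P_block) // (P_record + K)     # leaf nodes
    return p, p_leaf

p, p_leaf = bplus_orders(B=4096, P_block=8, P_record=16, K=64)

fan_out = round(0.7 * p)             # 39.9, about 40 pointers per node
leaf_nodes = fan_out ** 3            # root -> level 1 -> level 2 -> leaves
record_pointers = leaf_nodes * 7 * p_leaf // 10   # 70% of p_leaf per leaf

print(p, p_leaf, fan_out, leaf_nodes, record_pointers)
```

With the same block size, the B+-tree addresses roughly twice as many records as the B-tree because its internal nodes carry no record pointers.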
B+-trees: Search, Insertion and Deletion
- Very fast searching in the index structure: about $\log_{p \cdot f} N$ levels are traversed, where
    N  number of search values
    p  order, number of block pointers per node
    f  fill factor, 0 ≤ f ≤ 1
- Insertion and deletion can be expensive.
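As a rough illustration of the logarithmic estimate, the sketch below plugs in p = 57 and f = 0.7 from the order calculation and takes N = 1,000,000 search values. The choice of N is an assumption made for this example only.

```python
import math

def estimated_levels(N, p, f):
    """log base (p*f) of N: approximate number of levels to traverse."""
    return math.log(N, p * f)

levels = estimated_levels(N=1_000_000, p=57, f=0.7)
whole_levels = math.ceil(levels)

print(round(levels, 2), whole_levels)
```

Even for a million search values the estimate stays at about four levels, which is the reason the slide calls searching very fast.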
B+-tree: Insertion
- When a leaf node is full, an insertion causes an overflow:
  - The first ⌈p/2⌉ entries in the node are kept there; the remaining entries are moved to a new leaf.
  - The largest search value kept in the original node is copied up to the parent. If the parent is full, it overflows in turn.
  - The resulting split can propagate all the way up to the root.
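B+-Tree: Insertion example (order p=3, pleaf=2)
[Figure sequence, summarised as the tree state after each step]
Insert: 8    leaf (8)
Insert: 5    leaf (5, 8)
Insert: 1    Overflow – create a new level: root (5); leaves (1, 5), (8)
Insert: 7    root (5); leaves (1, 5), (7, 8)
Insert: 3    Overflow - Split: root (3, 5); leaves (1, 3), (5), (7, 8)
Insert: 12   Overflow - Split, propagates to a new level: root (5); internal nodes (3), (8); leaves (1, 3), (5), (7, 8), (12)
Insert: 9    root (5); internal nodes (3), (8); leaves (1, 3), (5), (7, 8), (9, 12)
Insert: 6    Overflow – Split, resulting B+-tree: root (5); internal nodes (3), (7, 8); leaves (1, 3), (5), (6, 7), (8), (9, 12)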
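The insertion sequence can be replayed with a small in-memory sketch. It is not a disk-based implementation: the class and method names are hypothetical, keys are assumed unique, no record pointers are stored, and the split rules (keep the first ⌈(pleaf+1)/2⌉ leaf entries, move the ⌊(p+1)/2⌋-th key of an overflowing internal node up) are the common textbook convention, assumed here because the slides only show the result.

```python
import bisect

class Node:
    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []
        self.children = []   # child nodes (internal nodes only)
        self.next = None     # right sibling (leaf nodes only)

class BPlusTree:
    def __init__(self, p=3, p_leaf=2):
        self.p, self.p_leaf = p, p_leaf
        self.root = Node(leaf=True)

    def insert(self, key):
        split = self._insert(self.root, key)
        if split is not None:            # root overflowed: add a level
            sep, right = split
            new_root = Node(leaf=False)
            new_root.keys = [sep]
            new_root.children = [self.root, right]
            self.root = new_root

    def _insert(self, node, key):
        """Insert key below node; return (separator, new node) on split."""
        if node.leaf:
            bisect.insort(node.keys, key)
            if len(node.keys) <= self.p_leaf:
                return None
            keep = (len(node.keys) + 1) // 2     # entries that stay put
            right = Node(leaf=True)
            right.keys, node.keys = node.keys[keep:], node.keys[:keep]
            right.next, node.next = node.next, right
            return node.keys[-1], right          # copy last kept key up
        i = bisect.bisect_left(node.keys, key)   # child with K(i-1) < key <= K(i)
        split = self._insert(node.children[i], key)
        if split is None:
            return None
        sep, right = split
        node.keys.insert(i, sep)
        node.children.insert(i + 1, right)
        if len(node.children) <= self.p:
            return None
        mid = len(node.children) // 2 - 1        # this key moves up
        up = node.keys[mid]
        new = Node(leaf=False)
        new.keys = node.keys[mid + 1:]
        new.children = node.children[mid + 1:]
        node.keys = node.keys[:mid]
        node.children = node.children[:mid + 1]
        return up, new

    def leaves(self):
        """Keys of all leaves, left to right, following the leaf chain."""
        node = self.root
        while not node.leaf:
            node = node.children[0]
        out = []
        while node is not None:
            out.append(node.keys)
            node = node.next
        return out

tree = BPlusTree(p=3, p_leaf=2)
for k in [8, 5, 1, 7, 3, 12, 9, 6]:
    tree.insert(k)
print(tree.root.keys, [c.keys for c in tree.root.children], tree.leaves())
```

Replaying the slide's insertion order yields the same root, internal nodes and leaf chain as the example figure.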
B+-tree: Deletion
- When a leaf node becomes less than half full, a deletion causes an underflow:
  - Redistribute entries with a sibling, or merge with a sibling.
  - The resulting merging can also propagate to internal nodes.
B+-Tree: Deletion example
[Figure sequence: successive deletions from an example B+-tree]
Delete: 5
Delete: 12   Underflow - redistribute
Delete: 9    Underflow - merge with the left, propagate, reduce the tree levels
B+-trees
- Many variations
  - B-trees
  - B+-trees
  - B*-trees (B+-tree with a fill factor of at least 2/3)
- Common modifications
  - Change the fill factor from 0.5 to 1.0
  - Allow a node to become empty before merging
Indexes on Multiple Keys
e.g. select * from employee where dept = 'CS' and age = '40'
Possible strategies for processing this query using indices on single attributes:
- use the index on dept to find employees with dept = 'CS', then test them individually to see if age = '40'
- use the index on age to find employees with age = '40', then test them individually to see if dept = 'CS'
- use the dept index to find pointers to all records of the CS department, and use the age index similarly, then take the intersection of both sets of pointers

If the set of records that matches each condition is large, but the combination is not, an index on the composite key may be useful:
- an ordered index on multiple attributes, treating the composite as a single value

Indexes in reality – MySQL
- InnoDB storage engine
  - Creates a clustered index for each table
  - Rows are physically ordered by the primary key
  - B-trees
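The difference between intersecting two single-attribute indexes and using one composite index can be sketched in memory. Python dictionaries stand in for the indexes here, which is a simplification (they are hash maps, not ordered B+-trees), and the sample rows and the `build_index` helper are invented for illustration.

```python
from collections import defaultdict

# Hypothetical sample data; ids play the role of record pointers.
employees = [
    {"id": 1, "dept": "CS", "age": 40},
    {"id": 2, "dept": "CS", "age": 31},
    {"id": 3, "dept": "EE", "age": 40},
    {"id": 4, "dept": "CS", "age": 40},
]

def build_index(rows, *attrs):
    """Map each (composite) key value to the set of matching record ids."""
    index = defaultdict(set)
    for row in rows:
        key = tuple(row[a] for a in attrs)
        index[key].add(row["id"])
    return index

dept_idx = build_index(employees, "dept")
age_idx = build_index(employees, "age")
composite_idx = build_index(employees, "dept", "age")

# Strategy 3: intersect the pointer sets from two single-attribute indexes.
by_intersection = dept_idx[("CS",)] & age_idx[(40,)]

# Composite index: the pair (dept, age) is treated as a single key value.
by_composite = composite_idx[("CS", 40)]

print(sorted(by_intersection), sorted(by_composite))
```

Both strategies return the same records, but the composite lookup touches only the matching entries, while the intersection first materialises every CS employee and every 40-year-old.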
Advanced SQL
- Stored procedures and functions
  - Used for performing series of SQL statements
  - Stored on the server side
  - Reduce the traffic between server and the clients
- Triggers
  - Used for triggering controls and other actions
- Programming
  - Flow control
  - Cursors

Functions and procedures
- A function must contain RETURN …;
- SHOW CREATE { PROCEDURE | FUNCTION } name

Triggers
- OLD.colname refers to the value before the triggering event
- NEW.colname refers to the value after the triggering event
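-- Before each insert into emp: if the new id is already taken,
-- assign the next free id (max(id) + 1) instead.
delimiter //
create trigger newemp
before insert on emp
for each row
begin
  declare n int;
  -- count existing rows that already use the new id
  select count(*) into n from emp where id = new.id;
  if n > 0 then
    select max(id) into n from emp;
    set new.id = n + 1;
  end if;
end; //
delimiter ;

SHOW TRIGGERS;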
Cursors
[Figure: a cursor stepping through the 1st, 2nd and 3rd rows of a result set]
Flow control