CS470: Introduction to Database Management Systems Functional Dependencies and Normal Forms

Relational Database Design
(Chapters 10 and 11)
V Kumar
School of Computing and Engineering
University of Missouri-Kansas City
Relational Database Design
Logical database schema design is concerned with organizing data into a logical form
acceptable to the underlying database system. One such logical structure is the relational
structure, which we use to develop the logical (conceptual) schema. Schema design is a
complicated process. Some of the important points that make it complicated are:
1. The designer is constrained by the limited data structure types supported by the
database system.
2. The database designer may have to consider the access path of the records.
3. The database designer may have to consider how to make database access and
modification efficient.
4. The designer has to identify and select a set of most relevant attributes for an entity.
5. The designer has to identify the size of a relation and connect two or more relations for
navigation.
A good relational database is a set of good relational schemas. A good relational schema
contains a set of relevant attributes of the entity it represents, where every attribute is clearly
related (directly or indirectly) to the other attributes of the relation. A good relation should
require minimum storage space and have minimum data redundancy.
A relational schema has a name and a set of related attributes. Let us consider the following
relational schema, Dept, of an entity “Department”.
Example: A bad relational schema
Dept (Dnumber, Dlocation, Dname, Interest rate)
Dept (Dnumber, Dlocation, Dname, Room size)
The first schema is bad because the attribute “Interest rate” has nothing to do with the entity
Department; as a result it is not related to the other attributes (Dnumber, Dlocation, and Dname).
The second schema is a good schema.
Sport
Sname     Inst_id  Inst_name  Expertise  Fee
Football  1        Tom        Football   200
Tennis    1        Tom        Tennis     200
Baseball  2        Peter      Baseball   300
Golf      2        Peter      Golf       300

S1
Sname     Inst_id
Football  1
Tennis    1
Baseball  2
Golf      2

S2
Inst_name  Inst_id  Expertise 1  Expertise 2  Fee
Tom        1        Football     Tennis       200
Peter      2        Baseball     Golf         300
As far as storage space is concerned, a good schema conserves space. To achieve this,
information redundancy (repetition of the same data) is minimized. This is done by breaking
(splitting) the relation into two or more smaller relations. This split helps to reduce
information redundancy, which improves database consistency (correctness).
Example: The schema Sport (given above) is a good schema but it can be further improved.
The storage requirement of Sport can be reduced by splitting it into two relations S1 and S2.
Suppose it takes one word to store the value of an attribute. The Sport relation requires 20
words (4 tuples × 5 attributes; this does not include the space the structure takes). When Sport
is split into S1 and S2, the two relations together need 18 words (8 + 10) to store the same
information. The duplications (Tom Tom and Peter Peter) are removed.
Let us identify a few relational schema design problems. Consider the following Student
relation, which records student identifiers, the activities students take, and the fees they pay
for these activities.
Student
Stu_id  Activity  Fee
100     Skiing    200
100     Golf      65
150     Swimming  50
175     Squash    50
175     Swimming  50
200     Swimming  50
200     Golf      65
Suppose Student 100 gave up skiing. The first record must be deleted from the database. If it
is not, then the database will have incorrect information (not consistent with the real world).
This deletion has a bad side effect, which makes the database incomplete.
Effect: Deleting this record also removes the information about the Skiing fee. Thus, more
information is deleted than intended. This is not a good relation because the query “What is the
fee for Skiing?” can no longer be answered.
Now suppose you want to add the activity Baseball to the database and you are going to charge
200.
Effect: You cannot add this information since there is no student enrolled in Baseball. You
cannot leave the Stu_id field for Baseball empty. We cannot add a fact about one attribute until
we have an additional fact about another related attribute. To add this record you must have a
student willing to learn baseball. You are not allowed to insert a null value in the Stu_id
attribute. This is not a good relation.
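The deletion and insertion problems described above can be reproduced with a small Python sketch. The helper names (fee_for, can_insert) are illustrative assumptions, not part of the notes; the data is the Student relation given earlier.

```python
# Student relation from the text: (Stu_id, Activity, Fee)
student = [
    (100, "Skiing", 200),
    (100, "Golf", 65),
    (150, "Swimming", 50),
    (175, "Squash", 50),
    (175, "Swimming", 50),
    (200, "Swimming", 50),
    (200, "Golf", 65),
]

def fee_for(relation, activity):
    """Return the fee recorded for an activity, or None if no tuple mentions it."""
    for _stu_id, act, fee in relation:
        if act == activity:
            return fee
    return None

# Deletion anomaly: student 100 gives up Skiing, so that tuple is removed.
after_delete = [t for t in student if t[:2] != (100, "Skiing")]
fee_before = fee_for(student, "Skiing")       # the fee is known before the delete
fee_after = fee_for(after_delete, "Skiing")   # and lost afterwards

def can_insert(stu_id, activity, fee):
    """Hypothetical insert rule: Stu_id may not be null."""
    return stu_id is not None

# Insertion anomaly: Baseball has a fee but no enrolled student yet.
baseball_allowed = can_insert(None, "Baseball", 200)
```

Removing one enrollment silently removes the only record of the Skiing fee, and the Baseball fee cannot be stored at all until some student enrolls.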
These problems are referred to as Modification Anomalies. There are many and we will
discuss them at length. What we need is a relation where a delete operation deletes only the
relevant tuples and an insert can insert any tuple at any time. In other words, we want relations
which have minimum or no modification anomalies. How can we minimize or eliminate
modification anomalies? To answer this question we first need to understand dependency
theory, so that we can design good relational schemas. Dependency theory describes
how one or more attributes of R depend on one or more attributes of R, that is, how Ai depends
on Aj (where i = j or i ≠ j).
Dependency Theory
To maintain consistency a relation must satisfy a set of integrity constraints. A consistent
relation reflects the facts. For example, if an instructor teaches a database course then the
database must reflect this information, i.e., the relation which stores this information must
satisfy constraints related to instructor and course attributes. With respect to database
processing it means that assigning new values to a set of attributes must satisfy this set of
constraints to maintain the correctness (fact) of the database. We need to formalize these
concepts. We need the following notations:
R = indicates a relational schema name.
A1, A2, …, An = indicate attributes included in the R.
R(A1, A2, …, An) = indicate the structure of the schema with name R.
r = indicates a relation which is an instance of a relational schema.
r(R) = indicates an instance of R(A1, A2, …, An).
Let γ be an integrity constraint on a schema R. γ is a function that associates with each relation
r(R) a Boolean value γ(r). A relation r satisfies γ if γ(r) = true (i.e., γ holds in r) and violates
γ if γ(r) = false.
Every database must satisfy a set of defined constraints for preserving correctness. For this
reason every database schema is associated with a set of constraints which are usually
expressed by means of closed sentences of some first-order predicate calculus.
Integrity constraints can be classified into two main categories:
1. Intra-relational constraints. Each such constraint involves only one relation scheme.
2. Inter-relational constraints. Each such constraint spans more than one relation
scheme.
We will deal mainly with intra-relational constraints. One of the most common intra-relational
constraints is the key dependency. Given a relation scheme R(U), where U = {A1, A2, …, An},
a key dependency is expressed as key(K), where K ⊆ U, and is satisfied by a relation r if and
only if t1(K) ≠ t2(K) for every pair of distinct tuples t1 and t2 in r. When intra-relational
constraints also encompass non-key attributes, they are called Functional Dependencies (FD),
and the key dependency becomes a special case of functional dependency. These functional
dependencies are the basis of relational database design. Inter-relational constraints involve
more than one relation, where an attribute of one relation is related to an attribute of another
relation in forming dependencies.
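As a minimal sketch of how key(K) could be tested on one relation instance, the Python function below checks that no two tuples agree on K. The function name satisfies_key and the dictionary representation of tuples are assumptions made for illustration; the rows are assumed to be distinct, as in any relation.

```python
def satisfies_key(rows, key):
    """True if no two tuples of `rows` agree on every attribute in `key`."""
    seen = set()
    for row in rows:
        k = tuple(row[a] for a in key)
        if k in seen:          # a second tuple with the same K-value violates key(K)
            return False
        seen.add(k)
    return True

# Student relation from the earlier example, one dict per tuple.
student = [
    {"Stu_id": 100, "Activity": "Skiing", "Fee": 200},
    {"Stu_id": 100, "Activity": "Golf", "Fee": 65},
    {"Stu_id": 150, "Activity": "Swimming", "Fee": 50},
    {"Stu_id": 175, "Activity": "Squash", "Fee": 50},
    {"Stu_id": 175, "Activity": "Swimming", "Fee": 50},
    {"Stu_id": 200, "Activity": "Swimming", "Fee": 50},
    {"Stu_id": 200, "Activity": "Golf", "Fee": 65},
]

stu_id_is_key = satisfies_key(student, ["Stu_id"])
pair_is_key = satisfies_key(student, ["Stu_id", "Activity"])
```

Such a check only shows that one instance is consistent with key(K); whether K really is a key is a design decision about all instances of the schema.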
Functional Dependencies (FD)
We will study the reasons for the modification anomalies discussed earlier and study possible
solutions. These problems arise because the attributes of a relation are logically and
semantically related. A good relation requires that its attributes be related, but on the other
hand such relationships cause modification anomalies. Our aim is to minimize these anomalies
because they cannot be eliminated completely. To do so we begin our discussion with
Functional Dependency (FD).
Schedule
Pilot    Flight  Date    Departs
Cushing  83      9 Aug   10:15a
Cushing  116     10 Aug  1:25p
Clark    281     8 Aug   5:50a
Clark    301     12 Aug  6:35p
Clark    83      11 Aug  10:15a
Chin     83      13 Aug  10:15a
Chin     116     12 Aug  1:25p
Copley   281     9 Aug   5:50a
Copley   281     13 Aug  5:50a
Copley   412     15 Aug  1:25p
Informally, a functional dependency defines the relationship among the attributes of a relation,
i.e., it defines the effect of a change in the value of a set of attributes on another set of
attributes. For example, if we change the value of an SSN then the value of the corresponding
Name attribute must also change to preserve consistency. We can represent this functional
relationship (functional since an operation on a set of attributes initiates corresponding changes
on other related attributes of the relation) in terms of a set of constraints. Consider the relation
SCHEDULE shown above.
We can identify the following restrictions:
1. Exactly one time for one flight.
2. For a {pilot, date, time} there is one flight.
3. For a {flight, date} there is one pilot.
These restrictions indicate how this relation can be processed (modified, expanded, contracted,
etc.). These are examples of a relationship called Functional Dependencies (FD). We can say
that an FD occurs when, in a tuple, the value of a set of attributes uniquely determines the
values of another set of attributes.
Notation: we will use → to indicate a functional dependency between two sets of attributes. So
X → Y will mean X functionally determines Y. X is called the left side of the FD and Y the right
side of the FD. X is also called the determinant of the FD X → Y.
Example: If we have Phone number → City name, then the value of the phone number
determines the city name, and if the value of the phone number changes then the value of
city name may have to change with it.
Formally: Let r be a relation on R(X, Y). If r satisfies the FD X → Y, then for any two tuples
t1 and t2 in r, t1(X) = t2(X) implies t1(Y) = t2(Y). This means that the Y value of a tuple in
r(R) is determined by the X value of that tuple in r(R), i.e.,
Y is functionally dependent on X or X functionally determines Y.
Full functional dependency: X → Y is a full functional dependency if every attribute of X is
needed for the dependency to hold, i.e., removing any attribute from X breaks it. For example,
let X = {A, B, C} and Y = {D}. If {A, B, C} → {D} holds and no proper subset of {A, B, C}
determines D, then it is a full functional dependency, i.e., Y is fully functionally
dependent on X. This means that the value of D can only be determined by the values of A, B,
and C together. On the other hand, if {A, B} → {D} also holds, then Y is only partially
dependent on X (partial functional dependency) because the value of C is not necessary to
determine the value of D.
Prime attribute: Attribute ∈ primary key set. Non-prime attribute: Attribute ∉ primary
key set.
There is no formula to establish an FD. The semantics of a relation should indicate how
the attributes of the relation are related. The following algorithm tests whether a given FD
holds on a relation.
SATISFY (<relation name>, FD). Output: true if the relation satisfies the FD X → Y, false
otherwise.
1. Sort the relation r on its X attribute values to bring tuples with equal X-values together.
2. If every pair of tuples ti, tj with ti(X) = tj(X) also has ti(Y) = tj(Y), return true; otherwise
return false.
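The two steps of SATISFY can be sketched in Python as follows. This is one possible rendering, assuming tuples are stored as dictionaries; the function name satisfy and the attribute-list arguments are choices made for this sketch. The data is the SCHEDULE relation from the notes.

```python
from itertools import groupby

def satisfy(rows, X, Y):
    """Return True if the relation instance `rows` satisfies the FD X -> Y."""
    def proj(row, attrs):
        return tuple(row[a] for a in attrs)
    # Step 1: sort on the X-values so tuples with equal X-values are adjacent.
    ordered = sorted(rows, key=lambda row: proj(row, X))
    # Step 2: within each group of equal X-values, all Y-values must coincide.
    for _x_value, group in groupby(ordered, key=lambda row: proj(row, X)):
        if len({proj(row, Y) for row in group}) > 1:
            return False
    return True

COLS = ("Pilot", "Flight", "Date", "Departs")
schedule = [dict(zip(COLS, t)) for t in [
    ("Cushing", 83, "9 Aug", "10:15a"), ("Cushing", 116, "10 Aug", "1:25p"),
    ("Clark", 281, "8 Aug", "5:50a"), ("Clark", 301, "12 Aug", "6:35p"),
    ("Clark", 83, "11 Aug", "10:15a"), ("Chin", 83, "13 Aug", "10:15a"),
    ("Chin", 116, "12 Aug", "1:25p"), ("Copley", 281, "9 Aug", "5:50a"),
    ("Copley", 281, "13 Aug", "5:50a"), ("Copley", 412, "15 Aug", "1:25p"),
]]

flight_determines_departs = satisfy(schedule, ["Flight"], ["Departs"])
departs_determines_flight = satisfy(schedule, ["Departs"], ["Flight"])
```

Sorting is kept to mirror step 1; grouping the tuples in a hash table keyed on the X-value would serve equally well.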
Apply SATISFY (SCHEDULE, FD: Flight → Departs) to relation SCHEDULE.
Schedule (sorted on Flight)
Pilot    Flight  Date    Departs
Cushing  83      9 Aug   10:15a
Clark    83      11 Aug  10:15a
Chin     83      13 Aug  10:15a
Cushing  116     10 Aug  1:25p
Chin     116     12 Aug  1:25p
Clark    281     8 Aug   5:50a
Copley   281     9 Aug   5:50a
Copley   281     13 Aug  5:50a
Clark    301     12 Aug  6:35p
Copley   412     15 Aug  1:25p
Result: This FD holds because whenever the left hand side value is Flight = 281, the right hand
side value is Departs = 5:50a. Similarly, whenever the left hand side value is Flight = 83, the
right hand side value is Departs = 10:15a, and so on for every other flight. Note that an FD is a
relationship among attributes of a relation.
We can examine another FD: Departs → Flight. SATISFY (SCHEDULE, Departs → Flight).
The relation Schedule is analyzed as before to test this FD. This FD is not satisfied because for
the left hand side value Departs = 1:25p there are two different values for Flight,
i.e., Flight = 116 and Flight = 412. The following relation illustrates the result (false).
Similarly, the FD Date → Flight is not satisfied in this relation.
Schedule (sorted on Departs)
Pilot    Flight  Date    Departs
Clark    281     8 Aug   5:50a
Copley   281     9 Aug   5:50a
Copley   281     13 Aug  5:50a
Cushing  83      9 Aug   10:15a
Clark    83      11 Aug  10:15a
Chin     83      13 Aug  10:15a
Cushing  116     10 Aug  1:25p
Chin     116     12 Aug  1:25p
Copley   412     15 Aug  1:25p
Clark    301     12 Aug  6:35p
Two extreme cases: X → ∅ trivially satisfied by any relation and ∅ → Y satisfied by those
relations where every tuple has the same Y-value.
Graphical representation of FD
The heads of the arrows point to the right side of the FDs and the tails are connected to the
left side of the FDs.
Schema Emp_Dept
FD (F): SSN → {Ename, Bdate, Address, Dnumber}
Dnumber → {Dname, Dmgrssn}
Emp_Dept
Ename  SSN  Bdate  Address  Dnumber  Dname  Dmgrssn
Schema Emp_Proj
FD (F): {SSN, Pnumber} → Hours
SSN → Ename
Pnumber → {Pname, Plocation}
Emp_Proj
SSN  Pnumber  Hours  Ename  Pname  Plocation
IMPORTANT
1. One must remember that the FDs of a relation are defined by the database designer. In the
above examples these FDs are not valid unless they have been defined, even though
instances of these relation schemas happen to exhibit them.
2. A relational schema R may have n instances. If an FD for R is identified then every
instance of R must satisfy the FD. An FD on R is false if one instance of the relation
satisfies it while another instance does not. To verify that a certain FD is true one has to
check all possible instances of R.
Closure of a set of FDs: Defining the FDs of a schema requires semantic knowledge. A database
designer defines a set of FDs for a schema. Let us call this set F. It is possible that some
additional FDs hold as a side effect of F. We will call them derived FDs. Formally, the
set of all FDs that can be derived from F (including those in F itself) is called the closure of F
and is represented by F+. For example, consider the schema Emp_Dept:
Emp_Dept
Defined F = { SSN → {Ename, Bdate, Address, Dnumber},
              Dnumber → {Dname, Dmgrssn} }
Some derived FDs in F+ = { SSN → {Dname, Dmgrssn},
                           SSN → SSN,
                           Dnumber → Dname }
It is very time consuming to discover all of F+ for a given schema using a sequential scan
of the relation. Even though the degree of a schema is finite, the cardinality of
an instance may be very large, and multiple complete scans of all possible instances of the
schema would be prohibitively expensive. To discover F+ we define a set of rules called
inference rules. We use the notation F ⊨ X → Y to indicate that F infers X → Y, i.e., that the
defined FD set F derives X → Y. We define 6 inference rules. The first three are known as
Armstrong's inference rules.
1. Reflexive rule: This rule states that a set of attributes determines itself and any of its
subsets. For example, {SSN, Account No.} → {SSN, Account No.} or {SSN, Account No.} →
{Account No.}
Formally, if Y ⊆ X ⊆ U, then X → Y. This rule gives the trivial dependencies, those that
have a right side contained in the left side.
Proof 1: The reflexivity axiom is clearly sound, i.e., using this rule we cannot deduce from
F any dependency that is not in F+. We cannot have a relation r with two tuples that agree
on X yet disagree on some subset of X. Let R be a schema with attribute subset X and Y
where Y ⊆ X. Consider a pair of tuples t1 and t2 ∈ r(R). If t1 (X) = t2(X), then t1(A) =
t2(A) for every A ∈ X. Then, since Y ⊆ X, t1(A) = t2(A) for every A ∈ Y, which is
equivalent to t1(Y) = t2(Y). Therefore, X → Y must hold in r(R). If the attribute set has
only one element, e.g., X, then X → X. Thus ΠX(σX=x(r)) always has at most one tuple.
Consider a relation Emp:
Emp
SSN  Emp_Name  Salary  Age
123  A         50K     65
111  B         60K     45
100  C         60K     42
110  D         65K     42
Suppose X = {SSN, Emp_Name, Age} and Y = {Emp_Name, Age}. Thus, Y ⊆ X. We
define X → Y. That is, {SSN, Emp_Name, Age} → {Emp_Name, Age}. We take two
tuples with t1(X) = (123, A, 65) and t2(X) = (111, B, 45), so that t1(Y) = (A, 65) and t2(Y)
= (B, 45). We see that t1(X) ≠ t2(X), so the FD places no requirement on this pair (here
t1(Y) ≠ t2(Y) as well). Now suppose t1(X) = t2(X), say t1(X) = t2(X) = (123, A, 65). Then
necessarily t1(Y) = t2(Y) = (A, 65).
2. Augmentation: This rule states that a new valid FD is generated if the same set of
attributes is added to the left and right side of an existing FD.
Formally, if X → Y then XZ → YZ, and hence also XZ → Y, where Z ⊆ R.
Proof: If r satisfies X → Y, then ΠY(σX=x(r)) has at most one tuple for any X-value x.
That is, a select on r with predicate X = x may return several tuples, but they all carry the
same Y-value, so the projection on Y yields at most one tuple. If Z ⊆ R
then σXZ=xz(r) ⊆ σX=x(r) and hence ΠY(σXZ=xz(r)) ⊆ ΠY(σX=x(r)). Thus
ΠY(σXZ=xz(r)) has at most one tuple and r must satisfy XZ → Y. Tuples that agree on XZ
trivially agree on Z, so r satisfies XZ → YZ as well.
Example: F = A → B
r(A   B   C   D)
  a1  b1  c1  d1
  a2  b2  c1  d1
  a1  b1  c1  d2
  a3  b3  c2  d3
We see that by rule 2, F+ contains
AB → B, AC → B, AD → B, ABC → B, ADB → B, ACD → B and ABCD → B.
It can also be seen that AC → BC, AD → BD, ACD → BCD and so on. AC → BC means that
whenever t1(AC) = t2(AC), we also have t1(BC) = t2(BC). The textbook
(Elmasri/Navathe) proves it by contradiction.
3. Transitive Rule: This establishes that if X → Y and Y → Z then X → Z.
Proof: Let r's F = {X→Y, Y→Z}. Let t1 ∈ r and t2 ∈ r. We know that if t1(X) = t2(X),
then t1(Y) = t2(Y) and also if t1(Y) = t2(Y), then t1(Z) = t2(Z). Therefore if t1(X) = t2(X),
then t1(Z) = t2(Z). This is one of the most important axioms.
Example: F = A → B, B → C, r satisfies A → C
r(A   B   C   D)
  a1  b1  c2  d1
  a2  b2  c1  d2
  a3  b1  c2  d1
  a1  b1  c2  d3
The remaining three rules can be proved by using the first three rules.
4. Projectivity (Decomposition): This rule states that some attributes can be removed from
the right hand side without affecting the FD.
Formally, if X → YZ then X → Y.
Proof: If r satisfies X → YZ, then ΠYZ(σX=x (r)) has at most one tuple for any X-value x.
Since ΠY(ΠYZ(σX=x (r))) = ΠY(σX=x (r)), ΠY(σX=x (r)) can have at most one tuple.
Hence r satisfies X → Y.
5. Union or additive rule: This axiom allows us to combine two or more FDs with the same
left side.
Formally, if X → Y and X → Z, then X → YZ .
Proof: If r satisfies X → Y and X → Z then ΠY(σX=x (r)) and ΠZ(σX=x (r)) both have at
most one tuple for any X-value x. If ΠYZ(σX=x (r)) had more than one tuple, then at least
one of ΠY(σX=x (r)) and ΠZ(σX=x (r)) would have more than one tuple. Thus X → YZ.
6. Pseudotransitivity rule: This rule allows us to extend the transitive rule further. Formally, if
{X → Y, WY → Z} then WX → Z.
Proof: Let r satisfy X → Y and WY → Z, and let t1 and t2 be tuples in r. We know that if
t1(X) = t2(X), then t1(Y) = t2(Y), and also that if t1(WY) = t2(WY) then t1(Z) = t2(Z). From
t1(WX) = t2(WX) we can deduce that t1(X) = t2(X) and so t1(Y) = t2(Y) and further t1(WY) =
t2(WY), which implies t1(Z) = t2(Z). Thus WX → Z.
Reflexivity, Augmentation and Transitivity rules are called Armstrong’s inference rules. The
other three rules can be proved by the first three. Using these 6 rules, it is possible to derive
other inference rules for FDs.
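In practice, whether F infers X → Y is usually tested without enumerating F+, by computing the closure of the attribute set X under F. This attribute-closure procedure is a standard technique that is not described in these notes; the sketch below, with assumed names attribute_closure and implies, applies reflexivity, transitivity and union repeatedly until nothing new can be added.

```python
def attribute_closure(attrs, fds):
    """All attributes functionally determined by `attrs` under the FD list `fds`."""
    closure = set(attrs)                      # reflexivity: X determines itself
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the closure already covers lhs, it also determines rhs.
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

def implies(fds, lhs, rhs):
    """True if the FD lhs -> rhs can be derived from `fds`."""
    return set(rhs) <= attribute_closure(lhs, fds)

# Defined F for Emp_Dept, as given in the notes.
F = [
    ({"SSN"}, {"Ename", "Bdate", "Address", "Dnumber"}),
    ({"Dnumber"}, {"Dname", "Dmgrssn"}),
]

ssn_closure = attribute_closure({"SSN"}, F)
derived = implies(F, {"SSN"}, {"Dname", "Dmgrssn"})
```

Here ssn_closure contains all seven attributes of Emp_Dept, which confirms the derived FD SSN → {Dname, Dmgrssn} listed earlier.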
Normal forms and Modification Anomalies
Normal forms (NF): A NF of a relation defines the type of modification anomalies it
eliminates. There are First normal form (1NF), Second normal form (2NF), Third normal form
(3NF), Boyce-Codd normal form (BCNF), Fourth normal form (4NF), Domain/Key normal
form (DK/NF) and Fifth normal form (5NF). We will study several of them.
Normalization: The process of transforming a relation from a lower normal form (including the
non-normalized form) to a higher normal form is called normalization.
Degree
Student Name  Year  Degree
John          1990  MS
              2002  BS
Kumar         1967  BS
              1969  MS
              1983  Ph.D.
Repeating Groups: If one value of an attribute determines more than one value of another
attribute, then these multiple values are called repeating groups. For example, in relation
Degree, Student Name and Degree are two attributes. For one value of Student Name = Kumar,
there are three values of Degree (BS, MS, Ph.D.). Year values are likewise repeated for one
value of Student Name; thus, Year has repeating groups as well. Degree is a non-normalized
relation and is not allowed in the relational model.
Relation Degree must be normalized before it can be processed by relational database
systems. The normalized relation Degree is given below.
Degree
Student Name  Year  Degree
John          1990  MS
John          2002  BS
Kumar         1967  BS
Kumar         1969  MS
Kumar         1983  Ph.D.
We now define these in terms of Normal Forms (NF).
1NF: A relation is in 1NF if it does not contain repeating groups, i.e., all its attributes are
atomic.
Normalized:
Order
Ono    Date   Pno   No_Ordered
12489  90287  AX12  11
12491  90287  BT04  1
12491  90287  BZ66  1
12494  90487  CB03  4
12495  90487  CX11  2
12498  90587  AZ52  2
12498  90587  BA74  4
12500  90587  BT04  1
The Order relation is in 1NF, since it does not have repeating groups.
Degree: 4.
Cardinality: 8.
Primary Key (PK): {Ono, Pno}, i.e., order number and part number.
No single attribute can be a candidate key.
There are many superkeys.
We want to identify if this relation has any modification anomalies. If it has then we try to
minimize or eliminate them. Consider the following relation. It is in 1NF.
Order
Ono    Date   Part_descrip  Pno   No_Ordered
12489  90287  Iron          AX12  11
12491  90287  Stove         BT04  1
12491  90287  Washer        BZ66  1
12494  90487  Bike          CB03  4
12495  90487  Mixer         CX11  2
12498  90587  Skates        AZ52  2
12498  90587  Baseball      BA74  4
12500  90587  Stove         BT04  1
Modification anomalies
Update: A change to the description of BT04 requires several changes since BT04
has been duplicated as a result of normalization.
Inconsistent data: After an incomplete update, BT04 may have different descriptions in
different tuples.
Additions: We cannot add part ZZ14 until we have an order for ZZ14.
Deletion: By deleting every order line for BT04 we lose the fact that BT04 represents Stove.
Conclusion: We conclude that Order has modification anomalies, which should be
minimized or eliminated.
Minimization or elimination process: A relation with modification anomalies is further
normalized to higher normal form. Usually the normalization to one
higher normal form resolves the issue. If not then it is normalized to next
higher normal form. Further normalization is usually done by splitting
(vertically) the relation into two or more relations. So the solution for
Order is to normalize it to 2NF. We first define 2NF.
2NF: A relation schema is in 2NF if it is in 1NF and every non-prime attribute is fully
functionally dependent on the PK.
Dependency diagram of Order
Order_No  Date  Part_No  Part_Desc  No_Ordered  Price
PK: {Order_no, Part_no}
Order is in 1NF: all attributes are atomic.
Order is not in 2NF, because the non-key attribute Part_desc is dependent upon Part_no, which is
only a portion of the PK. Also Date is dependent upon Order_no (part of the PK). There are
partial dependencies among the attributes of Order.
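The partial dependencies named above can also be found mechanically from the FDs. The following Python sketch assumes that {Order_no, Part_no} is the only candidate key and that No_ordered and Price depend on the whole key; the function names closure and partial_dependencies are illustrative.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attributes determined by `attrs` under the FD list `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def partial_dependencies(pk, fds):
    """Map each proper subset of the PK to the non-prime attributes it determines."""
    found = {}
    for size in range(1, len(pk)):
        for subset in combinations(sorted(pk), size):
            dependents = closure(subset, fds) - set(pk)
            if dependents:
                found[subset] = dependents
    return found

pk = {"Order_no", "Part_no"}
fds = [
    ({"Order_no"}, {"Date"}),
    ({"Part_no"}, {"Part_desc"}),
    ({"Order_no", "Part_no"}, {"No_ordered", "Price"}),  # assumed full dependencies
]

violations = partial_dependencies(pk, fds)
```

A relation is in 2NF (with respect to this key) exactly when the returned dictionary is empty; here it reports Date under Order_no and Part_desc under Part_no.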
Order
Order No.  Date
12489      90287
12491      90287
12494      90487
12495      90487
12498      90587
12500      90587

Part
Part No.  Part_Desc.
AX12      Iron
AZ52      Skates
BA74      Baseball
BH22      Toaster
BT04      Stove
BZ66      Washer
CA14      Skillet
CB03      Bike
CX11      Mixer

Order_line
Order_No  Part No.  No_Ordered  Price
12489     AX12      11          14.95
12491     BT04      1           402.99
12491     BZ66      1           311.95
12494     CB03      4           175.00
12495     CX11      2           57.95
12498     AZ52      2           22.95
12498     BA74      4           4.95
12500     BT04      1           402.99
1NF → 2NF: Split the relation into two or more via projection as follows:
1. Identify the set of attributes that makes up the PK: {Order_no, Part_no}.
2. Create all non-empty subsets of the above set: {Order_no}, {Part_no} and {Order_no,
Part_no}.
3. Designate each of these subsets as the PK of a relation that contains those attributes
which are dependent on that PK:
Primary Keys: {Order_no}, {Part_no} and {Order_no, Part_no}
Relational schemas
Order (Order_no, Date). Date is functionally dependent on Order_no (see diagram
above).
Part (Part_no, Part_desc). Part_desc is functionally dependent on Part_no.
Order_line (Order_no, Part_no, No_ordered, Price)
All these relations are in 2NF. The anomalies have been eliminated, which can be verified as
follows:
Change: If the description of BT04 is changed then only one change is required, in the Part
relation.
Add a new part and its description: If a new tuple is added to Part then there is no need for
an order to exist for that part.
Delete order 12489: This delete does not cause AX12 to be deleted from Part, thus we do not
lose the description of AX12.
Information loss: none.
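The claim that the split loses no information can be checked by projecting the 1NF Order relation and joining the projections back together. The Python sketch below uses the 1NF table with Part_descrip shown earlier (Price is omitted because that table does not list it); project, natural_join and as_set are helper names assumed for this illustration.

```python
COLS = ("Ono", "Date", "Part_descrip", "Pno", "No_Ordered")
order_1nf = [dict(zip(COLS, t)) for t in [
    (12489, 90287, "Iron", "AX12", 11), (12491, 90287, "Stove", "BT04", 1),
    (12491, 90287, "Washer", "BZ66", 1), (12494, 90487, "Bike", "CB03", 4),
    (12495, 90487, "Mixer", "CX11", 2), (12498, 90587, "Skates", "AZ52", 2),
    (12498, 90587, "Baseball", "BA74", 4), (12500, 90587, "Stove", "BT04", 1),
]]

def project(rows, attrs):
    """Projection with duplicate elimination."""
    seen, out = set(), []
    for row in rows:
        t = tuple(row[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            out.append(dict(zip(attrs, t)))
    return out

def natural_join(left, right):
    """Join two non-empty relations on the attributes they have in common."""
    common = [a for a in left[0] if a in right[0]]
    return [{**l, **r} for l in left for r in right
            if all(l[a] == r[a] for a in common)]

def as_set(rows):
    return {tuple(sorted(row.items())) for row in rows}

order = project(order_1nf, ("Ono", "Date"))
part = project(order_1nf, ("Pno", "Part_descrip"))
order_line = project(order_1nf, ("Ono", "Pno", "No_Ordered"))

rejoined = natural_join(natural_join(order_line, order), part)
lossless = as_set(rejoined) == as_set(order_1nf)
```

The projections hold 6, 7 and 8 tuples respectively, and joining them reproduces exactly the eight original tuples, so the decomposition is lossless for this instance.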
Q. Does this imply that relations in 2NF do not have modification anomalies?
A. No. Relations in 2NF may still suffer from all the modification anomalies.
Example
Customer
Cust_no  Name     Address      Slsrep_no  Slsrep_name
124      Sally A  4747 Troost  2          Tom J
256      Ann S    215 Oak      6          Bill S
311      Don C    48 College   12         Sam B
315      Tom D    914 Cherry   6          Bill S
405      Al W     519 Watson   12         Sam B
412      Sally A  16 Elm       3          Mary J
522      Mary N   108 Pine     12         Sam B
567      Joe B    808 Ridge    6          Bill S
587      Judy R   512 Pine     6          Bill S
622      Dan M    419 Chip     3          Mary J
The dependency diagram of Customer
Cust_No  Name  Address  Slsrep_No  Slsrep_Name
Customer is in 2NF. It suffers from all the anomalies:
Update: A change to a Slsrep_name requires multiple changes.
Inconsistent data: There is nothing in the design that would prohibit a sales rep (one
Slsrep_no) from having two different names.
Additions: To add Slsrep_no 47, there must first be a customer for sales rep 47.
Deletions: If we delete all the customers of a sales rep then we lose the name of the sales rep
also.
Reason for these anomalies: Slsrep_no, which is not a PK, determines Slsrep_name. As a
result, Slsrep_no and its Slsrep_name can appear many times in the relation.
Remedy: Normalize Customer relation by transforming it into 3NF relations.
3NF: A relation scheme R is in 3NF if it is in 2NF and no non-prime attribute of R is
transitively dependent on the primary key.
Transitive dependency: A transitive dependency exists among 3 or more attributes. Example
SSN → Dnumber
Dnumber → Dname and Dnumber → Dmgrssn
Therefore SSN → Dname and SSN → Dmgrssn transitively.
Example
Housing relation is in 2NF. Fee is transitively dependent on SID so it is not in 3NF. This
relation has all modification anomalies. Housing can be converted to 3NF by normalization
process.
Housing
SID  Building  Fee
100  Randolph  1200
150  Ingersol  1100
200  Randolph  1200
250  Pitkin    1100
300  Randolph  1200
Dependency diagram: SID → Building → Fee

2NF → 3NF
Housing
SID  Building
100  Randolph
150  Ingersol
200  Randolph
250  Pitkin
300  Randolph

Fee
Building  Fee
Randolph  1200
Ingersol  1100
Pitkin    1100
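A short Python sketch, under the assumption that Building → Fee is the transitive link, shows what the 2NF → 3NF split of Housing achieves: the fee of a building is stored once instead of once per student. The variable names are chosen for this illustration only.

```python
# Housing relation from the notes: (SID, Building, Fee)
housing = [
    (100, "Randolph", 1200),
    (150, "Ingersol", 1100),
    (200, "Randolph", 1200),
    (250, "Pitkin", 1100),
    (300, "Randolph", 1200),
]

# Project onto (SID, Building) and (Building, Fee); sets remove duplicates.
stu_building = sorted({(sid, building) for sid, building, _fee in housing})
building_fee = sorted({(building, fee) for _sid, building, fee in housing})

# Number of tuples touched when the Randolph fee changes, before and after.
updates_before = sum(1 for _sid, building, _fee in housing if building == "Randolph")
updates_after = sum(1 for building, _fee in building_fee if building == "Randolph")

# Joining the two projections on Building recovers the original relation.
fee_of = dict(building_fee)
rejoined = sorted((sid, building, fee_of[building]) for sid, building in stu_building)
```

Changing the Randolph fee touches three tuples in the original Housing relation but only one tuple after the split, and the join shows that no information was lost.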
Relations in 3NF can also have modification anomalies. Consider the Advisor relation.
Relationships
1. A student can have one or more majors.
2. A major can have several faculty as advisors.
3. A faculty member advises in only one major area.
4. SID cannot be a key since a student can have many majors and therefore many
advisors.
5. A student cannot have many advisors in the same area.
Candidate keys are (SID, Major), with (SID, Major) → Fname, and (SID, Fname), with
(SID, Fname) → Major. One of these sets can be selected as the primary key. Fname is also a
determinant: Fname → Major.
Advisor
SID  Major       Fname
100  Math        Cauchy
150  Psychology  Jung
200  Math        Riemann
250  Math        Cauchy
300  Psychology  Perls
300  Math        Riemann
Dependency diagram attributes: SID  Major  Fname
This relation does not have transitive dependency. Advisor is in 3NF since there is no
transitive dependency but it has modification anomalies.
Deletion: Delete SID 300, we lose the fact that Perls advises in Psychology.
Addition: Cannot add the fact that Keynes advises in Economics if there is no student enrolled.
Update: If Cauchy advises in Physics then multiple changes are required.
Inconsistency: Any change in Cauchy-Math will make Advisor inconsistent.
Solution: Further normalization to Boyce/Codd Normal form.
Boyce/Codd Normal Form (BCNF): A relation is in BCNF if every determinant is a candidate
key.
Advisor is not in BCNF because Fname → Major, and Fname is not a candidate key.
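The BCNF condition can be tested from the FDs alone. The sketch below checks, for every non-trivial FD, whether its determinant determines all attributes of the relation (i.e., is a superkey, the usual formulation of "every determinant is a candidate key"). The names closure and bcnf_violations are assumptions of this sketch.

```python
def closure(attrs, fds):
    """Attributes determined by `attrs` under the FD list `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(all_attrs, fds):
    """Non-trivial FDs whose determinant does not determine every attribute."""
    return [
        (lhs, rhs) for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != set(all_attrs)
    ]

advisor_attrs = {"SID", "Major", "Fname"}
advisor_fds = [
    ({"SID", "Major"}, {"Fname"}),
    ({"SID", "Fname"}, {"Major"}),
    ({"Fname"}, {"Major"}),
]

violations = bcnf_violations(advisor_attrs, advisor_fds)

# After splitting into (SID, Fname) and (Fname, Major), only Fname -> Major remains,
# and Fname is the key of the second relation.
after_split = bcnf_violations({"Fname", "Major"}, [({"Fname"}, {"Major"})])
```

For Advisor the only violation reported is Fname → Major, and the relation (Fname, Major) on its own reports none.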
3NF to BCNF
Advisor is split into two relations by projection (duplicate rows removed from the second):
SID  Fname
100  Cauchy
150  Jung
200  Riemann
250  Cauchy
300  Perls
300  Riemann

Fname    Major
Cauchy   Math
Jung     Psychology
Riemann  Math
Perls    Psychology
Relations in BCNF are not entirely free from anomalies. Consider Student relation:
Student
SID  Major       Activity
100  Music       Swimming
100  Accounting  Swimming
100  Music       Tennis
100  Accounting  Tennis
150  Math        Jogging
Semantics: A student can enroll in more than one major and in more than one activity. PK:
All three attributes.
What is the relationship between SID and Major? It is not functional dependency, because
students have several majors. There is some sort of relationship. This relationship can be
illustrated by an example.
Suppose: Student 100 wants to enroll in Skiing. Add: tuple 100 Music Skiing. Resulting
relation
Student
SID  Major       Activity
100  Music       Skiing
100  Music       Swimming
100  Accounting  Swimming
100  Music       Tennis
100  Accounting  Tennis
150  Math        Jogging
Semantics: It implies that Student 100 skis as a Music major but does not ski as an
Accounting major. Illogical.
Solution: Add the tuple 100 Accounting Skiing. The resulting relation is consistent. The
relationship between SID and Major is a multivalued dependency. SID determines not a single
value but several values. Thus (SID 100) determines majors (Music, Accounting) and
activities (Skiing, Swimming, Tennis). The relation is in BCNF since all attributes together
make up the primary key. It still has anomalies.
Addition: If one tuple (e.g., 100 Music Skiing) is added then other tuples (here 100 Accounting
Skiing) must also be added to preserve consistency.
Student
SID  Major       Activity
100  Music       Skiing
100  Accounting  Skiing
100  Music       Swimming
100  Accounting  Swimming
100  Music       Tennis
100  Accounting  Tennis
150  Math        Jogging
Delete: If (100 Music Skiing) is deleted then (100 Accounting Skiing) also has to be deleted
even though Major and Activity are not related.
Solution: Break this relation into two by projection
S_major
SID  Major
100  Music
100  Accounting
150  Math

S_Activity
SID  Activity
100  Skiing
100  Swimming
100  Tennis
150  Jogging
For the Student relation SID →→ Major, because Major depends only on the value of SID and not
on the value of Activity. Similarly, SID →→ Activity. Activity does not depend on Major since
having a particular major implies nothing about Activity. The absence of any relationship
between Major and Activity creates a problem in the sense that whenever we add a new Major
for a student, we must add a tuple for every value of Activity of that student.
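For a three-attribute relation such as Student, the multivalued dependency SID →→ Major can be tested on an instance by checking that, for each SID, every recorded major is paired with every recorded activity. The Python sketch below does this; the function name satisfies_mvd and the dictionary layout are assumptions made for illustration.

```python
from itertools import product

def satisfies_mvd(rows, x, y, z):
    """Check x ->-> y in a relation over exactly the attributes x, y, z."""
    groups = {}
    for row in rows:
        ys, zs, pairs = groups.setdefault(row[x], (set(), set(), set()))
        ys.add(row[y])
        zs.add(row[z])
        pairs.add((row[y], row[z]))
    # For each x-value, the (y, z) pairs must be the full cross product.
    return all(pairs == set(product(ys, zs)) for ys, zs, pairs in groups.values())

def tup(sid, major, activity):
    return {"SID": sid, "Major": major, "Activity": activity}

student = [
    tup(100, "Music", "Swimming"), tup(100, "Accounting", "Swimming"),
    tup(100, "Music", "Tennis"), tup(100, "Accounting", "Tennis"),
    tup(150, "Math", "Jogging"),
]

ok_initially = satisfies_mvd(student, "SID", "Major", "Activity")
with_music_skiing = student + [tup(100, "Music", "Skiing")]
ok_after_one_insert = satisfies_mvd(with_music_skiing, "SID", "Major", "Activity")
with_both = with_music_skiing + [tup(100, "Accounting", "Skiing")]
ok_after_both = satisfies_mvd(with_both, "SID", "Major", "Activity")
```

Adding only (100, Music, Skiing) breaks the dependency, and adding (100, Accounting, Skiing) as well restores it, which is the insertion anomaly discussed above.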
Multivalued dependencies always occur in pairs. In a relational scheme R(A, B, C), if A →→ B,
then A →→ C. This must be so because if B is unrelated to C then C must also be unrelated to B.
The independence among attributes is not a problem if the attributes have a single value.
Example
Student_shoe
SID  Shoe_size  Marital_status
100  8          M
150  10         S
200  5          S
250  12         S
Primary Key: SID.
In Student_shoe, Shoe_size and Marital_status are independent and each has a single value per
SID, so anomalies cannot occur. We can delete or add any tuple with no problem. This
observation leads to the definition of 4NF.
4NF: A relation is in 4NF if it is in BCNF and it has no multivalued dependencies.