CMSC 424 – Database Design Lecture 11 Normalization Mihai

2y ago
15 Views
2 Downloads
210.59 KB
23 Pages
Last View : 1m ago
Last Download : 3m ago
Upload by : Elisha Lemon
Transcription

CMSC 424 – Database designLecture 11NormalizationMihai Pop

The Normal Forms 1NF: every attribute has an atomic value (not a set value) 2NF: we will not be concerned in this course 3NF: if for each FD X Y either– it is trivial or– X is a superkey– Y-X is a proper subset of a candidate key BCNF: if for each FD X Y either– it is trivial or– X is a superkey1NF 2NF 3NF BCNF 4NF, : we are not concerned in this course.4NF,.

Goals Lossless decomposition Dependency preservation Recap: FD closure, attribute closure

FDs, Normal forms, etc., why? Start with a schema Decompose relations until in a normal form Functional dependencies (constraints we'd like preserved)drive the decomposition The resulting schema is “better” Note that functional dependencies can either be:– explicit: we want to enforce these constraints irrespectiveof data in the relations – can be encoded in SQL– implicit: the data happen to satisfy them (see netflixexample)Normalization only concerned with explicit FDsPrivacy/anonymization – need to worry about implicitFDs

Boyce-Codd Normal FormA relation schema R is in BCNF with respect to a set F of functionaldependencies if for all functional dependencies in F of the form where α R and β R, at least one of the following holds: is trivial (i.e. ) is a superkey for R, i.e. RExample schema not in BCNF:bor loan ( customer id, loan number, amount )because loan number amount holds on bor loan but loan number isnot a superkey

Decomposing a Schema into BCNF Suppose we have a schema R and a non-trivial dependency causes a violation of BCNF.We decompose R into: R In our example,– α loan number– β amountand bor loan is replaced by ( loan number, amount ) R ( customer id, loan number )

Testing for BCNF To check if a non-trivial dependency α β causes a violation ofBCNF compute α (the attribute closure of α), and verify that it includes all attributes of R, that is, it is a superkey of R.Simplified test: To check if a relation schema R with a given set offunctional dependencies F is in BCNF, it suffices to check only thedependencies in the given set F for violation of BCNF, rather thanchecking all dependencies in F . We can show that if none of the dependencies in F causes a violationof BCNF, then none of the dependencies in F will cause a violationof BCNF either.

Testing for BCNF.cont However, using only F is incorrect when testing a relation in adecomposition of R E.g. Consider R (A, B, C, D), with F { A B, B C} Decompose R into R (A,B) and R (A,C,D)12 Neither of the dependencies in F contain only attributes from(A,C,D) so we might be mislead into thinking R2 satisfies BCNF. In fact, dependency A C in F shows R is not in BCNF.2Simplified test: Avoids computing F For every subset α of R compute α under Fi Then either α includes no attributes of R -α or includes all attributesiof Ri In R (A,C,D) aboveA ABC, A -(A) (BC) includes an attribute2of Ri but not all (violation) Then α (α - α) R is the violator A BC (ACD) C is an FDi(actually in F ) which violates BCNF

BCNF Decomposition Algorithmresult : {R};done : false;compute F ;while (not done) doif (there is a schema R in result that is not in BCNF)then beginlet α β be a nontrivial functionaldependency that holds on Risuch that α R is not in F ,and α β ;result : (result – Ri) (Ri – β) (α, β );endelse done : true;iiNote: each Ri is in BCNF, and decomposition is lossless-join.

Example of BCNF Decomposition R (branch-name, branch-city, assets,customer-name, loan-number, amount)R (Bn,Bc,As,Cn,Ln,Am)F {Bn As Bc,Ln Am Bn, F (branch-name assets branch-cityloan-number amount branch-name)Key {loan-number, customer-name} Decomposition– R1 (branch-name, branch-city, assets)– R2 (branch-name, customer-name,loan number, amount)– R3 (branch-name, loan-number, amount)– R4 (customer-name, loan-number) Final decompositionR1, R3, R4Ln Cn Bn Bc As Am}1) Bn As Bc in RBn {Bn As Bc} key not SKDecomposeR1 (Bn,Bc,As)R2 (Bn,Cn,Ln,Am)2) Ln Am Bn in R2Ln {Ln Am Bn As Bc}decomposeR3 (Ln Am Bn)R4 (Ln Cn) not SK

BCNF and Dependency Preservation Constraints, including functional dependencies, are costly tocheck in practice unless they pertain to only one relationIf it is sufficient to test only those dependencies on eachindividual relation of a decomposition in order to ensurethat all functional dependencies hold, then thatdecomposition is dependency preserving.Because it is not always possible to achieve both BCNF anddependency preservation, we consider a weaker normalform, known as third normal form.

Third Normal Form A relation schema R is in third normal form (3NF) if for all:α β in F at least one of the following holds:– α β is trivial (i.e., β α)– α is a superkey for R– Each attribute A in β – α is contained in a candidate keyfor R.(NOTE: each attribute may be in a different candidatekey) If a relation is in BCNF it is in 3NF (since in BCNF one of thefirst two conditions above must hold). Third condition is a minimal relaxation of BCNF to ensuredependency preservation (will see why later).

3NF (Cont.) Example– R (J, K, L)F {JK L, L K}– Two candidate keys: JK and JL– R is in 3NFJK LL KJK is a superkeyK is contained in a candidate key

Redundancy in 3NF Example of problems due to redundancy in 3NF– R (J, K, L)F {JK L, L K}JLKj1l1k1j2l1k1j3l1k1nulll2k2A schema in 3NF but not in BCNF has the following problems: redundancy of information need to use null values (e.g. to represent relationship l k ,2 2when there is no corresponding j value)

Testing for 3NF Optimization: Need to check only FDs in F, need not checkall FDs in F . Use attribute closure to check, for each dependency α β, ifα is a superkey. If α is not a superkey, we have to verify if each attribute in βis contained in a candidate key of R– this test is more expensive, since it involve finding ALLcandidate keys– testing for 3NF has been shown to be NP-hard

Canonical Cover Sets of functional dependencies may have redundantdependencies that can be inferred from the others– For example: A C is redundant in: {A B, B C}– Parts of a functional dependency may be redundant E.g.: on RHS: {A B, B C, A CD} can besimplified to{A B, B C, A D} E.g.: on LHS: {A B, B C, AC D} can besimplified to{A B, B C, A D} Intuitively, a canonical cover of F is a “minimal” set offunctional dependencies equivalent to F, having noredundant dependencies or redundant parts ofdependencies

Extraneous Attributes Consider a set F of functional dependencies and the functionaldependency α β in F.– Attribute A is extraneous in α if A αand F logically implies (F – {α β}) {(α – A) β}.– Attribute A is extraneous in β if A βand the set of functional dependencies(F – {α β}) {α (β – A)} logically implies F. Note: implication in the opposite direction is trivial in each of thecases above, since a “stronger” functional dependency alwaysimplies a weaker one Example: Given F {A C, AB C }– B is extraneous in AB C because {A C, AB C} logicallyimplies A C (I.e. the result of dropping B from AB C). Example: Given F {A C, AB CD}– C is extraneous in AB CD since AB C can be inferredeven after deleting C

3NF Decomposition/“construction” AlgorithmLet Fc be a canonical cover for F;i : 0;for each functional dependency α β in Fc doif none of the schemas Rj, 1 j i contains α βthen begini : i 1;Ri : α βendif none of the schemas Rj, 1 j i contains a candidate keyfor Rthen begini : i 1;Ri : any candidate key for R;endreturn (R1, R2, ., Ri)

Comparison of BCNF and 3NF It is always possible to decompose a relation into relations in 3NF and– the decomposition is lossless– the dependencies are preserved It is always possible to decompose a relation into relations in BCNFand– the decomposition is lossless– it may not be possible to preserve dependencies.

More Examples SUPPLY(sno,pno,jno,scity,jcity,qty)– sno,pno,jno is the candidate key,– sno scity, jno jcity ED(eno,ename,byr,sal,dno,dname,floor,mgr)1NF1NF– eno dno mgr TEACH(student,teacher,subject)– student,subject teacher– teacher subject3NF

Normalization Using FDsCheck whether a particular relation R is in “good” form: BCNF or 3NFIf not, decompose R into a set of relations {R1, R2, ., Rn} such that No redundancy: The relations Ri preferably should be in either Boyce-CoddNormal Form or Third Normal Form. Lossless-join decomposition: Otherwise you have information loss. Dependency preservation: Let Fi be the set of dependencies F that includeonly attributes in Ri.– Preferably the decomposition should be dependency preserving,that is,(F1 F2 Fn) F – Otherwise, checking during updates for violation of functionaldependencies may require expensive joins operations The theory is based on functional dependencies

BCNF and Over-normalization 3NF relation has redudancy anomalies: TEACH(student,teacher,subject)– insertion: cannot insert a teacher until we had a student taking his subject– deletion: if I delete the last student of a teacher, then I loose the subjecthe teaches What is really the problem? schema overload. We are trying to capture twomeanings:1. subject X is (or can be) taught by teacher Y2. student Z takes subject W from teacher V it makes no sense to say we loose the subject he teaches when he does nothave a student! Who does he teach to? normalizing it to BCNF cannot preserve dependencies. Therefore, it is betterto stay with the 3NF TEACH and another relation SUBJECT T(teacher,subject)3NFBCNF

Summary.practical issues Normalization– Create a good schema – low redundancy, no loss ofinformation Functional dependencies– Specify constraints that must be encoded in our schema– Note: SQL does not allow us to specify FDs other than keyconstraints (PRIMARY KEY, UNIQUE) Typical design process:– Decompose to BCNF– Use materialized views to preserve any additional FDs

1NF: every attribute has an atomic value (not a set value) 2NF: we will not be concerned in this course 3NF: if for each FD X Y either – it is trivial or – X is a superkey – Y-X is a proper subset of a candidate key BCNF: if fo

Related Documents:

Introduction of Chemical Reaction Engineering Introduction about Chemical Engineering 0:31:15 0:31:09. Lecture 14 Lecture 15 Lecture 16 Lecture 17 Lecture 18 Lecture 19 Lecture 20 Lecture 21 Lecture 22 Lecture 23 Lecture 24 Lecture 25 Lecture 26 Lecture 27 Lecture 28 Lecture

prepared by David Mount for the course CMSC 451, Design and Analysis of Computer Algorithms, at the University of Maryland. Permission to use, copy, modify, and distribute these notes for educational purposes and without fee is hereby granted, provided that this copyright notice appear in all copies. Lecture Notes 1 CMSC 451

Lecture 1: A Beginner's Guide Lecture 2: Introduction to Programming Lecture 3: Introduction to C, structure of C programming Lecture 4: Elements of C Lecture 5: Variables, Statements, Expressions Lecture 6: Input-Output in C Lecture 7: Formatted Input-Output Lecture 8: Operators Lecture 9: Operators continued

Database Applications and SQL 12 The DBMS 15 The Database 16 Personal Versus Enterprise-Class Database Systems 18 What Is Microsoft Access? 18 What Is an Enterprise-Class Database System? 19 Database Design 21 Database Design from Existing Data 21 Database Design for New Systems Development 23 Database Redesign 23

Artificial Intelligence COMP-424 Lecture notes by Alexandre Tomberg Prof. Joelle Pineau McGill University Winter 2009 Lecture notes Page 1 . I. History of AI 1. Uninformed Search Methods . Lecture notes Page 58 . Lecture notes Page 59 . Soft EM for a general Bayes net: Lecture notes Page 60 . Machine Learning: Clustering March-19-09

Lecture 1: Introduction and Orientation. Lecture 2: Overview of Electronic Materials . Lecture 3: Free electron Fermi gas . Lecture 4: Energy bands . Lecture 5: Carrier Concentration in Semiconductors . Lecture 6: Shallow dopants and Deep -level traps . Lecture 7: Silicon Materials . Lecture 8: Oxidation. Lecture

TOEFL Listening Lecture 35 184 TOEFL Listening Lecture 36 189 TOEFL Listening Lecture 37 194 TOEFL Listening Lecture 38 199 TOEFL Listening Lecture 39 204 TOEFL Listening Lecture 40 209 TOEFL Listening Lecture 41 214 TOEFL Listening Lecture 42 219 TOEFL Listening Lecture 43 225 COPYRIGHT 2016

Partial Di erential Equations MSO-203-B T. Muthukumar tmk@iitk.ac.in November 14, 2019 T. Muthukumar tmk@iitk.ac.in Partial Di erential EquationsMSO-203-B November 14, 2019 1/193 1 First Week Lecture One Lecture Two Lecture Three Lecture Four 2 Second Week Lecture Five Lecture Six 3 Third Week Lecture Seven Lecture Eight 4 Fourth Week Lecture .