Class 20: Decomposition & Schema Normalization

2y ago
5 Views
2 Downloads
545.37 KB
48 Pages
Last View : 1m ago
Last Download : 3m ago
Upload by : Emanuel Batten
Transcription

CAS CS 460 [Fall 2020] - https://bu-disc.github.io/CS460/ - Manos AthanassoulisCS460: Intro to Database SystemsClass 20: Decomposition &Schema NormalizationInstructor: Manos Athanassoulishttps://bu-disc.github.io/CS460/

CAS CS 460 [Fall 2020] - https://bu-disc.github.io/CS460/ - Manos AthanassoulisReview: Database DesignRequirements Analysisuser needs; what must database do?Conceptual Designhigh level description (often done w/ ER model)Logical Designtranslate ER into DBMS data modelSchema Refinementconsistency, normalizationPhysical Designindexes, disk layout2

CAS CS 460 [Fall 2020] - https://bu-disc.github.io/CS460/ - Manos AthanassoulisWhy schema refinementwhat is a bad schema?a schema with redundancy!why?redundant storage & insert/update/delete anomalieshow to fix it?normalize the schema by decomposingnormal forms: BCNF, 3NF, 3

CAS CS 460 [Fall 2020] - https://bu-disc.github.io/CS460/ - Manos AthanassoulisMotivating SN Name, Salary4

CAS CS 460 [Fall 2020] - https://bu-disc.github.io/CS460/ - Manos AthanassoulisMotivating SN Name, -555-3761787-00-4321John25K5

CAS CS 460 [Fall 2020] - https://bu-disc.github.io/CS460/ - Manos AthanassoulisMotivating Example lack600phonesLenovo onesilver450phonesname, category price, colorcategory department6

CAS CS 460 [Fall 2020] - https://bu-disc.github.io/CS460/ - Manos AthanassoulisMotivating Example lack600phonesLenovo ovo onephonesOnePlussmartphonesilver4507

CAS CS 460 [Fall 2020] - https://bu-disc.github.io/CS460/ - Manos AthanassoulisReminder: Reasoning for FDsAssume a relation R with attributes A, B, C(1) reflexivitye.g., A, B B(2) augmentatione.g., if A B then A, C B, C(3) transitivitye.g., if A B and B C then A C(4) unione.g., if A B and A C then A B, C(5) decompositione.g., if A B, C then A B and A CFD closure of F, F : is the set of all FDs that are implied by Fattr. closure of X: the set of all attributes that are determined by Xminimal cover: subset S of F such that S F 8

CAS CS 460 [Fall 2020] - https://bu-disc.github.io/CS460/ - Manos Athanassoulis“chopping the relation into pieces using FDs”DECOMPOSITION9

CAS CS 460 [Fall 2020] - https://bu-disc.github.io/CS460/ - Manos AthanassoulisDecompositionFormallywe decompose R(A1, , An) by creating:R1(B1, , Bm)R2(C1, , Ck)where {B1, , Bm} {C1, , Ck} {A1, , An}the instance of R1 is the projection of R onto B1, , Bmthe instance of R2 is the projection of R onto C1, , Ck10

CAS CS 460 [Fall 2020] - https://bu-disc.github.io/CS460/ - Manos Athanassoulis“Good” Decomposition(1) minimize redundancy(2) avoid information loss (lossless-join)(3) preserve FDs (dependency preserving)(4) ensure good query performance12

CAS CS 460 [Fall 2020] - https://bu-disc.github.io/CS460/ - Manos AthanassoulisInformation 0-4321John25KDecompose into:R1(SSN, Name, Salary)R2(Name, -8800Anna617-555-9876John617-555-3761can wereconstruct R?13

CAS CS 460 [Fall 2020] - https://bu-disc.github.io/CS460/ - Manos AthanassoulisLossless r (join on B)R’(A,B,C)the decomposition is lossless-join iffor any initial instance R, R R’14

CAS CS 460 [Fall 2020] - https://bu-disc.github.io/CS460/ - Manos AthanassoulisLossless Criteriongiven: a relation R(A) a set F of FDs a decomposition of R into R1(A1) and R2(A2)the decomposition is lossless-join if and only ifat least one of the following FDs is in F (closure of F):(1) A1 A2 A1(2) A1 A2 A215

CAS CS 460 [Fall 2020] - https://bu-disc.github.io/CS460/ - Manos AthanassoulisExampleA A A B A CRelation R(A, B, C, D)FD A B, Cwhat is the F ?lossydecomposition into R1(A, B, C) and R2(D)A B,C A A,B A A,CA A,B,CA1 A2 empty setlossless-joindecomposition into R1(A, B, C) and R2(A, D)A1 A2 A and A1 A,B,CA A,B,C is in F 16

CAS CS 460 [Fall 2020] - https://bu-disc.github.io/CS460/ - Manos AthanassoulisDependency Preservinggiven R and a set of FDs F, we decomposeR into R1 and R2. Suppose:R1 has a set of FDs F1R2 has a set of FDs F2F1 and F2 are computed from Fit is dependency preserving if by enforcingF1 over R1 and F2 over R2 , we can enforce F over R17

CAS CS 460 [Fall 2020] - https://bu-disc.github.io/CS460/ - Manos Athanassoulis(Good) ExamplePerson (SSN, name, age, canDrink)SSN name, ageage canDrinkwhat is a dependency preserving decomposition?R1(SSN, name, age)SSN name, ageandR2(age, canDrink)age canDrinkIs it also lossless-join?Yes! A1 A2 age and A2 age, canDrinkage age, canDrink is in F 18

CAS CS 460 [Fall 2020] - https://bu-disc.github.io/CS460/ - Manos Athanassoulis(Bad) ExampleR (A, B, C)A BB, C Anot dependency preservingR1(A, B)A BandR2(A, C)no FDs!ABACa1b0a1c0a2b0a2c0ABCa1b0c0a2b0c0the table violatesB, C A19

CAS CS 460 [Fall 2020] - https://bu-disc.github.io/CS460/ - Manos AthanassoulisNormal FormsHow “good” is a schema design?follows normal formsflat tablesatomic values1NF2NF3NFBCNF4NF morerestrictive20

CAS CS 460 [Fall 2020] - https://bu-disc.github.io/CS460/ - Manos AthanassoulisNormal FormsHow “good” is a schema design?follows normal formsflat tablesatomic values1NF2NF3NFBCNF4NF morerestrictive21

CAS CS 460 [Fall 2020] - https://bu-disc.github.io/CS460/ - Manos AthanassoulisBoyce-Codd Normal Form (BCNF)given a relation R(A1, ,An),a set of FDs F, and X {A1, ,An}R is in BCNF if X A one of the two holds:- A X (that is, it is a trivial FD)- X is a superkeyin other words: non-trivial FD X A, X is a superkey in R22

CAS CS 460 [Fall 2020] - https://bu-disc.github.io/CS460/ - Manos AthanassoulisBCNF - SN Name, Salarykey: {SSN, Telephone}FD is not trivial!so, is SSN a superkey?no! it is not in BCNF23

CAS CS 460 [Fall 2020] - https://bu-disc.github.io/CS460/ - Manos AthanassoulisBCNF - Example 787-00-4321John25KSSN Name, Salarykey: {SSN}FD is not trivial!so, is SSN a superkey?yes! it is in BCNF24

CAS CS 460 [Fall 2020] - https://bu-disc.github.io/CS460/ - Manos AthanassoulisBCNF - Example -3761key: {SSN, Telephone} the relation is in BCNFwhy?no FDsIs it possible a binary relationto not be in BCNF?25

CAS CS 460 [Fall 2020] - https://bu-disc.github.io/CS460/ - Manos AthanassoulisBinary Relations always BCNFR (A,B)excluding all trivial FDs, there are three cases:(1) R has no FD(2) R has one FD, either A B or B A, or,(3) R has two FDs, A B and B A(1) trivially in BCNF(2) in either LHS is the key (hence, superkey)(3) both, A and B candidate keys26

CAS CS 460 [Fall 2020] - https://bu-disc.github.io/CS460/ - Manos AthanassoulisBCNF Decomposition Algorithm(1) find a FD that violates BCNF:A1, , An B1, , Bm(2) decompose R to R1 and R2R1(A1, , An, B1, , Bm)R2(A1, , An, all other attributes)(3) repeat until no BCNF violations are left(in new tables as well)27

CAS CS 460 [Fall 2020] - https://bu-disc.github.io/CS460/ - Manos AthanassoulisOur favorite SSN Name, Salary violates BCNFA1 SSN, B1 Name, B2 SalarySplit in two relations:R1(SSN, Name, Salary)R2(SSN, Telephone)28

CAS CS 460 [Fall 2020] - https://bu-disc.github.io/CS460/ - Manos AthanassoulisOur favorite 761787-00-4321John25K29

CAS CS 460 [Fall 2020] - https://bu-disc.github.io/CS460/ - Manos AthanassoulisBCNF Decomposition Propertiesremoves [certain types of] redundancyis lossless-joinis not always dependency preserving30

CAS CS 460 [Fall 2020] - https://bu-disc.github.io/CS460/ - Manos AthanassoulisBCNF – Lossless JoinExampleR (A, B, C) and FD: A Bsuperkey(s) of the relation?{A, C} , {A, B, C} {A, B, C}A B violates BCNF (A is not a superkey)so, the BCNF decomposition is :R1(A, B) and R2(A, C)we can reconstruct it!31

CAS CS 460 [Fall 2020] - https://bu-disc.github.io/CS460/ - Manos AthanassoulisBCNF – not dependency preservingExampleR (A, B, C), FDs: A B and B, C Asuperkey(s) of the relation?{A, C} , {B, C} , {A, B, C} {A, B, C}B, C A is ok, but A B violates BCNFso, the BCNF decomposition is :R1 (A, B) and R2 (A, C)A B is preserved in R1B, C A is not preserved!32

CAS CS 460 [Fall 2020] - https://bu-disc.github.io/CS460/ - Manos AthanassoulisBCNF Decomposition ExamplesBooks (author, gender, booktitle, genre, price)author genderbooktitle genre, pricecandidate key(s)?{author, booktitle} is the only oneIs it in BCNF?No, because LHS of both FD are not a superkey!33

CAS CS 460 [Fall 2020] - https://bu-disc.github.io/CS460/ - Manos AthanassoulisBCNF Decomposition ExamplesBooks (author, gender, booktitle, genre, price)author genderbooktitle genre, priceSplitting using: author genderAuthorInfo (author, gender)FD author gender (in BCNF!)Book2 (author, booktitle, genre, price)FD booktitle genre, priceNo! {booktitle, author} is.is booktitle a superkey?So not in BCNF!34

CAS CS 460 [Fall 2020] - https://bu-disc.github.io/CS460/ - Manos AthanassoulisBCNF Decomposition ExamplesBooks (author, gender, booktitle, genre, price)author genderbooktitle genre, priceAuthorInfo (author, gender)Further splitting with booktitle genre, priceBook2 (author, booktitle, genre, price)BookInfo (booktitle, genre, price)FD booktitle genre, price in BCNF!is booktitle a superkey?Yes!BookAuthor (booktitle, author) binary is in BCNF!35

CAS CS 460 [Fall 2020] - https://bu-disc.github.io/CS460/ - Manos Athanassouliswhat if not dependency preserving?in some cases BCNF decomposition is not dependency preservinghow to address this?relax the normalization requirements36

CAS CS 460 [Fall 2020] - https://bu-disc.github.io/CS460/ - Manos AthanassoulisThird Normal Form (3NF)given a relation R (A1, ,An),a set of FDs F, and X {A1, ,An}R is in 3NF if X A one of the three holds:- A X (that is, it is a trivial FD)- X is a superkey- A is part of some key for Ris a relation in 3NF also in BCNF?No, but a relation in BCNF is always in 3NF!37

CAS CS 460 [Fall 2020] - https://bu-disc.github.io/CS460/ - Manos AthanassoulisThird Normal Form (3NF)ExampleR (A, B, C), FDs C A and A, B Cis in 3NF but not in BCNF. Why?superkeys?{A, B}, {B, C}, and {A, B, C}candidate keys?{A, B} and {B, C}Compromise: aim for BCNF but settle for 3NFlossless-join & dependency preserving possible38

CAS CS 460 [Fall 2020] - https://bu-disc.github.io/CS460/ - Manos Athanassoulis3NF Algorithm(1) apply BCNF until all relations are in 3NF(2) compute a minimal cover F’ of F(3) for each non-preserved FD X A in F’add a new relation R (X, A)39

CAS CS 460 [Fall 2020] - https://bu-disc.github.io/CS460/ - Manos Athanassoulis3NF algorithm exampleAssume R (A, B, C, D)A DA, B CA, D CB CD A, Bsuperkeys?{A} {D} {A, B} {A, D}, not {B}Step 1: find a BCNF decompositionR1 (B, C)R2 (A, B, D)40

CAS CS 460 [Fall 2020] - https://bu-disc.github.io/CS460/ - Manos Athanassoulis3NF algorithm exampleAssume R (A, B, C, D)A DA, B CA, D CB CD A, BStep 2: find a minimal coverA DB CD AD B41

CAS CS 460 [Fall 2020] - https://bu-disc.github.io/CS460/ - Manos Athanassoulis3NF algorithm exampleAssume R (A, B, C, D)A DA, B CA, D CB CD A, BStep 3: add a new relation for not preserved FDsA DR1 (B, C)B CR2 (A, B, D)D Aall FD are preserved!D Bboth are in BCNF!42

CAS CS 460 [Fall 2020] - https://bu-disc.github.io/CS460/ - Manos AthanassoulisIs Normalization Always Good?Example 1: suppose A and B are always used together, but normalizationputs them in different tables (e.g., hours worked and hourly rate)decomposition might produce unacceptable performance lossExample 2: data warehouseshuge historical DBs, rarely updated after creationjoins expensive or impractical[we want “flat” tables, a.k.a, denormalized]43

CAS CS 460 [Fall 2020] - https://bu-disc.github.io/CS460/ - Manos AthanassoulisExampleR (C, S, J, D, P, Q, V)C S, J, D, P, Q, VJ, P CS, D PJ SStep 1:R1 (S, D, P)R2 (C, S, J, D, Q, V)superkeys?{C}, {J, P}, {D, J}, not {S, D}

CAS CS 460 [Fall 2020] - https://bu-disc.github.io/CS460/ - Manos AthanassoulisExampleR (C, S, J, D, P, Q, V)C S, J, D, P, Q, VJ, P CS, D PJ SStep 1b:R1 (S, D, P)R2’ (J, S)R3 (C, J, D, Q, V)superkeys of R2 (C, S, J, D, Q, V) ?{C}, not {J}

CAS CS 460 [Fall 2020] - https://bu-disc.github.io/CS460/ - Manos AthanassoulisExampleR (C, S, J, D, P, Q, V)C S, J, D, P, Q, VJ, P CS, D PJ SStep 2: Minimal CoverC J, C D, C Q, C VJ, P CS, D PJ SR1 (S, D, P)R2’ (J, S)R3 (C, J, D, Q, V)R4 (J, P, C)are they all preserved?No!Step 3: need to add R4 (J, P, C)

CAS CS 460 [Fall 2020] - https://bu-disc.github.io/CS460/ - Manos AthanassoulisExampleR (C, S, J, D, P, Q, V)C S, J, D, P, Q, VJ, P CS, D PJ SStep 2: Minimal CoverC J, C D, C Q, C VJ, P CS, D PJ SR1 (S, D, P)R2’ (J, S)R3 (C, J, D, Q, V)R4 (J, P, C)are they all preserved?No!Step 3: need to add R4 (J, P, C)did we just introduce redundancy?

CAS CS 460 [Fall 2020] - https://bu-disc.github.io/CS460/ - Manos AthanassoulisLesson!theory of normalization is a guidecannot always give a “perfect” solutionredundancyalternativesquery performance

CAS CS 460 [Fall 2020] - https://bu-disc.github.io/CS460/ - Manos AthanassoulisSummaryfix bad schemas (redundancy) by decompositionlossless-joindependency preservingDesired normal formsBCNF: only superkey FDs3NF: superkey FDs dependencies to prime attributes in RHSNext: transaction management49

1NF 2NF 3NF BCNF 4NF . BCNF -Example 3 key: {SSN, Telephone} the relation is in BCNF why? no FDs 25 SSN Telephone 987-00-8761 857-555-1234 987-00-8761 857-555-8800 123-00-9876 617-555-9876 787-00-4321 617-555-3761 Is it poss

Related Documents:

HS-PORTAL 150 Hebe-Schiebe-Türbeschlag für Holzelemente Schema-Übersicht und allgemeine Hinweise 3.1.2 Ausführbar mit Führungsschiene HH0130-01/-02 KH0130-01 Schema A Schema D Schema G Schema G-2 Schema G-3 Schema H Schema C Schema F Schema K Schema E Schema L Achtung: Die für den SIEGENIA-AUBI-Beschlag HS-PORTAL 150 angegebenen .

2.1. Singular Value Decomposition The decomposition of singular values plays a crucial role in various fields of numerical linear algebra. Due to the properties and efficiency of this decomposition, many modern algorithms and methods are based on singular value decomposition. Singular Value Decomposition of an

support schema evolution [43] to handle data whose structure . by storing FSD as one object without relying on any static schema & E/R model to decompose FSD into relational tables. That is, no schema on write . Flexible schema that is . schema ba

The totality of these behaviors is the graph schema. Drawing a graph schema . The best way to represent a graph schema is, of course, a graph. This is how the graph schema looks for the classic Tinkerpop graph. Figure 2: Example graph schema shown as a property graph . The graph schema is pretty much a property-graph.

A mission decomposition diagram is built to represent a hierarchical structure among the mission tasks. Different decomposition techniques (e.g., functional decomposition, goal decomposition, terrain decomposition) generally re

Summary of Schema Decomposition Use a three-step approach for solving the schema decomposition problem using UFOs. Step 1: Filtering – Filter out irrelevant famous objects based on high-level signatures of schema and UFOs Step 2: Similarity Score Computation – Ca

Our framework of schema management is designed for docu-ment stores, which includes three components as shown in Fig. 2, schema extraction and discovery component, repository compo-nent, and schema consuming component with two functions of Schema Extraction and Discovery This component provi

developers to understand and analyze schema evolution in schema-less NoSQL data stores. Our approach, summarized in Figure 3, is made up of three phases, namely schema . find a particular author based on a given identifier; (2) line 5 . authorQuery object. By analyzing the usage flow of this given