Unit 5: Normal Forms Based on Functional Dependencies IT238: Data Modeling and Database Design Instructor: Qing Yan, M.D., Ph.D.

April 29, 2018 | Author: Anonymous | Category: Documents
Report this link


Description

Unit 5: Normal Forms Based on Functional Dependencies IT238: Data Modeling and Database Design Instructor: Qing Yan, M.D., Ph.D. Week 3 Review: Assignment Chapter 8 – Normal Forms Based on Functional Dependencies * Chapter 3 – Entity-Relationship Modeling * Notation Figure 3.2 Summary of Presentation Layer ER diagram notation From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology. Chapter 3 – Entity-Relationship Modeling * Figure 3.2 Summary of Presentation Layer ER diagram notation (continued) Notation (continued) Noninclusive Arc Inclusive Arc Exclusive Arc R E2 E1 E2 R R m n Cardinality Ratio of m:n Chen's Notation E1 Optionally Related To E2 (Partial Participation of E1 in R) E2 Mandatorily Related to E1 (Total Participation of E2 in R) [Existence Dependency of E2 in R] Cardinality Ratio of m:n Crow's Foot Notation E1 E2 E1 Chapter 3 – Entity-Relationship Modeling * The Synthesis Approach to Generate an Initial Set of Entity Types and Attributes List all discernable data elements and treat them as attributes Group these attributes based on apparent commonalities Designate each cluster of attributes as an entity type Review leftover data elements and investigate the possibility that some of them serve as links among the entity types previously identified Designate these links as relationship types From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology. Chapter 3 – Entity-Relationship Modeling * The Analysis Approach to Generate an Initial Set of Entity Types and Attributes Begin by searching for things that can be labeled by singular nouns and call these things entity types Gather properties that appear to belong to individual entity types and label them as attributes of a particular entity type Be sensitive throughout the process to the identification of relationships among the various entity types From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology. Week 3 Assignment Rubric Chapter 8 – Normal Forms Based on Functional Dependencies * Week 3 Assignment Answer Chapter 8 – Normal Forms Based on Functional Dependencies * Week 4 Key Concepts Relational schema A set of relation schemas in a relational data model. Relation state A specific occurrence of a relation (or should we say relation schema). Superkey A set of one or more attributes, which taken collectively, uniquely identifies a tuple of a relation. Superset A set S1 is a superset of another set S2 if every element in S2 is in S1. S1 may have elements that are not in S2. Chapter 8 – Normal Forms Based on Functional Dependencies * From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology. Week 4 Key Concepts Candidate key A superkey with no proper subsets that are superkeys. Uniqueness property The prohibition of the presence of identical values for the collection of attribute(s) that constitute a unique identifier. Primary key Chosen from among the candidate keys to serve the role of uniquely identifying tuples of a relation schema. Chapter 8 – Normal Forms Based on Functional Dependencies * From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology. Week 4 Assignment 1. What is a relation schema? What is the difference between a relation, a relation schema, and a relational schema? A relation schema: the basic structure of a relation. It consists of a name, and a set of related attributes. When sets of values are assigned to the attributes that make up a relation schema, a relation is created. A relation is a relation schema with data. A relational schema is a set of relation schemas. A relational schema is also called a relational data model. Chapter 8 – Normal Forms Based on Functional Dependencies * From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology. Unit 5 Objectives Apply the process of normalization to resolve data redundancy problems in a relation schema Demonstrate the need for dependency preservation during the normalization process Chapter 1 – Database Systems: Architecture and Components * Unit 5 To-Do List Complete the reading Textbook and Web Participate in the discussion board 30 points Attend the introductory seminar or complete FLA quiz 20 points Complete the unit assignment 50 points Chapter 1 – Database Systems: Architecture and Components * Unit 5 Assignment Rubric Chapter 8 – Normal Forms Based on Functional Dependencies * Key Concepts Normalization The process of rendering undesirable functional dependencies desirable. A technique to facilitate systematic validation of the participation of attributes in a relation schema from a perspective of data redundancy. First, second, third and Boyce-Codd normal forms. Chapter 1 – Database Systems: Architecture and Components * From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology. Key Concepts Normalization in the database design process To render undesirable functional dependencies desirable. Chapter 8 – Normal Forms Based on Functional Dependencies * From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology. Key Concepts Functional dependencies – the source technical translations of user-specified business rules expressed as constraints in a relation schema. Chapter 8 – Normal Forms Based on Functional Dependencies * From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology. Key Concepts First normal form A relation schema R where the attributes comprising the schema are atomic and single-valued. Second normal form A relation schema R where every non-prime attribute in R is fully functionally dependent on the primary key of R. Third normal form A relation schema R where no non-prime attribute is functionally dependent on another non-prime attribute in R. Chapter 1 – Database Systems: Architecture and Components * From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology. Key Concepts Boyce-Codd normal form A relation schema R such that for every non-trivial functional dependency in R, the determinant is a superkey of R. Chapter 1 – Database Systems: Architecture and Components * From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology. Key Concepts Reverse engineering The process of going from a lower level of abstraction (e.g., a normalized relational schema) to a higher level of abstraction (e.g., a Design-specific ER diagram). Chapter 8 – Normal Forms Based on Functional Dependencies * From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology. Key Concepts The difference between a desirable and an undesirable functional dependency A desirable functional dependency is any functional dependency in a relation schema R where the determinant is a candidate key of R. Undesirable functional dependencies in a relation schema occur when the determinant of the functional dependency is not a candidate key of R. Functional dependencies of this nature are the seeds of data redundancies that lead to modification anomalies. Chapter 8 – Normal Forms Based on Functional Dependencies * From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology. Chapter 8 – Normal Forms Based on Functional Dependencies * Normalization Normalization is a technique that facilitates systematic validation of the participation of attributes in a relation schema from a perspective of data redundancy. Normal Forms (NFs) provide a stepwise progression towards attaining the goal of a fully normalized relation schema that is guaranteed to be free of data redundancies that cause update anomalies from a functional dependency perspective. From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology. Chapter 8 – Normal Forms Based on Functional Dependencies * Desirable Versus Undesirable FDs Desirable FDs in a relation schema R are those where the determinant is a candidate key of R. Undesirable FDs in a relation schema R are those where the determinant is not a candidate key of R. That is, the FDs will cause data redundancy and the consequent modification anomalies in R. From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology. Chapter 8 – Normal Forms Based on Functional Dependencies * Normal Forms: An Overview A relation schema is said to be in a particular normal form if it satisfies certain prescribed criteria for that normal form. First normal form (1NF) reflects one of the properties of a relation schema – i.e., by definition a relation schema is in 1NF. The normal forms associated with functional dependencies are second (2NF), third (3NF), and Boyce-Codd (BCNF) normal forms. From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology. Normal Forms: An Overview (continued) The violations of each of these normal forms signal the presence of a specific type of ‘undesirable’ FD. violation of a normal form, can be interpreted as equivalent to an inadvertent mixing up of entity types belonging to two different entity classes in a single entity type. From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology. Chapter 8 – Normal Forms Based on Functional Dependencies * First Normal Form (1NF) First normal form (1NF) imposes conditions so that a base relation that is physically stored as a file does not contain records with a variable number of fields. This is accomplished by prohibiting multi-valued attributes, composite attributes, and combinations thereof in a relation schema. Such a constraint, in effect, prevents relations from containing other relations. In essence, 1NF, by definition, requires that the domain of an attribute must include only atomic values and that the value of an attribute in a relation’s tuple must be a single value from the domain of that attribute. From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology. Chapter 8 – Normal Forms Based on Functional Dependencies * 1NF Violation – An Example Note 1: Album_no is the primary key of ALBUM Note 2: Artist_nm is a multi-valued attribute causing a first normal form violation. From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology. ALBUM Album_no Artist_nm Price Stock BS123 Britney Spears 17.95 1000 JT111 Justin Timberlake 17.95 1200 BTL007 {John Lennon, Paul McCartney, George Harrison, Ringo Star} 23.95 MJ100 Michael Jackson 17.95 JM456 John Mayer 16.95 1000 JM151 John Mayer 16.95 1000 MX789 Madonna 11.95 500 DJM237 {John Denver, Michael  Jackson, Madonna} 11.95 2000 DR711 Diana Ross 12.95 1000 PM137 Paul McCartney 19.95 Chapter 8 – Normal Forms Based on Functional Dependencies * Resolution of 1NF Violation From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology. NEW_ALBUM Album_no Artist_nm Price Stock BS123 Britney Spears 17.95 1000 JT111 Justin Timberlake 17.95 1200 BTL007 John Lennon 23.95 BTL007 Paul McCartney 23.95 BTL007 George Harrison 23.95 BTL007 Ringo Star 23.95 MJ100 Michael Jackson 17.95 JM456 John Mayer 16.95 1000 JM151 John Mayer 16.95 1000 MX789 Madonna 11.95 500 DJM237 John Denver 11.95 2000 DJM237 Michael Jackson 11.95 2000 DJM237 Madonna 11.95 2000 DR711 Diana Ross 12.95 1000 PM137 Paul McCartney 19.95 Note 1: Artist_nm is no longer a multi-valued attribute Note 2: {Album_no, Artist_nm} is the primary key of ALBUM thus rendering Artist_nm a single-valued attribute and achieving 1NF in NEW_ALBUM. Chapter 8 – Normal Forms Based on Functional Dependencies * Second Normal Form (2NF) A relation schema R is in 2NF if every non-prime attribute in R is fully functionally dependent on the primary key of R. This means a non-prime attribute is not functionally dependent on a proper subset of the primary key of R. The Second Normal Form (2NF) is based on a concept known as full functional dependency. A functional dependency of the form Z  A is a ‘full functional dependency’ if and only if no proper subset of Z functionally determines A. In other words, if Z  A and X  A, and X is a proper subset of Z, then Z does not fully functionally determine A, i.e., Z  A is not a full functional dependency; it is a partial dependency. From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology. Chapter 8 – Normal Forms Based on Functional Dependencies * Violation of 2NF Note: The primary key and the attributes A, Y, and Z can be atomic or composite. A is a prime attribute, whereas Y and Z are non-prime attributes. In order for a partial dependency to exist here, Attribute A must be a proper subset of the primary key of R1. From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology. Entity name Data item Attribute A Primary Key Partial dependency (Violation of 2NF in R1) Attribute Z Attribute Y R1 Chapter 8 – Normal Forms Based on Functional Dependencies * 2NF Violation - An Example What is the cause of the 2NF violation in NEW_ALBUM? From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology. NEW_ALBUM Album_no Artist_nm Price Stock BS123 Britney Spears 17.95 1000 JT111 Justin Timberlake 17.95 1200 BTL007 John Lennon 23.95 BTL007 Paul McCartney 23.95 BTL007 George Harrison 23.95 BTL007 Ringo Star 23.95 MJ100 Michael Jackson 17.95 JM456 John Mayer 16.95 1000 JM151 John Mayer 16.95 1000 MX789 Madonna 11.95 500 DJM237 John Denver 11.95 2000 DJM237 Michael Jackson 11.95 2000 DJM237 Madonna 11.95 2000 DR711 Diana Ross 12.95 1000 PM137 Paul McCartney 19.95 Chapter 8 – Normal Forms Based on Functional Dependencies * 2NF Violation Explained F: fd1: Album_no  Price; fd2: Album_no  Stock F+: F; fd12: Album_no  {Price, Stock}; fd12x: {Album_no, Artist_nm}  {Price, Stock} Fc = F Candidate Key of NEW_ALBUM: (Album_no, Artist_nm); Primary Key: (Album_no, Artist_nm) fd1 and fd2 (fd12) violate 2NF in NEW_ALBUM From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology. Chapter 8 – Normal Forms Based on Functional Dependencies * Modification Anomalies Resulting from 2NF Violation Suppose we want to change the value of Price or Stock of Album_no BTL007 in NEW_ALBUM Multipe tuples require update and failure to update some will change the semantics of the scenario  update anomaly Suppose we want to add a new tuple (Album_no: XY111, Price: 17.95, and Stock: 100) to NEW_ALBUM Unless we also know the artist(s) the insert is not possible since Artist_nm is part of the primary key of NEW_ALBUM  insertion anomaly Suppose we want to delete Album_no BTL007 from NEW_ALBUM Entails deletion of multiple tuples  deletion anomaly From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology. Chapter 8 – Normal Forms Based on Functional Dependencies * Resolution of 2NF Violation The resolution of 2NF violation is a two-step process that decomposes the target relation schema with the undesirable FDs into multiple relation schemas such that the undesirable FDs are rendered desirable. Pull out the undesirable FD(s) from the target relation schema as separate relation schema(s). Retain the determinant of the pulled-out relation schema as an attribute(s) in the leftover target relation schema to facilitate reconstruction of the original target relation schema. From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology. Chapter 8 – Normal Forms Based on Functional Dependencies * Resolution of 2NF Violation Demonstrated Note: Album_no  Price Album_no  Stock Note: No non-trivial FD present From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology. ALBUM_INFO ALBUM_ARTIST Album_no Price Stock Album_no Artist_nm BS123 17.95 1000 BS123 Britney Spears JT111 17.95 1200 JT111 Justin Timberlake BTL007 23.95 BTL007 John Lennon MJ100 17.95 BTL007 Paul McCartney JM456 16.95 1000 BTL007 George Harrison JM151 16.95 1000 BTL007 Ringo Star MX789 11.95 500 MJ100 Michael Jackson DJM237 11.95 2000 JM456 John Mayer DR711 12.95 1000 JM151 John Mayer PM137 19.95 MX789 Madonna DJM237 John Denver DJM237 Michael Jackson DJM237 Madonna DR711 Diana Ross PM137 Paul McCartney Chapter 8 – Normal Forms Based on Functional Dependencies * Third Normal Form – 3NF A relation schema R is in 3NF if it is in 2NF and no non-prime attribute is functionally dependent on another non-prime attribute in R. The Third Normal Form (3NF) is based on the concept of transitive dependency. Given a relation schema R (X, A, B) where X, A, and B are pair-wise disjoint atomic or composite attributes, X is the primary key of R, and A and B are non-prime attributes If A  B (or B  A) in R, then B (or A) is said to be ‘transitively dependent’ on X, the primary key of R. From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology. Chapter 8 – Normal Forms Based on Functional Dependencies * BCNF Defined A relation schema R is in BCNF if for every non-trivial functional dependency in R, the determinant is a superkey of R Remember: An FD in R is trivial if and only if the dependent is a subset of the determinant. Note: By this definition, violation of 2NF or 3NF also imply violation of BCNF From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology. Chapter 8 – Normal Forms Based on Functional Dependencies * Possible Side Effects of Normalization Attribute Preservation i.e., no attribute should be lost in the decomposition process. Dependency Preservation i.e., any decomposition should continue to preserve the minimal cover of F. Lossless-Join (Non-additive or Non-loss Join) Property i.e., the decomposition should be strictly reversible in that the reversal should strictly yield the original target relation in tact with no loss of tuples or addition of spurious tuples. From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology. Lossless-Join Explained A decomposition of a relation schema R should be strictly (losslessly) reversible. The term “loss” in “lossless” implies loss of information – not necessarily loss of tuples generation of spurious tuples resulting from the natural join of the projections of R entails loss of information Lossless-join property is always predicated on: a set of FDs (F) prevailing over R the premise that the join attributes in the decomposition are non-null values Lossless Join is also referred to as Non-additive or Non-loss Join From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology. Normal Forms in a Nutshell 1NF 2NF 3NF BCNF Modification Anomalies Yes Yes Yes No (due to FDs) Lossless Join N/A Yes Yes Yes Decomposition Dependency Yes Yes Yes Yes when* Preservation on decomposition * BCNF design that result from 2NF and 3NF resolutions requiring no further normalization. From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology. Chapter 8 – Normal Forms Based on Functional Dependencies * Summary Notes on Normalization Normalization eliminates data redundancies that cause modification anomalies by appropriately decomposing the target relation schema. Undesirable FDs in the target relation schema are not discarded, but are rendered “desirable” in the decomposed relational schema. There is no specific merit in resolving 2NF violation before 3NF violation – the ordering is only historical. BCNF subsume 2NF and 3NF in that 2NF and 3NF violations are also BCNF violations. Likewise, a relation schema in BCNF is also in 2NF and 3NF. A BCNF solution that also has the lossless join property and is dependency preserving may not be achievable at all times. In this case a lossless join is preferred and Dependency preservation is accomplished via application programs or materialized views From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology. Chapter 8 – Normal Forms Based on Functional Dependencies * Normalization Step 1: Identify the candidate keys of URS1, given F  (Store, Branch, Customer, Vendor) and (Manager, Customer, Vendor) Step 2: Choose a primary key  (Manager, Customer, Vendor) Step 3: Identify violations of normal forms for every FD in F Step 4: Resolve 2NF and 3NF violations and decompose Step 5: Resolve BCNF violations and decompose From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology. Chapter 8 – Normal Forms Based on Functional Dependencies * Normalization (continued) Step 6: Is the decomposition attribute preserving? Is the decomposition dependency preserving? Does the decomposition exhibit lossless-join property? Cross-check the decomposition! From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology. Summary The process of normalization to resolve data redundancy problems in a relation schema The need for dependency preservation during the normalization process Q & A Questions? * * * * * * * * * * * * * * * * * * * * * * * *


Comments

Copyright © 2025 UPDOCS Inc.