Rules of Data Normalization Essay

pic 1.

Eliminate Repeating Groups – Make a separate table for each set ofrelated attributes, and give each table a primary key. 2. Eliminate Redundant Data – If an attribute depends on only part of amulti-valued key, remove it to a separate table. 3.

Eliminate Columns Not Dependent On Key – If attributes do notcontribute to a description of the key, remove them to a separatetable. 4. Isolate Independent Multiple Relationships – No table may contain twoor more 1:n or n:m relationships that are not directly related. 5.

Isolate Semantically Related Multiple Relationships – There may bepractical constrains on information that justify separating logicallyrelated many-to-many relationships. 6. Optimal Normal Form – a model limited to only simple (elemental)facts, as expressed in ORM. 7. Domain-Key Normal Form – a model free from all modification anomalies.

pic1. Eliminate Repeating GroupsIn the original member list, each member name is followed by any databasesthat the member has experience with. Some might know many, and others mightnot know any. To answer the question, “Who knows DB2?” we need to performan awkward scan of the list looking for references to DB2. This isinefficient and an extremely untidy way to store information. Moving the known databases into a seperate table helps a lot.

Separatingthe repeating groups of databases from the member information results infirst normal form. The MemberID in the database table matches the primarykey in the member table, providing a foreign key for relating the twotables with a join operation. Now we can answer the question by looking inthe database table for “DB2” and getting the list of members. picpic2.

Eliminate Redundant DataIn the Database Table, the primary key is made up of the MemberID and theDatabaseID. This makes sense for other attributes like “Where Learned” and”Skill Level” attributes, since they will be different for everymember/database combination. But the database name depends only on theDatabaseID. The same database name will appear redundantly every time itsassociated ID appears in the Database Table. Suppose you want to reclassify a database – give it a different DatabaseID. The change has to be made for every member that lists that database! If youmiss some, you’ll have several members with the same database underdifferent IDs.

This is an update anomaly. Or suppose the last member listing a particular database leaves the group. His records will be removed from the system, and the database will not bestored anywhere! This is a delete anomaly. To avoid these problems, we needsecond normal form.

To achieve this, separate the attributes depending on both parts of the keyfrom those depending only on the DatabaseID. This results in two tables:”Database” which gives the name for each DatabaseID, and “MemberDatabase”which lists the databases for each member. Now we can reclassify a database in a single operation: look up theDatabaseID in the “Database” table and change its name. The result willinstantly be available throughout the application.

picpic3. Eliminate Columns Not Dependent On KeyThe Member table satisfies first normal form – it contains no repeatinggroups. It satisfies second normal form – since it doesn’t have amultivalued key. But the key is MemberID, and the company name and locationdescribe only a company, not a member.

To achieve third normal form, theymust be moved into a separate table. Since they describe a company,CompanyCode becomes the key of the new “Company” table. The motivation for this is the same for second normal form: we want toavoid update and delete anomalies. For example, suppose no members from theIBM were currently stored in the database.

With the previous design, therewould be no record of its existence, even though 20 past members were fromIBM!picpic4. Isolate Independent Multiple RelationshipsThis applies primarily to key-only associative tables, and appears as aternary relationship, but has incorrectly merged 2 distinct, independentrelationships. The way this situation starts is by a business request list the one shownbelow. This could be any 2 M:M relationships from a single entity.

Forinstance, a member could know many software tools, and a software tool maybe used by many members. Also, a member could have recommended many books,and a book could be recommended by many members. picInitial business requestSo, to resolve the two M:M relationships, we know that we should resolvethem separately, and that would give us 4th normal form. But, if we were tocombine them into a single table, it might look right (it is in 3rd normalform) at first. This is shown below, and violates 4th normal form.

picIncorrect solutionTo get a picture of what is wrong, look at some sample data, shown below. The first few records look right, where Bill knows ERWin and recommends theERWin Bible for everyone to read. But something is wrong with Mary andSteve. Mary didn’t recommend a book, and Steve Doesn’t know any softwaretools. Our solution has forced us to do strange things like create dummyrecords in both Book and Software to allow the record in the association,since it is key only table.

pic Sample data from incorrect solutionThe correct solution, to cause the model to be in 4th normal form, is toensure that all M:M relationships are resolved independently if they areindeed independent, as shown below. pic Correct 4th normal formNOTE! This is not to say that ALL ternary associations are invalid. Theabove situation made it obvious that Books and Software were independentlylinked to Members. If, however, there were distinct links between allthree, such that we would be stating that “Bill recommends the ERWin Bibleas a reference for ERWin”, then separating the relationship into twoseparate associations would be incorrect.

In that case, we would lose thedistinct information about the 3-way relationship. pic5. Isolate Semantically Related Multiple RelationshipsOK, now lets modify the original business diagram and add a link betweenthe books and the software tools, indicating which books deal with whichsoftware tools, as shown below. picInitial business requestThis makes sense after the discussion on Rule 4, and again we may betempted to resolve the multiple M:M relationships into a singleassociation, which would now violate 5th normal form. The ternaryassociation looks identical to the one shown in the 4th normal formexample, and is also going to have trouble displaying the informationcorrectly. This time we would have even more trouble because we can’t showthe relationships between books and software unless we have a member tolink to, or we have to add our favorite dummy member record to allow therecord in the association table.

picIncorrect solutionThe solution, as before, is to ensure that all M:M relationships that areindependent are resolved independently, resulting in the model shown below. Now information about members and books, members and software, and booksand software are all stored independently, even though they are all verymuch semantically related. It is very tempting in many situations tocombine the multiple M:M relationships because they are so similar. Withincomplex business discussions, the lines can become blurred and the correctsolution not so obvious.

pic Correct 5th normal formpic6. Optimal Normal FormAt this point, we have done all we can with Entity-Relationship Diagrams(ERD). Most people will stop here because this is usually pretty good. However, another modeling style called Object Role Modeling (ORM) candisplay relationships that cannot be expressed in ERD.

Therefore there aremore normal forms beyond 5th. With Optimal Normal Form (OMF)It is defined as a model limited to only simple (elemental) facts, asexpressed in ORM. pic7. Domain-Key Normal FormThis level of normalization is simply a model taken to the point wherethere are no opportunities for modification anomalies. . “if every constraint on the relation is a logical consequence of thedefinition of keys and domains”.

Constraint “a rule governing static values of attributes”. Key “unique identifier of a tuple”. Domain “description of an attribute’s allowed values” 1. A relation in DK/NF has no modification anomalies, and conversely.

2. DK/NF is the ultimate normal form; there is no higher normal formrelated to modification anomalies 3. Defn: A relation is in DK/NF if every constraint on the relation is alogical consequence of the definition of keys and domains. 4.

Constraint is any rule governing static values of attributes that isprecise enough to be ascertained whether or not it is true 5. E. g. edit rules, intra-relation and inter-relation constraints,functional and multi-valued dependencies. 6.

Not including constraints on changes in data values or time-dependentconstraints. 7. Key – the unique identifier of a tuple. 8. Domain: physical and a logical description of an attributes allowedvalues.

9. Physical description is the format of an attribute. 10. Logical description is a further restriction of the values the domainis allowed11.

Logical consequence: find a constraint on keys and/or domains which,if it is enforced, means that the desired constraint is also enforced. 12. Bottom line on DK/NF: If every table has a single theme, then allfunctional dependencies will be logical consequences of keys. All datavalue constraints can them be expressed as domain constraints. 13.

Practical consequence: Since keys are enforced by the DBMS and domainsare enforced by edit checks on data input, all modification anomaliescan be avoided by just these two simple measures.

Related Topics

Need custom essay sample written special for your assignment?

Related Topics

Rules of Data Normalization Essay

Need custom essay sample written special for your assignment?

More related essays