Database Tutorial

Course Tutorial Site

Site Admin

Normalization

with 10 comments

Database normalization is the process of organizing data. There are a bunch of rules governing how you should do it, when you should undo it, and how you can’t do it. My hope is to lay out what normalization means in Texas English, which means clear and simple. I rely on you, the reader, to let me know when its not clear and how it can be more clear by treating this as a living document.

There are now, as of last year, seven or eight normal forms. They began when E.F. Codd first proposed the relational model in his A Relational Model for Large Shared Data Banks paper. There’s also a concept of Domain-Key Normal Form (DKNF). According to some, this belongs between fifth and sixth normal forms. DKNF comes to use by way of Ronald Fagin, in his A Normal Form for Relational Databases That is Based on Domains and Keys paper.

This blog page will try to cover and blend these two ideas to help gain perspective on how you may model your databases. Ultimately, if you’re very interested in the topic you should consider spending some money to buy Logic and Databases: The Roots of Relational Theory by C.J. Date.

Normalization Definitions

Summary

Database normalization attempts to organize data in such a way as to prevent SQL statements from creating insertion, update, or deletion anomalies. As a practice third normal form (3NF) is often considered normalized because most 3NF tables are free of insertion, update or delete anomalies. The key word is most, not all.

Therefore, normalization design attempts to achieve the Highest Normal Form (HNF) possible. A table is in HNF whether it meets or fails to meet any normal form definition. A Zero Normal Form ) (0NF) exists and it is equivalent to an Unnormalized Normal Form (UNF), basically a table with repeating rows. UNF is synonymous with 0NF and means that a table contains one or more repeating groups. It is probably important to note that a table may be in UNF and HNF at the same time. Likewise, UNF and HNF are equivalent to 0NF.

Normalization is the process of organizing data into tables that act as single subjects when acted upon individually or through external relationships. A single subject is also known as a domain. You act on data by querying it or transacting against it. You query data by writing a SELECT statement. You typically transact against data by writing an INSERT, UPDATE, or DELETE statement.

Terms and Definitions

Terms and definitions are very important in any topic. The normalization process has its share of terms. Here are a few that you should understand before reading this document.

Normalization Anomalies

Anomalies can occur at various levels of normalization but the majority occur when you have tables that are less than third normal form. The frequent types of anomalies are illustrated by referencing a second normal form table. You can find formal problem and solution resolution of this second normal form table later in the blog page.

Normalization Concepts

Normalization concepts show the conceptual or generalized idea and practice for normalization. The illustrations show you how to make columns atomic and to avoid repeating rows groups for first normal form. Then, they illustrate how to conceptually eliminate multiple subjects dependencies from tables beyond first normal form.

Written by michaelmclaughlin

September 27th, 2009 at 11:50 pm

Posted in

10 Responses to 'Normalization'

Subscribe to comments with RSS or TrackBack to 'Normalization'.

  1. I’m a bit confused on the definition of attribute and tuple. The two have the exact same definition, but in the definition they appear to be two different things. This is the line I am referring to, “A column is an attribute, and a row is a tuple.” Could you please help clear up my confusion on the matter? Thanks.

    Jordan

    21 Jun 10 at 9:34 pm

  2. An attribute is the mathematical definition of a placeholder in a matrix (or a table in a database). A matrix can hold 1 to many attributes in a row. In mathematics the row is actually called a tuple. The post is trying to expose you to the different ways to label elements in a table. The phrase attribute is synonymous with column, and row has the same relation to tuple.

    michaelmclaughlin

    21 Jun 10 at 9:39 pm

  3. I loved the definition table – it would have been helpful the first day of class. I do have one suggestion though- don’t use the term you are trying to define in a defination (i.e. – primary key).

    Chris Staber

    20 Sep 10 at 12:17 pm

  4. I’ll add a table by itself to Lesson #1. Thanks!

    michaelmclaughlin

    3 Oct 10 at 2:46 pm

  5. This page states that there is “no zero normal form”. Is that correct? It seems that we have spoken of 0NF and UNF being the same thing in class.

    Hyrum Denney

    26 Nov 10 at 4:16 pm

  6. Under “Insertion Anomalies” you have written a word twice in this sentence *”The table includes includes two subjects.” I also agree with Chris, the definitions are awesome and I would have liked to have them earlier.

    Sam Brubaker

    2 Feb 11 at 11:34 pm

  7. Under the Normalization Definitions Summary section, it is stated that there is no zero normalized form when there actually is.

    Tyler Hull

    18 Mar 11 at 10:59 am

  8. Perhaps to keep things simpler in your blog you could put some of the extraneous details as footnotes. For example, the E. F. Codd reference.

    Eli

    4 May 11 at 8:28 pm

  9. I think that there needs to be a slight modification to the “item” table in the Setup Script for 3NF. The script shows that you’ve setup a new third table called “item_rating” with the attributes item_rating_id (primary key), item_rating, and item_rating_agency. The latter two attributes needed to be removed from the “item” table in order to achieve third normal form (previously they were transitive dependencies in the base table). You’ve then altered the ‘item’ table and set a foreign key on the item_rating_id, but the original transitive dependencies are still in the ‘item’ table after everything has been shifted around for 3NF.

    Tighe

    29 Jan 13 at 12:00 pm

  10. Hyrum, Thanks for pointing it out. It’s fixed now, and yes there’s a 0NF, which is basically a table with repeating rows.

    michaelmclaughlin

    25 Oct 14 at 6:40 pm

Leave a Reply