1NF Modeling at Database Tutorial

1NF Modeling

without comments

Week 3: Reading

The following are the reading articles:

Definitions
Data Modeling
- 1NF Modeling
- 2NF Modeling
- 3NF Modeling
Common Lookup
Who Audit Security

1^st Normal Form (1NF)

Learning Outcomes

Learn how to define and apply First Normal Form (1NF).

Lesson Materials

First Normal Form (1NF) exists when all columns, or attributes, of a table have a single data type and there are no repeating rows of data, which means rows are unique. This requires that:

Column data types should be atomic, which means columns shouldn’t have repeating groups, comma delimited groups, or other subatomic parts. This raises a question whether compund variables like Oracle’s arrays, lists, and objects violate first normal form.
Column names are unique in tables and arbitrarily ordered, which means their order doesn’t impose any constraint on the table.
Rows in a table have no implicit or explicit ordering required for their access and use.
Columns shouldn’t contain a repeating group in the context of a candidate key. This means that when you repeat all columns but one or a subset of columns, your model fails to meet the definition of first normal form.

Moreover, first normal form modeling finds the non-repeating rows and moves them from a base table to their own table. It removes all trace of the non-repeating rows from the base table. It also puts a foreign key in the new table that points back to the primary key of the base table. The foreign key in the new table is functionally dependent on the primary key in the base table.

What are atomic data types? →

The question of what constitutes an atomic value in first normal form is interesting because Oracle’s database engine runs as an Object Relational Database Management System (ORDBMS); also known as Extended Relational Database Management System. Are atomic column values limited to scalar variables? Scalar variables only hold one thing at a time, like an Integer.

If so, this would exclude Oracles’ specialized types. An alternative may ask that question differently, like “Are atomic column values scalar or compound variables?” Compound variables could be a comma separated list in a string data type, or either a SQL lists or Oracle’s User Defined Type (UDT).

The idea of comma separate lists can be dismissed as violating first normal form but the others aren’t so easy to toss out. While it might be heresy, I’d accept Oracle SQL lists or Oracle’s UDT (stored as object constructors with actual parameters) for two reasons. They are atomic in the scope of the database engine and most effectively transferred to active status inside a run-time view, which is then joined or acted upon by ordinary SQL commands. On the contrary point, lists are invariably ordered and that means they can’t by themselves be in first normal form. They become navigable ordered structures that you place into an aggregate table with a TABLE function call. Does the structured ordering of inside an atomic type disqualify it from being in first normal form? I’d argue that the nature of the thing stored as no effect on how it is accessed by SQL, and therefore it meets first normal form regardless of how you navigate it once you query it.

Examples & Solutions: →

A table is in first normal form when all columns, or attributes, have a single data type and there are no repeating rows or repeating groups. The first two elements of this idea are generally understood and applied almost intuitively. The idea of a single data type is clear because most people understand any column can only have one data type, at any given time. The idea of uniqueness makes sense because you generally only want to act on one copy of anything at a time.

The third point is the one that is sometimes misunderstood. The idea of not being able to have repeating groups can confuse people because they often look at tables as a group of unique rows. Repeating groups indicate that you’ve got a multiple subject table design. The repeating rows indicate the base table is on the one side of a one-to-many relationship. The columns with non-repeating rows typically become a new table because they are on the many side of a one-to-many binary relationship.

Problem Description →

The example in this section uses a sample address table to demonstrate how you move a table from UNF, or a HNF of nothing, to 1NF. This is the data definition of a version of the address table. It has a design that lets you violate first normal form whenever the street_address column requires more than a single entry or more than one row.

unfproblem

This type of table design lets you violate the atomic rule of first normal form because the street_address column may need to accommodate one or more street address values. Typically, many developers see no harm in putting multiple address lines into a column as comma separated values. The data would look like the following:

Another type of table design lets you violate two rules of first normal form. It modifies the previous example by adding a second column to the primary key. This makes the primary key a compound or composite key. The following provides an example of such a table.

unfproblemdesc2

The second column allows for multiple rows to have an internal relationship. The second column in the primary key lets you order rows. The ordering of rows violates the first normal form rule that there will be no implicit or explicit ordering of rows. This type of design lets all columns except the ordering column of the compound primary key and street_address column repeat. It violates the no repeating group rule of first normal form. The data set is also a mirror to what you see when joining two tables that have a one-to-many relationship. The columns on the one-side repeat while the columns on the many-side are unique.

unfproblemdesc3

In the preceding illustration the non-repeating columns and rows are highlighted. They illustrate that together they’re really a separate subject that should be removed to another table. The solution section covers approaches that resolve these design errors.

Problem Solution →

Solving first normal form problems can be a challenge. Sometimes it is more difficult than second or third normal form because at first normal form there are options. The conceptual solutions are: (1) remove the columns and put them in another table; (2) flatten the table, which is a form of denomalization; and (3) in those databases where they are supported implement a nested table or list.

Relational Solution →

Removing columns to another table is the best solution and practice because it offers you the most future flexibility. This is the frequently recommended alternative for online transaction processing (OLTP) systems.

1stnfsolved1

If you query the data with an INNER JOIN, the data will look very similar to the original problem data. This query demonstrates the problem.

SELECT   a.contact_id
,        a.address_id
,        sa.address_line_id
,        sa.street_address
,        a.city
,        a.state_province
,        a.postal_code
FROM     address a INNER JOIN street_address sa
ON       a.address_id = sa.address_id;

It generates the following as a result from the join. You should note that the one-side of the one-to-many binary relationship is repeating while the many side is unique. The process of repeating the result in the query return set is somewhat like but different than a Cartesian Product. It displays the projected columns of the join in each row, which means the one-side repeats in the return set. You need to perform a nested join to keep only a unique set of the one side in a procedural programming language. As a side note, it’s unfortunate that SQL doesn’t support a FLOWR statement like XQuery because it would eliminate the need of a procedure programming language.

unfproblemdesc3

Flattening (Data Warehousing) Solution →

Flattening a non-repeating column into a series of columns requires creating a column for every unique row that may occur. The following is a definition of such a table:

unfproblemdefsol11

The data in a flattened model appear like this:

unfproblemsol1

Object Relational Solution →

Implement a non-repeating column as a list. You first create a scalar collection type, and then implement the collection in an address table. You create a SQL scalar collection type of variable length strings with the following syntax:

CREATE TYPE street_list IS TABLE OF VARCHAR2(30);
/

After you create the type, you can implement it as a column data type. This particular SQL data type is not an array but a list. A list requires a specialized clause for storage of the nested table, as shown in the following create table statement.

CREATE TABLE address_book
( address_book_id NUMBER
, street_address  STREET_LIST
, city            VARCHAR2(30)
, state           VARCHAR2(2)
, postal_code     VARCHAR2(10))
NESTED TABLE street_address STORE AS street_table;

The table description does not qualify the specialized storage instructions. You have to inspect the data catalog information to see how the database manages these.

1stnfsolved3

You can insert rows of data but you must use constructors to insert into a nested table structure. The following enters two rows of data:

INSERT INTO address VALUES
( 1, street_list('203 North La Salle','Suite 1400'), 'Chicago', 'IL', '60601');
INSERT INTO address VALUES
( 2, street_list('203 North La Salle','Suite 1800'), 'Chicago', 'IL', '60601');

A query of the data in the table yields:

1stnfsolvedlist

You can find more about joining and querying nested tables in this prior blog post.

Setup Scripts →

The scripts for first normal form are more numerous than others. You’ll find them preceded by brief descriptions, and should consider working through them sequentially.

UNF address table with a non-atomic street_address column →

-- Conditionally drop the table.
BEGIN
  FOR i IN (SELECT   TABLE_NAME
            FROM     user_tables
            WHERE    TABLE_NAME IN ('ADDRESS')) LOOP
    EXECUTE IMMEDIATE 'DROP TABLE '||i.table_name||' CASCADE CONSTRAINTS';
  END LOOP;
END;
/
 
-- Create the address table.
CREATE TABLE address
( address_id                  NUMBER        CONSTRAINT pk_ad_1  PRIMARY KEY
, contact_id                  NUMBER        CONSTRAINT nn_ad_1  NOT NULL
, address_type                NUMBER        CONSTRAINT nn_ad_2  NOT NULL
, street_address              VARCHAR2(60)  CONSTRAINT nn_ad_3  NOT NULL
, city                        VARCHAR2(30)  CONSTRAINT nn_ad_4  NOT NULL
, state_province              VARCHAR2(30)  CONSTRAINT nn_ad_5  NOT NULL
, postal_code                 VARCHAR2(20)  CONSTRAINT nn_ad_6  NOT NULL);
 
-- Insert a row with a comma delimited street address.
INSERT INTO address VALUES
( 2001, 2001, 1008,'1111 Broadway, Suite 500, MS-5045','Oakland','CA','94604');
 
-- Query the data.
SELECT   street_address
,        city
,        state_province
,        postal_code
FROM     address;

UNF address table with atomic columns but a repeating group →

-- Conditionally drop the table.
BEGIN
  FOR i IN (SELECT   TABLE_NAME
            FROM     user_tables
            WHERE    TABLE_NAME IN ('ADDRESS')) LOOP
    EXECUTE IMMEDIATE 'DROP TABLE '||i.table_name||' CASCADE CONSTRAINTS';
  END LOOP;
END;
/
 
-- Create the address table.
CREATE TABLE address
( address_id                  NUMBER
, address_line_id             NUMBER
, contact_id                  NUMBER        CONSTRAINT nn_ad_1  NOT NULL
, address_type                NUMBER        CONSTRAINT nn_ad_2  NOT NULL
, street_address              VARCHAR2(60)  CONSTRAINT nn_ad_3  NOT NULL
, city                        VARCHAR2(30)  CONSTRAINT nn_ad_4  NOT NULL
, state_province              VARCHAR2(30)  CONSTRAINT nn_ad_5  NOT NULL
, postal_code                 VARCHAR2(20)  CONSTRAINT nn_ad_6  NOT NULL
, CONSTRAINT pk_ad_1 PRIMARY KEY (address_id,address_line_id));
 
-- Insert data into table.
INSERT INTO address VALUES
( 2001, 1, 2001, 1008,'1111 Broadway','Oakland','CA','94604');
INSERT INTO address VALUES
( 2001, 2, 2001, 1008,'Suite 500','Oakland','CA','94604');
INSERT INTO address VALUES
( 2001, 3, 2001, 1008,'MS-5045','Oakland','CA','94604');
 
-- Query the table.
SELECT   contact_id
,        address_id
,        address_line_id
,        street_address
,        city
,        state_province
,        postal_code
FROM     address;

How to resolve partial dependencies to achieve 2NF →

-- Conditionally drop the table.
BEGIN
  FOR i IN (SELECT   TABLE_NAME
            FROM     user_tables
            WHERE    TABLE_NAME IN ('ADDRESS','STREET_ADDRESS')) LOOP
    EXECUTE IMMEDIATE 'DROP TABLE '||i.table_name||' CASCADE CONSTRAINTS';
  END LOOP;
END;
/
 
-- Create address and street_address tables.
CREATE TABLE address
( address_id                  NUMBER        CONSTRAINT pk_ad1  PRIMARY KEY
, contact_id                  NUMBER        CONSTRAINT nn_ad1  NOT NULL
, address_type                NUMBER        CONSTRAINT nn_ad2  NOT NULL
, city                        VARCHAR2(30)  CONSTRAINT nn_ad4  NOT NULL
, state_province              VARCHAR2(30)  CONSTRAINT nn_ad5  NOT NULL
, postal_code                 VARCHAR2(20)  CONSTRAINT nn_ad6  NOT NULL);
 
CREATE TABLE street_address
( street_address_id           NUMBER        CONSTRAINT pk_sa1  PRIMARY KEY
, address_id                  NUMBER        CONSTRAINT nn_sa1  NOT NULL
, address_line_id             NUMBER        CONSTRAINT nn_sa2  NOT NULL
, street_address              VARCHAR2(60)  CONSTRAINT nn_sa3  NOT NULL
, CONSTRAINT fk_sa1 FOREIGN KEY(address_id) REFERENCES address(address_id));
 
-- Insert data.
INSERT INTO address VALUES
( 2001, 1, 1008,'Oakland','CA','94604');
INSERT INTO street_address VALUES
( 3001, 2001, 1,'1111 Broadway');
INSERT INTO street_address VALUES
( 3002, 2001, 2,'Suite 500');
INSERT INTO street_address VALUES
( 3003, 2001, 3,'MS-5045');
 
-- Query the data set to demonstrate repeating and non-repeating rows.
SELECT   a.contact_id
,        a.address_id
,        sa.address_line_id
,        sa.street_address
,        a.city
,        a.state_province
,        a.postal_code
FROM     address a INNER JOIN street_address sa
ON       a.address_id = sa.address_id;

How to eliminate repeating column groups →

-- Conditionally drop the table.
BEGIN
  FOR i IN (SELECT   TABLE_NAME
            FROM     user_tables
            WHERE    TABLE_NAME IN ('ADDRESS')) LOOP
    EXECUTE IMMEDIATE 'DROP TABLE '||i.table_name||' CASCADE CONSTRAINTS';
  END LOOP;
END;
/
 
-- Create the address table.
CREATE TABLE address
( address_id                  NUMBER        CONSTRAINT pk_ad_1  PRIMARY KEY
, contact_id                  NUMBER        CONSTRAINT nn_ad_1  NOT NULL
, address_type                NUMBER        CONSTRAINT nn_ad_2  NOT NULL
, street_address1             VARCHAR2(20)  CONSTRAINT nn_ad_3  NOT NULL
, street_address2             VARCHAR2(20)  CONSTRAINT nn_ad_4  NOT NULL
, street_address3             VARCHAR2(20)  CONSTRAINT nn_ad_5  NOT NULL
, city                        VARCHAR2(30)  CONSTRAINT nn_ad_6  NOT NULL
, state_province              VARCHAR2(30)  CONSTRAINT nn_ad_7  NOT NULL
, postal_code                 VARCHAR2(20)  CONSTRAINT nn_ad_8  NOT NULL);
 
-- Insert a row with atomic column values.
INSERT INTO address VALUES
( 2001, 2001, 1008,'1111 Broadway','Suite 500','MS-5045','Oakland','CA','94604');
 
-- Query the data.
SELECT   street_address1
,        street_address2
,        street_address3
,        city
,        state_province
,        postal_code
FROM     address;

Written by michaelmclaughlin

August 11th, 2018 at 4:38 pm

Posted in

Database Tutorial

1NF Modeling

My Technical Blog

Tutorial Pages

Blogroll