1NF Modeling
Week 3: Reading
The following are the reading articles:
1st Normal Form (1NF)
Learning Outcomes
- Learn how to define and apply First Normal Form (1NF).
Lesson Materials
First Normal Form (1NF) exists when all columns, or attributes, of a table have a single data type and there are no repeating rows of data, which means rows are unique. This requires that:
- Column data types should be atomic, which means columns shouldn’t have repeating groups, comma delimited groups, or other subatomic parts. This raises a question whether compund variables like Oracle’s arrays, lists, and objects violate first normal form.
- Column names are unique in tables and arbitrarily ordered, which means their order doesn’t impose any constraint on the table.
- Rows in a table have no implicit or explicit ordering required for their access and use.
- Columns shouldn’t contain a repeating group in the context of a candidate key. This means that when you repeat all columns but one or a subset of columns, your model fails to meet the definition of first normal form.
Moreover, first normal form modeling finds the non-repeating rows and moves them from a base table to their own table. It removes all trace of the non-repeating rows from the base table. It also puts a foreign key in the new table that points back to the primary key of the base table. The foreign key in the new table is functionally dependent on the primary key in the base table.
What are atomic data types? →
The question of what constitutes an atomic value in first normal form is interesting because Oracle’s database engine runs as an Object Relational Database Management System (ORDBMS); also known as Extended Relational Database Management System. Are atomic column values limited to scalar variables? Scalar variables only hold one thing at a time, like an Integer.
If so, this would exclude Oracles’ specialized types. An alternative may ask that question differently, like “Are atomic column values scalar or compound variables?” Compound variables could be a comma separated list in a string data type, or either a SQL lists or Oracle’s User Defined Type (UDT).
The idea of comma separate lists can be dismissed as violating first normal form but the others aren’t so easy to toss out. While it might be heresy, I’d accept Oracle SQL lists or Oracle’s UDT (stored as object constructors with actual parameters) for two reasons. They are atomic in the scope of the database engine and most effectively transferred to active status inside a run-time view, which is then joined or acted upon by ordinary SQL commands. On the contrary point, lists are invariably ordered and that means they can’t by themselves be in first normal form. They become navigable ordered structures that you place into an aggregate table with a TABLE
function call. Does the structured ordering of inside an atomic type disqualify it from being in first normal form? I’d argue that the nature of the thing stored as no effect on how it is accessed by SQL, and therefore it meets first normal form regardless of how you navigate it once you query it.
Examples & Solutions: →
A table is in first normal form when all columns, or attributes, have a single data type and there are no repeating rows or repeating groups. The first two elements of this idea are generally understood and applied almost intuitively. The idea of a single data type is clear because most people understand any column can only have one data type, at any given time. The idea of uniqueness makes sense because you generally only want to act on one copy of anything at a time.
The third point is the one that is sometimes misunderstood. The idea of not being able to have repeating groups can confuse people because they often look at tables as a group of unique rows. Repeating groups indicate that you’ve got a multiple subject table design. The repeating rows indicate the base table is on the one side of a one-to-many relationship. The columns with non-repeating rows typically become a new table because they are on the many side of a one-to-many binary relationship.
Problem Description →
The example in this section uses a sample address
table to demonstrate how you move a table from UNF, or a HNF of nothing, to 1NF. This is the data definition of a version of the address
table. It has a design that lets you violate first normal form whenever the street_address
column requires more than a single entry or more than one row.
This type of table design lets you violate the atomic rule of first normal form because the street_address
column may need to accommodate one or more street address values. Typically, many developers see no harm in putting multiple address lines into a column as comma separated values. The data would look like the following:
Another type of table design lets you violate two rules of first normal form. It modifies the previous example by adding a second column to the primary key. This makes the primary key a compound or composite key. The following provides an example of such a table.
The second column allows for multiple rows to have an internal relationship. The second column in the primary key lets you order rows. The ordering of rows violates the first normal form rule that there will be no implicit or explicit ordering of rows. This type of design lets all columns except the ordering column of the compound primary key and street_address
column repeat. It violates the no repeating group rule of first normal form. The data set is also a mirror to what you see when joining two tables that have a one-to-many relationship. The columns on the one-side repeat while the columns on the many-side are unique.
In the preceding illustration the non-repeating columns and rows are highlighted. They illustrate that together they’re really a separate subject that should be removed to another table. The solution section covers approaches that resolve these design errors.
Problem Solution →
Solving first normal form problems can be a challenge. Sometimes it is more difficult than second or third normal form because at first normal form there are options. The conceptual solutions are: (1) remove the columns and put them in another table; (2) flatten the table, which is a form of denomalization; and (3) in those databases where they are supported implement a nested table or list.
Relational Solution →
Removing columns to another table is the best solution and practice because it offers you the most future flexibility. This is the frequently recommended alternative for online transaction processing (OLTP) systems.
If you query the data with an INNER JOIN
, the data will look very similar to the original problem data. This query demonstrates the problem.
SELECT a.contact_id , a.address_id , sa.address_line_id , sa.street_address , a.city , a.state_province , a.postal_code FROM address a INNER JOIN street_address sa ON a.address_id = sa.address_id; |
It generates the following as a result from the join. You should note that the one-side of the one-to-many binary relationship is repeating while the many side is unique. The process of repeating the result in the query return set is somewhat like but different than a Cartesian Product. It displays the projected columns of the join in each row, which means the one-side repeats in the return set. You need to perform a nested join to keep only a unique set of the one side in a procedural programming language. As a side note, it’s unfortunate that SQL doesn’t support a FLOWR statement like XQuery because it would eliminate the need of a procedure programming language.
Flattening (Data Warehousing) Solution →
Flattening a non-repeating column into a series of columns requires creating a column for every unique row that may occur. The following is a definition of such a table:
The data in a flattened model appear like this:
Object Relational Solution →
Implement a non-repeating column as a list. You first create a scalar collection type, and then implement the collection in an address table. You create a SQL scalar collection type of variable length strings with the following syntax:
CREATE TYPE street_list IS TABLE OF VARCHAR2(30); / |
After you create the type, you can implement it as a column data type. This particular SQL data type is not an array but a list. A list requires a specialized clause for storage of the nested table, as shown in the following create table statement.
CREATE TABLE address_book ( address_book_id NUMBER , street_address STREET_LIST , city VARCHAR2(30) , state VARCHAR2(2) , postal_code VARCHAR2(10)) NESTED TABLE street_address STORE AS street_table; |
The table description does not qualify the specialized storage instructions. You have to inspect the data catalog information to see how the database manages these.
You can insert rows of data but you must use constructors to insert into a nested table structure. The following enters two rows of data:
INSERT INTO address VALUES ( 1, street_list('203 North La Salle','Suite 1400'), 'Chicago', 'IL', '60601'); INSERT INTO address VALUES ( 2, street_list('203 North La Salle','Suite 1800'), 'Chicago', 'IL', '60601'); |
A query of the data in the table yields:
You can find more about joining and querying nested tables in this prior blog post.
Setup Scripts →
The scripts for first normal form are more numerous than others. You’ll find them preceded by brief descriptions, and should consider working through them sequentially.
UNF address
table with a non-atomic street_address
column →
-- Conditionally drop the table. BEGIN FOR i IN (SELECT TABLE_NAME FROM user_tables WHERE TABLE_NAME IN ('ADDRESS')) LOOP EXECUTE IMMEDIATE 'DROP TABLE '||i.table_name||' CASCADE CONSTRAINTS'; END LOOP; END; / -- Create the address table. CREATE TABLE address ( address_id NUMBER CONSTRAINT pk_ad_1 PRIMARY KEY , contact_id NUMBER CONSTRAINT nn_ad_1 NOT NULL , address_type NUMBER CONSTRAINT nn_ad_2 NOT NULL , street_address VARCHAR2(60) CONSTRAINT nn_ad_3 NOT NULL , city VARCHAR2(30) CONSTRAINT nn_ad_4 NOT NULL , state_province VARCHAR2(30) CONSTRAINT nn_ad_5 NOT NULL , postal_code VARCHAR2(20) CONSTRAINT nn_ad_6 NOT NULL); -- Insert a row with a comma delimited street address. INSERT INTO address VALUES ( 2001, 2001, 1008,'1111 Broadway, Suite 500, MS-5045','Oakland','CA','94604'); -- Query the data. SELECT street_address , city , state_province , postal_code FROM address; |
UNF address
table with atomic columns but a repeating group →
-- Conditionally drop the table. BEGIN FOR i IN (SELECT TABLE_NAME FROM user_tables WHERE TABLE_NAME IN ('ADDRESS')) LOOP EXECUTE IMMEDIATE 'DROP TABLE '||i.table_name||' CASCADE CONSTRAINTS'; END LOOP; END; / -- Create the address table. CREATE TABLE address ( address_id NUMBER , address_line_id NUMBER , contact_id NUMBER CONSTRAINT nn_ad_1 NOT NULL , address_type NUMBER CONSTRAINT nn_ad_2 NOT NULL , street_address VARCHAR2(60) CONSTRAINT nn_ad_3 NOT NULL , city VARCHAR2(30) CONSTRAINT nn_ad_4 NOT NULL , state_province VARCHAR2(30) CONSTRAINT nn_ad_5 NOT NULL , postal_code VARCHAR2(20) CONSTRAINT nn_ad_6 NOT NULL , CONSTRAINT pk_ad_1 PRIMARY KEY (address_id,address_line_id)); -- Insert data into table. INSERT INTO address VALUES ( 2001, 1, 2001, 1008,'1111 Broadway','Oakland','CA','94604'); INSERT INTO address VALUES ( 2001, 2, 2001, 1008,'Suite 500','Oakland','CA','94604'); INSERT INTO address VALUES ( 2001, 3, 2001, 1008,'MS-5045','Oakland','CA','94604'); -- Query the table. SELECT contact_id , address_id , address_line_id , street_address , city , state_province , postal_code FROM address; |
How to resolve partial dependencies to achieve 2NF →
-- Conditionally drop the table. BEGIN FOR i IN (SELECT TABLE_NAME FROM user_tables WHERE TABLE_NAME IN ('ADDRESS','STREET_ADDRESS')) LOOP EXECUTE IMMEDIATE 'DROP TABLE '||i.table_name||' CASCADE CONSTRAINTS'; END LOOP; END; / -- Create address and street_address tables. CREATE TABLE address ( address_id NUMBER CONSTRAINT pk_ad1 PRIMARY KEY , contact_id NUMBER CONSTRAINT nn_ad1 NOT NULL , address_type NUMBER CONSTRAINT nn_ad2 NOT NULL , city VARCHAR2(30) CONSTRAINT nn_ad4 NOT NULL , state_province VARCHAR2(30) CONSTRAINT nn_ad5 NOT NULL , postal_code VARCHAR2(20) CONSTRAINT nn_ad6 NOT NULL); CREATE TABLE street_address ( street_address_id NUMBER CONSTRAINT pk_sa1 PRIMARY KEY , address_id NUMBER CONSTRAINT nn_sa1 NOT NULL , address_line_id NUMBER CONSTRAINT nn_sa2 NOT NULL , street_address VARCHAR2(60) CONSTRAINT nn_sa3 NOT NULL , CONSTRAINT fk_sa1 FOREIGN KEY(address_id) REFERENCES address(address_id)); -- Insert data. INSERT INTO address VALUES ( 2001, 1, 1008,'Oakland','CA','94604'); INSERT INTO street_address VALUES ( 3001, 2001, 1,'1111 Broadway'); INSERT INTO street_address VALUES ( 3002, 2001, 2,'Suite 500'); INSERT INTO street_address VALUES ( 3003, 2001, 3,'MS-5045'); -- Query the data set to demonstrate repeating and non-repeating rows. SELECT a.contact_id , a.address_id , sa.address_line_id , sa.street_address , a.city , a.state_province , a.postal_code FROM address a INNER JOIN street_address sa ON a.address_id = sa.address_id; |
How to eliminate repeating column groups →
-- Conditionally drop the table. BEGIN FOR i IN (SELECT TABLE_NAME FROM user_tables WHERE TABLE_NAME IN ('ADDRESS')) LOOP EXECUTE IMMEDIATE 'DROP TABLE '||i.table_name||' CASCADE CONSTRAINTS'; END LOOP; END; / -- Create the address table. CREATE TABLE address ( address_id NUMBER CONSTRAINT pk_ad_1 PRIMARY KEY , contact_id NUMBER CONSTRAINT nn_ad_1 NOT NULL , address_type NUMBER CONSTRAINT nn_ad_2 NOT NULL , street_address1 VARCHAR2(20) CONSTRAINT nn_ad_3 NOT NULL , street_address2 VARCHAR2(20) CONSTRAINT nn_ad_4 NOT NULL , street_address3 VARCHAR2(20) CONSTRAINT nn_ad_5 NOT NULL , city VARCHAR2(30) CONSTRAINT nn_ad_6 NOT NULL , state_province VARCHAR2(30) CONSTRAINT nn_ad_7 NOT NULL , postal_code VARCHAR2(20) CONSTRAINT nn_ad_8 NOT NULL); -- Insert a row with atomic column values. INSERT INTO address VALUES ( 2001, 2001, 1008,'1111 Broadway','Suite 500','MS-5045','Oakland','CA','94604'); -- Query the data. SELECT street_address1 , street_address2 , street_address3 , city , state_province , postal_code FROM address; |