## Lesson #3: Modeling Data

This discusses how you model business problems into Entity Relationship Diagrams (ERDs), which are visual representations of data. There are five principal techniques for data modeling. Each involves a means of visualizing data. There are five major evolutions in visualizing these models: Original Entity Relation, Extended Entity Relation, Information Engineering (also related to Barker’s notation employed in the Oracle*Method), IDEF1X (US Department of Defense (DoD) process), and UML. IDEF1X is no longer the subject of most DoD contracts and therefore excluded from coverage.

These modeling techniques represent binary and n-ary relationships. Binary relationships are the glue between two things (typically tables or views), and as such, they are external relationships. N-ary relationships are the glue between three or more things. You have three types of binary relationships: (a) one-to-one; (b) one-to-many; and (c) many-to-many. There are many more types of n-ary relationships because you simply have various sets made up of those three binary relationships in a network relationship.

Binary and n-ary relationships define forms of cardinality between rows in one table and rows in another table. Every relationship is composed of two relations. A relation points from one table to another (this may be called the perspective of the relation), and qualifies how many rows in the other table may be related. The minimum cardinality is the number or rows required in a relation, and was once called the ordinality of a relation. The maximum cardinality is number of rows at most supported in a relation. The combination of the minimum and maximum cardinality is simply the cardinality or the possible range of rows in an ** external relation**. This is why we use minimum and maximum cardinality to qualify endpoints of a range.

A binary relationship also means that one or a set of columns in a row of one table contain values that map to an equal number of columns in one or more rows in another table. One table considers the column or set of columns as a primary key, and the other considers their equivalent a foreign key. A key is typically a single column, while a composite key is a set of columns (also called a compound key). When the primary key is a unique identifier generated by a sequence, the primary key only identifies the uniqueness of a row, and is known as a surrogate key. Surrogate keys provide aliases for the column or set of columns that represents a natural key.

A natural key is a column or set of columns that uniquely identifies any row of data in a table. Both the surrogate and natural key are candidate keys. They are candidates because you’ll choose between one or more candidate keys to become the primary key and represent the data to other tables in the data model. The primary key is generally a published identifier for a row in an external relation. A primary key based on a surrogate key separates the external unique identifier from the descriptive data column(s). It allows you to recognize changes in your understanding of the table’s single subject (or domain) without requiring you to change SQL join statements. The combination of the surrogate and natural key are often used to define a unique index to speed data access.

A one-to-one relationship means that a row holding the primary key maps to at most a single row holding a copy of the primary key, known as the foreign key. A one-to-many relationship means that a row holding the primary key maps to zero to many rows holding a copy of the primary key. A many-to-many relationship is a logical relationship and has no physical implementation. Two tables related by a many-to-many relationship require an association table to implement their relationship.

The two tables logically related by a many-to-many binary relationship each hold a primary key. The association table inherits a copy of each table’s primary key, which is stored in a single row. This lets the association table map many rows in one table to many rows in another. As mentioned earlier, n-ary relationships simply increase the number of foreign keys held in direct ratio to the number of related tables. This means a tenary (specialized form of an n-ary) relationship maps three tables together through a single row, and contains a foreign key for each of the three tables.

The modeling approaches can best be represented on the four major step stones. Unfortunately, this excludes a number of approaches but you’ll find references to those in the external links. After covering the visual semantic models, we need to cover the concept of normalization. The summaries and links follow in the list:

- Original Entity-Relation Modeling
- The first modeling technique adopted widely was the Chen model, known as
*Original Entity-Relation*model. It uses diamonds to represent relationships, and squares to represent entities (or simple tables). The problem with the Original E-R Model was that it didn’t qualify minimum cardinality between entities. It only qualified maximum cardinality. - Extended Entity-Relation Modeling
- The second modeling technique adopted widely was the Extended ER, it let you qualify the minimum cardinality. It’s largest problem was that the diamonds containing the relationship didn’t map well to CASE (Computer-Aided Software Engineering) design tools and the 8.5″ by 11″ sheet of paper.
- Information Engineering
- The third modeling technique adopted widely and produced by TI Corporation first for the IBM Mainframe is perhaps most consistently in use today. It takes several slightly different forms but falls into the family of approaches known as the
*Information Engineering*model. It allows for qualification of minimum and maximum cardinality by doing away with the diamonds and introducing a crows-foot notation. - Uniform Modeling Language (UML)
- The last modeling technique adopted is UML. UML is not perfectly suited to diagramming data models because it doesn’t support many conventions from prior modeling techniques. It is simpler when well understood but unfortunately the Object-Oriented Analysis and Design model interferes with it’s adoption in the database community to some extent.
- Normalization
- Normalization is the process of grouping data into single subjects to avoid
*insertion*,*update*, and*deletion*anomalies.