Database Tutorial

Course Tutorial Site

Site Admin

D – Modeling in UML

without comments

Unified Modeling Language (UML) is typically applied to programming problems but you can also apply it to creating Entity Relation Models (ERMs). This page shows you how to apply UML to the domain of modeling databases.

UML uses class diagrams to model object types. Entities in traditional ERMs or ERDs (Entity Relation Diagrams) are objects. More often than not tables are called structures in a Relational Database Management System (RDBMS) because they don’t have methods. Object-Relational Database Management Systems (ORDBMS) like Oracle support tables as a specialization of a root-node object. User Defined Types (UDTs) are created as specializations of the same root-node object. You may exclude or include methods when you define UDTs.

UML supports both RDBMS and ORDBMS database modeling, which is advantageous. As shown in the drawing, a class is composed of three rectangles. The topmost one is for the name of the object (or table), the middle one is for the attributes of the object (or the columns of the table), and the bottom one is for methods of the object.

An unadorned line that connects two objects is a shorthand version of the real symbol, which is a composition or aggregation symbol. Aggregation is the principle that a class holds an instance of another class that is independent of it. Composition is the principle that a class holds an instance of another class that is dependent upon itself, which typically means an inner class. Any variable you define in a class of your own has a data type that is an object type. The modeling relationship to an inner class object type is one of composition, whereas, the modeling relationship to an external and independent class object type is one of aggregation. These are both modeling representations of dependency. When one class refers to another for it’s very existence, then that class is dependent upon the other class.

The open diamond represents an external relationship and is positioned next to the class that has the dependency on the other. The filled-in diamond represents an internal relationship and is positioned next to the class that holds a dependency on a subcomponent of itself. The arrow points to the class that is the dependency. Simplification of drawings often leads to an unadorned line replacing an adorned aggregation line.

These relationships are composed of two relations. One relation from the holder of the reference to the holder of the referred key. Another relation from the referred key to the holder of a copy of the key.

Likewise, any row that holds a foreign key to another table’s primary key references that table. This makes the table holding the foreign key functionally dependent on the table holding the primary key. It also makes the table holding the foreign key the parent of the relationship between the two tables.

This may strike you as odd because the row that holds the primary key must typically exist before the row that holds the foreign key. The word must applies when a FOREIGN KEY constraint imposes the dependency. Without a FOREIGN KEY constraint, the Application Programming Interface often enforces the business logic.

More or less, the foreign key points to another row that holds the primary key because it is functionally dependent on it. This makes the tables related, which means together they have a larger subject context then they do by themselves. The separation of these related data occurs because normalization attempts to eliminate redundancy and narrow the scope of a table to a single subject. Normalization tries to prevent storing duplicate data because over time duplicate data may devolve – a fancy word that means they become different values and therefore inconsistent.

Click the following illustration to see a Flash animation of assembling data through joins based on the primary and foreign key values.

A real world object lesson of this principle is an apple tree. The tree would be modeled into a table TREE. The TREE table would probably evolve to become a table that describe any tree, which would make it capable of supporting a generic tree. The TREE table would most likely contain a foreign key to another table that describes the fruit. The FRUIT table would then hold attributes that define various fruits and their varieties, like a Gala apple.

The TREE table by itself has no more than decorative purpose. A real world context requires the FRUIT table. Therefore, the perspective on the data starts with the TREE table but incorporates the data from the FRUIT table through it’s inherited foreign key.

Moving from the discussion of the relationship of the tables and how their data is reassembled into a meaningful whole. You need to model the solution. Modeling in UML starts with the diagramming component that connects tables. It’s the aggregation or composition relationship line discussed above. Based on the foregoing, the diamond goes next to the table that holds the foreign key, and the arrow goes next to the table that holds the primary key.

Whether the diamond is filled or empty depends on the minimum cardinality of the foreign key. A row that holds a database-level constrained foreign key can’t exist before you insert the row that holds the primary key value. A primary key column is thus constrained in the relationship and has a minimum cardinality of one and a maximum cardinality of one (based on the principle that primary key values are unique). A row that holds the foreign key can exist before you insert the row that holds the primary key when there isn’t a foreign key constraint. In this case the primary key is unconstrained in the relationship and the minimum cardinality is zero and maximum cardinality is one.

The terms aggregation and composition actually map to non-ID-dependent and ID-dependent relationships in traditional data modeling. A non-ID-dependent relationship is the preferred solution with an RDBMS. ID-dependent solutions wok in an ORDBMS as nested tables.

As a rule, non-ID-dependent relationships are more common, and they follow the model that all tables are independent. Independent tables have their own primary key and are called strong entities in traditional data modeling.

Implementing ID-dependent relationships are the exception in data modeling. You should note while ID-dependent relationships are the exception they’re not uncommon. An ID-dependent relationship make the table that holds the foreign key a weak entity, because there isn’t a primary key for the table. Identifications of uniqueness are maintained within the constraint of the primary key. This is why they’re implemented as a nested table. The nested table is automatically part of the row where it is defined. Therefore, the same primary key that identifies the row identifies the table within it. You can find an example of a weak entity in resolutions to achieve 1st Normal Form the Normalization page.

It is a recommended that you write a specification before writing UML diagrams. In the case of class diagrams, a specification like the following may be most useful when using class diagrams for ERMs.

Table Name: KINGDOM
Column Name Constraint Data
Type
Physical
Size
Type Reference Table Reference Column
KINGDOM_ID PRIMARY KEY Integer Maximum
KINGDOM_NAME NOT NULL String 80
POPULATION NOT NULL Integer Maximum

Table Name: KNIGHT
Column Name Constraint Data
Type
Physical
Size
Type Reference Table Reference Column
KNIGHT_ID PRIMARY KEY Integer Maximum
KINGDOM_ALLEGIANCE_ID KINGDOM KINGDOM_ID Integer Maximum
FULL_NAME NOT NULL String 40

The UML specification tells us that both tables have independent primary keys. This means the tables are non-ID-dependent and share an aggregation relationship. This means we can dispense with an adorned aggregation line and use a simple line to connect the two tables.

The UML specification also tells us that the foreign key column has a different name than the primary key column that it references. There is no FOREIGN KEY constraint in the specification. This means the minimum cardinality of the relationship to the primary key could be zero. That would also make the cardinality of the relation 0..1. The application you’re designing requires that a knight record must have an allegiance to insert a row in the KNIGHT table. That requirement would require a textual note or we could change the specification as from an unconstrained column to a constrained column, like:

Table Name: KNIGHT
Column Name Constraint Data
Type
Physical
Size
Type Reference Table Reference Column
KNIGHT_ID PRIMARY KEY Integer Maximum
KINGDOM_ALLEGIANCE_ID NOT NULL KINGDOM KINGDOM_ID Integer Maximum
FULL_NAME NOT NULL String 40

The addition of the NOT NULL constraint now indicates that the minimum cardinality of the relation should be one, at least from the business logic perspective. The database when not constrained by a foreign key is always a 0..1 relation cardinality. The NOT NULL constraint makes the business logic cardinality of the relation 1..1, or in shorthand notation 1. The specification is rendered in to a UML class diagram of the business logic as follows:

How to read relations

The relation from KINGDOM to KNIGHT is written as a formal English sentence like this:

An instance of a KINGDOM may have one to many instances of KNIGHT.

The formality in the sentence is linked to two things. First, the use of instance that signifies a single row or set of rows. Second, the “may have one to many” phrase means it has zero to many related rows. This means the minimum cardinality is zero and the maximum cardinality is many, which is written symbolically in long hand as 0..* or in short hand as *.

Since you’re telling a design story with UML, try to keep the sentences as short as possible without any loss of precision. You should also use the annotations, which adorn the relationships. Making these changes, the sentence preferred sentence would be:

A KINGDOM may rule over many KNIGHTs.

The opposite direction also has a relation. The formal way to write it would be:

An instance of KNIGHT must have one and only one instance of KINGDOM.

The more direct and simpler statement that leverages the adornment is:

A KNIGHT has sworn allegiance to one KINGDOM.

The rules exhibited should apply to all class diagrams. The hardest task for many is deriving the cardinalities of relations.

How to map cardinalities

You can see the cardinality map in the following example that links the primary key column with the foreign key column in the other table. In each row of the KNIGHT table the foreign key maps to one and only one row in the KINGDOM table. Likewise, the rows in the KNIGHT table derive their KINGDOM_ALLEGIANCE_ID column values from the list of possible values found in the KINGDOM_ID column of the KINGDOM table. The values in the KINGDOM_ALLEGIANCE_ID column are copies of the possible values found in the PRIMARY KEY column.

If you were to insert a new row in the KNIGHT table with a KINGDOM_ALLEGIANCE_ID value not found in the KNIGHT table, any join would exclude the new row. The new row is effectively an orphan because there isn’t a row in the KINGDOM table with a PRIMARY KEY value equal to the new FOREIGN KEY value. An orphan row is excluded from extending it’s context because it holds a FOREIGN KEY that points to nothing. The possibility that you can INSERT a new knight without a valid kingdom allegiance makes the minimum cardinality of the relation zero, while the maximum cardinality is one.

Since a KINGDOM may exist without any knights pledged in allegiance, the cardinality from KINGDON to KNIGHT would be zero to many. The many rows in the KNIGHT table that work in a relationship are those with a FOREIGN KEY value found in the list of values in PRIMARY KEY column. There is no constraint on the number of rows in the KNIGHT table that can point to any row in the KINGDOM table in this design. The only way to enforce a minimum cardinality of one in the relation to the KNIGHT table is to provide a reciprocal constraint, which is typically implemented as a database trigger.

Written by michaelmclaughlin

September 27th, 2009 at 11:50 pm

Posted in

Leave a Reply