D – Modeling in UML
Unified Modeling Language (UML) is typically applied to programming problems but you can also apply it to creating Entity Relation Models (ERMs). This page shows you how to apply UML to the domain of modeling databases.
UML uses class diagrams to model object types. Entities in traditional ERMs or ERDs (Entity Relation Diagrams) are objects. More often than not tables are called structures in a Relational Database Management System (RDBMS) because they don’t have methods. Object-Relational Database Management Systems (ORDBMS) like Oracle support tables as a specialization of a root-node object. User Defined Types (UDTs) are created as specializations of the same root-node object. You may exclude or include methods when you define UDTs.
UML supports both RDBMS and ORDBMS database modeling, which is advantageous. As shown in the drawing, a class is composed of three rectangles. The topmost one is for the name of the object (or table), the middle one is for the attributes of the object (or the columns of the table), and the bottom one is for methods of the object.
An unadorned line that connects two objects is a shorthand version of the real symbol, which is a composition or aggregation symbol. Aggregation is the principle that a class holds an instance of another class that is independent of it. Composition is the principle that a class holds an instance of another class that is dependent upon itself, which typically means an inner class. Any variable you define in a class of your own has a data type that is an object type. The modeling relationship to an inner class object type is one of composition, whereas, the modeling relationship to an external and independent class object type is one of aggregation. These are both modeling representations of dependency. When one class refers to another for it’s very existence, then that class is dependent upon the other class.
The open diamond represents an external relationship and is positioned next to the class that has the dependency on the other. The filled-in diamond represents an internal relationship and is positioned next to the class that holds a dependency on a subcomponent of itself. The arrow points to the class that is the dependency. Simplification of drawings often leads to an unadorned line replacing an adorned aggregation line.
These relationships are composed of two relations. One relation from the holder of the reference to the holder of the referred key. Another relation from the referred key to the holder of a copy of the key.
Likewise, any row that holds a foreign key to another table’s primary key references that table. This makes the table holding the foreign key functionally dependent on the table holding the primary key. It also makes the table holding the foreign key the parent of the relationship between the two tables.
This may strike you as odd because the row that holds the primary key must typically exist before the row that holds the foreign key. The word must applies when a FOREIGN KEY
constraint imposes the dependency. Without a FOREIGN KEY
constraint, the Application Programming Interface often enforces the business logic.
More or less, the foreign key points to another row that holds the primary key because it is functionally dependent on it. This makes the tables related, which means together they have a larger subject context then they do by themselves. The separation of these related data occurs because normalization attempts to eliminate redundancy and narrow the scope of a table to a single subject. Normalization tries to prevent storing duplicate data because over time duplicate data may devolve – a fancy word that means they become different values and therefore inconsistent.
Click the following illustration to see a Flash animation of assembling data through joins based on the primary and foreign key values.
A real world object lesson of this principle is an apple tree. The tree would be modeled into a table TREE
. The TREE
table would probably evolve to become a table that describe any tree, which would make it capable of supporting a generic tree. The TREE
table would most likely contain a foreign key to another table that describes the fruit. The FRUIT
table would then hold attributes that define various fruits and their varieties, like a Gala apple.
The TREE
table by itself has no more than decorative purpose. A real world context requires the FRUIT
table. Therefore, the perspective on the data starts with the TREE
table but incorporates the data from the FRUIT
table through it’s inherited foreign key.
Moving from the discussion of the relationship of the tables and how their data is reassembled into a meaningful whole. You need to model the solution. Modeling in UML starts with the diagramming component that connects tables. It’s the aggregation or composition relationship line discussed above. Based on the foregoing, the diamond goes next to the table that holds the foreign key, and the arrow goes next to the table that holds the primary key.
Whether the diamond is filled or empty depends on the minimum cardinality of the foreign key. A row that holds a database-level constrained foreign key can’t exist before you insert the row that holds the primary key value. A primary key column is thus constrained in the relationship and has a minimum cardinality of one and a maximum cardinality of one (based on the principle that primary key values are unique). A row that holds the foreign key can exist before you insert the row that holds the primary key when there isn’t a foreign key constraint. In this case the primary key is unconstrained in the relationship and the minimum cardinality is zero and maximum cardinality is one.
The terms aggregation and composition actually map to non-ID-dependent and ID-dependent relationships in traditional data modeling. A non-ID-dependent relationship is the preferred solution with an RDBMS. ID-dependent solutions wok in an ORDBMS as nested tables.
As a rule, non-ID-dependent relationships are more common, and they follow the model that all tables are independent. Independent tables have their own primary key and are called strong entities in traditional data modeling.
Implementing ID-dependent relationships are the exception in data modeling. You should note while ID-dependent relationships are the exception they’re not uncommon. An ID-dependent relationship make the table that holds the foreign key a weak entity, because there isn’t a primary key for the table. Identifications of uniqueness are maintained within the constraint of the primary key. This is why they’re implemented as a nested table. The nested table is automatically part of the row where it is defined. Therefore, the same primary key that identifies the row identifies the table within it. You can find an example of a weak entity in resolutions to achieve 1st Normal Form the Normalization page.
It is a recommended that you write a specification before writing UML diagrams. In the case of class diagrams, a specification like the following may be most useful when using class diagrams for ERMs.
Table Name: KINGDOM | |||||
---|---|---|---|---|---|
Column Name | Constraint | Data Type |
Physical Size |
||
Type | Reference Table | Reference Column | |||
KINGDOM_ID | PRIMARY KEY | Integer | Maximum | ||
KINGDOM_NAME | NOT NULL | String | 80 | ||
POPULATION | NOT NULL | Integer | Maximum |
Table Name: KNIGHT | |||||
---|---|---|---|---|---|
Column Name | Constraint | Data Type |
Physical Size |
||
Type | Reference Table | Reference Column | |||
KNIGHT_ID | PRIMARY KEY | Integer | Maximum | ||
KINGDOM_ALLEGIANCE_ID | KINGDOM | KINGDOM_ID | Integer | Maximum | |
FULL_NAME | NOT NULL | String | 40 |
The UML specification tells us that both tables have independent primary keys. This means the tables are non-ID-dependent and share an aggregation relationship. This means we can dispense with an adorned aggregation line and use a simple line to connect the two tables.
The UML specification also tells us that the foreign key column has a different name than the primary key column that it references. There is no FOREIGN KEY
constraint in the specification. This means the minimum cardinality of the relationship to the primary key could be zero. That would also make the cardinality of the relation 0..1
. The application you’re designing requires that a knight record must have an allegiance to insert a row in the KNIGHT
table. That requirement would require a textual note or we could change the specification as from an unconstrained column to a constrained column, like:
Table Name: KNIGHT | |||||
---|---|---|---|---|---|
Column Name | Constraint | Data Type |
Physical Size |
||
Type | Reference Table | Reference Column | |||
KNIGHT_ID | PRIMARY KEY | Integer | Maximum | ||
KINGDOM_ALLEGIANCE_ID | NOT NULL | KINGDOM | KINGDOM_ID | Integer | Maximum |
FULL_NAME | NOT NULL | String | 40 |
The addition of the NOT NULL
constraint now indicates that the minimum cardinality of the relation should be one, at least from the business logic perspective. The database when not constrained by a foreign key is always a 0..1
relation cardinality. The NOT NULL
constraint makes the business logic cardinality of the relation 1..1
, or in shorthand notation 1
. The specification is rendered in to a UML class diagram of the business logic as follows:
How to read relations
The relation from KINGDOM
to KNIGHT
is written as a formal English sentence like this:
An instance of a KINGDOM
may have one to many instances of KNIGHT
.
The formality in the sentence is linked to two things. First, the use of instance that signifies a single row or set of rows. Second, the “may have one to many” phrase means it has zero to many related rows. This means the minimum cardinality is zero and the maximum cardinality is many, which is written symbolically in long hand as 0..*
or in short hand as *
.
Since you’re telling a design story with UML, try to keep the sentences as short as possible without any loss of precision. You should also use the annotations, which adorn the relationships. Making these changes, the sentence preferred sentence would be:
A KINGDOM
may rule over many KNIGHT
s.
The opposite direction also has a relation. The formal way to write it would be:
An instance of KNIGHT
must have one and only one instance of KINGDOM
.
The more direct and simpler statement that leverages the adornment is:
A KNIGHT
has sworn allegiance to one KINGDOM
.
The rules exhibited should apply to all class diagrams. The hardest task for many is deriving the cardinalities of relations.
How to map cardinalities
You can see the cardinality map in the following example that links the primary key column with the foreign key column in the other table. In each row of the KNIGHT
table the foreign key maps to one and only one row in the KINGDOM
table. Likewise, the rows in the KNIGHT
table derive their KINGDOM_ALLEGIANCE_ID
column values from the list of possible values found in the KINGDOM_ID
column of the KINGDOM
table. The values in the KINGDOM_ALLEGIANCE_ID
column are copies of the possible values found in the PRIMARY KEY
column.
If you were to insert a new row in the KNIGHT
table with a KINGDOM_ALLEGIANCE_ID
value not found in the KNIGHT
table, any join would exclude the new row. The new row is effectively an orphan because there isn’t a row in the KINGDOM
table with a PRIMARY KEY
value equal to the new FOREIGN KEY
value. An orphan row is excluded from extending it’s context because it holds a FOREIGN KEY
that points to nothing. The possibility that you can INSERT
a new knight without a valid kingdom allegiance makes the minimum cardinality of the relation zero, while the maximum cardinality is one.
Since a KINGDOM may exist without any knights pledged in allegiance, the cardinality from KINGDON
to KNIGHT
would be zero to many. The many rows in the KNIGHT
table that work in a relationship are those with a FOREIGN KEY
value found in the list of values in PRIMARY KEY
column. There is no constraint on the number of rows in the KNIGHT
table that can point to any row in the KINGDOM
table in this design. The only way to enforce a minimum cardinality of one in the relation to the KNIGHT
table is to provide a reciprocal constraint, which is typically implemented as a database trigger.
Setup Script ↓
The drop down items, unfold with source code to seed the examples.
Oracle SQL Script ↓
This has the setup script for the example tables.
-- Conditionally drop tables and sequences. BEGIN FOR i IN (SELECT TABLE_NAME FROM user_tables WHERE TABLE_NAME IN ('KINGDOM','KNIGHT')) LOOP EXECUTE IMMEDIATE 'DROP TABLE '||i.table_name||' CASCADE CONSTRAINTS'; END LOOP; FOR i IN (SELECT sequence_name FROM user_sequences WHERE sequence_name IN ('KINGDOM_S1','KNIGHT_S1')) LOOP EXECUTE IMMEDIATE 'DROP SEQUENCE '||i.sequence_name; END LOOP; END; / -- Create tables and sequences. CREATE TABLE KINGDOM ( kingdom_id NUMBER CONSTRAINT pk_kingdom PRIMARY KEY , kingdom_name VARCHAR2(80) CONSTRAINT nn1_kingdom NOT NULL , population NUMBER CONSTRAINT nn2_kingdom NOT NULL); CREATE SEQUENCE kingdom_s1; CREATE TABLE KNIGHT ( knight_id NUMBER CONSTRAINT pk_knight PRIMARY KEY , kingdom_allegiance_id NUMBER CONSTRAINT nn_knight NOT NULL , full_name VARCHAR2(40) CONSTRAINT nn2_knight NOT NULL); CREATE SEQUENCE knight_s1; -- Seed values in the tables. INSERT INTO kingdom VALUES ( kingdom_s1.nextval ,'Camelot' , 767500 ); INSERT INTO kingdom VALUES ( kingdom_s1.nextval ,'Narnia' , 444321 ); INSERT INTO knight VALUES ( knight_s1.nextval ,(SELECT kingdom_id FROM kingdom WHERE kingdom_name = 'Camelot') ,'Lionel'); INSERT INTO knight VALUES ( knight_s1.nextval ,(SELECT kingdom_id FROM kingdom WHERE kingdom_name = 'Camelot') ,'Bors'); INSERT INTO knight VALUES ( knight_s1.nextval ,(SELECT kingdom_id FROM kingdom WHERE kingdom_name = 'Narnia') ,'Peter the Magnificient'); INSERT INTO knight VALUES ( knight_s1.nextval ,(SELECT kingdom_id FROM kingdom WHERE kingdom_name = 'Narnia') ,'Edmund the Just'); INSERT INTO knight VALUES ( knight_s1.nextval ,(SELECT kingdom_id FROM kingdom WHERE kingdom_name = 'Camelot') ,'Lancelot'); |
MySQL SQL Script ↓
This has the setup script for the example tables.
-- Conditionally drop tables and sequences. DROP TABLE IF EXISTS kingdom; DROP TABLE IF EXISTS knight; -- Create tables and sequences. CREATE TABLE KINGDOM ( kingdom_id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT , kingdom_name VARCHAR(80) NOT NULL , population INT NOT NULL); CREATE TABLE KNIGHT ( knight_id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT , kingdom_allegiance_id INT NOT NULL , full_name VARCHAR(40) NOT NULL); -- Seed values in the tables. INSERT INTO kingdom VALUES ( NULL ,'Camelot' , 767500 ); INSERT INTO kingdom VALUES ( NULL ,'Narnia' , 444321 ); INSERT INTO knight VALUES ( NULL ,(SELECT kingdom_id FROM kingdom WHERE kingdom_name = 'Camelot') ,'Lionel'); INSERT INTO knight VALUES ( NULL ,(SELECT kingdom_id FROM kingdom WHERE kingdom_name = 'Camelot') ,'Bors'); INSERT INTO knight VALUES ( NULL ,(SELECT kingdom_id FROM kingdom WHERE kingdom_name = 'Narnia') ,'Peter the Magnificient'); INSERT INTO knight VALUES ( NULL ,(SELECT kingdom_id FROM kingdom WHERE kingdom_name = 'Narnia') ,'Edmund the Just'); INSERT INTO knight VALUES ( NULL ,(SELECT kingdom_id FROM kingdom WHERE kingdom_name = 'Camelot') ,'Lancelot'); |