Introduction to Databases
The world of databases is very competitive and evolutionary. Current trends point to XML databases like Mark Logic leading us into a new way of discovery in data warehouses, which is the world of Online Analytical Processing (OLAP). Oracle continues to dominate in Online Transactional Processing (OLTP) databases, partly because they have a tremendous process improvement cycle matched to intelligent product acquisitions.
Regardless of implementation, OLTP and OLAP are the two types of databases that people use. Sometimes an implementation isn’t large enough to separate a company’s OLTP from the OLAP, and in those cases they implement a dual purpose database. This is generally an unfortunate decision because the response time and resource requirements of these two types of databases are quite juxtaposed.
Databases were once no more than a collection of files. Each file contained something unique, like a single subject (but not always). The expense of managing the disparate file structures led to the development of a Hierarchical Linear Model of storing information, often called a hierarchical database (very much like an XSchema in XML). The collection of programs to run a Hierarchical database became known as a Database Management System (DBMS). The problem with hierarchical databases is that the organization of information never quite represented the different ways that were required by consumers of the data. These consumers are known as knowledge workers thanks to the seminal work of Alvin Toffler – PowerShift: Knowledge, Wealth, and Violence at the Edge of the 21st Century. It covers his Third Wave of his view of the future. You may be more familiar with its repackaged and economic focus in The World is Flat by Thomas Friedman.
The Networked Model replaced the hierarchical model by introducing pointers. Pointers allow an optimized join between a node in one part of the hierarchy with another. These pointers redirected the search tree when matching data wasn’t found in sibling (side-by-side) nodes of a hierarchy. The Relational Model abstracted these pointers into indexes (a list of available nodes) and put them in the data catalog as a list of tables in a Relational Database Management System. This list of tables allows SQL to create relationship through matching values (equijoin) from one column with values in another column. It also can compare a value of one column against a pair of range values (non-equijoin) by using the BETWEEN operator. A non-equijoin also applies to comparisons with inequality operators.
The object model was proposed as a replacement to the relational model because it would group data into related and meaningful human objects. The problem with the object model was the database had to store the data between actions in a serialized (or collapsed) format. These databases were and are highly memory intensive and therefore don’t scale as well as the relational model. Oracle had committed to the object model but back peddled when they found it impractical. They developed an extended relational model, which quickly was renamed to the object relational model. This model only marshals (places into memory) those objects required at run time, and is the cornerstone of the Object Relational Database Management System (ORDBMS).
The basic architecture of these products is simple, while the implementation details can take months and years to master as a Database Administrator (DBA). They work on the concept of a server engine and client interface. The engine is known as a Database Management System (DBMS), and it generally has subsystems that run Structured Query Language (SQL) and a database catalog.
SQL is actually a shortened form of IBM’s SEQUEL (Structured English Query Language), and is generally accepted as meaning Structured Query Language. Alan Beaulieu makes the point that SQL really doesn’t “stand for anything” (Learning SQL), and while he may be technically right more people now understand it as such. Therefore, I’d accept its meaning as is because it simply makes sense to do so.
The interface is a typical client-side interactive and batch programming interface, and it lets you access the engine. Both the client-side interface and DBMS engine are deployed on each server. More or less, the client-side component (a) enables the DBA to administer the database and (b) supports server-side development.
The database catalog contains the metadata, or a master index to all the structures and algorithms that make a DBMS work. Database catalogs are implementation specific and therefore quite product specific. In the folding sections, you’ll cover the database engine and SQL, then the shell interface that provides you interactive and batch access to the database.
The Database Engine ↓
A description of the working components of a DBMS.
The DMBS subsystems manage Data Definition Language (DDL), Data Manipulation Language (DML), Data Control Language (DCL), and Transaction Control Language (TCL) commands. Occasionally, SELECT statements (queries) are removed from the DML group and placed in a Data Query Language (DQL). The argument goes that a query that doesn’t lock data for transactions isn’t manipulating it, it is only viewing it. The argument against that logic is that queries manipulate data to look at results. While an interesting topic, most certification material treats a SELECT statement as a DML statement.
Data Manipulation Language (DML) ↓
This gives you a brief description of the SELECT, INSERT, DELETE, UPDATE, and MERGE statements. These statements let you query data and transact against data in the database.
- SELECT – Queries data.
- INSERT – Inserts data into tables or objects.
- UPDATE – Updates data in tables or objects.
- DELETE – Deletes data from tables or objects.
- MERGE – Inserts or updates data based on conditions within the statement into tables or objects. It is part of the ANSI SQL:2003 standard. MySQL implements it as the
ON DUPLICATE KEYclause, and you can find a comparison of the OracleMERGEto it in this blog post.
Data Definition Language (DDL) ↓
This gives you a brief description of the CREATE, ALTER, DROP, RENAME, COMMENT, and TRUNCATE statements. These statements let you create, modify, and remove structures from the database.
- CREATE – Creates an object, which can be a table, view, or other structure supported by the DBMS implementation.
- ALTER – Alters the structure of an object, which can be a table, view, or other structure supported by the DBMS implementation.
- DROP – Drops a structure from the database, which may be constrained when there are other objects with a dependency on the target object.
- RENAME – Renames a structure, which can be a table, view, or subcomponent like a column name or data type of a table.
- TRUNCATE – Truncate deletes data like the
DELETEstatement with a twist. It preserves the structure of the table but deletes the contents without logging the deletes for recovery. Implementations vary between products, and Oracle’s flashback technology does provide for recovery of truncated objects.
- COMMENT – Lets you add a comment about a table, or a comment about a column in a specific table.
Data Control Language (DCL) ↓
This gives you a brief description of the GRANT and REVOKE statements. These statements let you manage permissions to access and control database resources.
- GRANT – Grants a permission to user to access, transact against, or run an object in the database. In some implementations, like Oracle, you can package grants into sets known as roles, and grant roles. This minimizes the work in granting access but comes with some restrictions, which change from release to release.
- REVOKE – Revokes a grant made previously.
Transaction Control Language (TCL) ↓
This gives you a brief description of the COMMIT, ROLLBACK, and SAVEPOINT statements. They let you group collections of DML statements into cohesive units known as transactions. A sample page on how to manage transactions in Oracle, MySQL, and SQL Server shows you how to implement them.
- COMMIT – The commit makes permanent any changes that an individual makes while using DML commands. Individual changes before a
COMMIT;are visible to the individual making them but aren’t visible to others working in the database. This makes the DML command the first phase of a change, and theCOMMIT;statement the second phase of a change. This process is known as a Two-Phase Commit (2PC) model. It is a common paradigm in multiple-user systems that adhere to the ACID transaction principle. The acronym ACID stands for Atomic, Consistent, Isolated, and Durable. ACID compliant systems manage concurrent operations. This allows a series of DML statements to process as a transaction, which is no more than a related set of DML statements.
- ROLLBACK – Rolls back transactions made but uncommitted. You have three options with a
ROLLBACKstatement. You can roll back to the lastCOMMITstatement or a namedSAVEPOINTstatement, or you can roll back to the point where you connected to the system with a pessimistic connection (Internet socket). Sometimes products provide a similar function by maintaining a transactional and object-state management componenets to let you mimic an Internet socket across a browser connection (HTTP/HTTPS). Browser connections are typically optimistic because they don’t wait for confirmation of the event, only confirmation that the event has begun. They’re known as optimistic connections in that context.
- SAVEPOINT – A
SAVEPOINTlets you name a point inside a transaction, and effectively lets you break a transaction into subtransactions. If everything works throughout the transaction, yourCOMMITstatement commits the transaction and all substransactions. However, if there is a failure in a subsequent subtransaction, you can roll back to theSAVEPOINTwith the following command:
ROLLBACK savepoint_name; |
The Database Interface ↓
A description of the database interface, often called the database client.
The client-side interface is frequently called the Oracle client, MySQL client, et cetera. It is always deployed on the server-tier but can be deployed as a standalone application on a separate client- or middle-tier server. This type of deployment lets you connect to work on server-side components or administer the server from a remote machine.
The client-side application software provides the interface, and allows you to create an Internet socket between two computers. This type of connection is often labeled persistent or pessimistic. It is persistent because it’s like a traditional handshake, unbroken until the reply and acknowledgment transaction completes. It is pessimistic because you remain connected until you confirm the completion of an action, which is known as a transaction. The supporting Internet socket is often described as state-aware. This type of communication allows interactive and batch communication to lock data changes until you want to make them publicly available in a multiple-user system (all databases are designed to be multiple-user systems).
While transactions may involve inserting, updating, or deleting data from a table or view, they may also involve transactions against multiple tables or views in the database. The perspective of a change or multiple changes to several objects is known as a transaction. Individual table changes in the context of a series of changes may be considered subtransaction. Likewise, a transaction may be broken up into smaller pieces, which are known as subtransactions. Subtransactions may contain one or more DML actions.
Anytime you have a client-side application communicating across an Internet socket, you have what’s known as client-server computing or two-tier computing. The best definition I’ve run across for client-server computing is: “Two computers sharing resources across a network.” While you can implement this without TCP/IP, most applications implement it on top of TCP/IP.
Oracle® Client SQL*Plus ↓
This segment describes and illustrates the SQL*Plus environment that supports all command-line, Open Database Connectivity (ODBC), and Java Database Connectivity (JDBC) interactions.
The Oracle client is known as sqlplus, and it is an elaborate interactive and batch shell. sqlplus supports customizable environment variables and a set of very useful reporting tools. The complete documentation on the SQL*Plus environment can be found in Oracle SQL*Plus User’s Guide and Reference.
The basic architecture of the Oracle SQL*Plus environment is rather complex, which is why they’ve built such a competitive edge over the last two decades in the database market. The following drawing illustrates the flow of control for the Oracle SQL*Plus environment.
The ability to read and write external tables is not shown in the drawing and will be covered in Lesson 7 on Aggregation. It’s there because that’s where you learn how to create external tables, and import comma separated value files. Oracle also provides the ability to read and write proprietary files through a utility known as Oracle Data Pump, which is beyond the scope of the course.
There are two Oracle server-side components. They are the oracle daemon process that runs the RDBMS, and tnslsnr daemon process that runs the Oracle Listener.
You can find the documentation on the Oracle listener service in the Oracle Database Net Services Administrator’s Guide. The tnslsnr relies on the listener.ora server-side file to start and stop itself. The sqlplus executable uses the tnsnames.ora client- and server-side file to locate a database listener. The listener hears the sqlplus request and connects it to the oracle engine.
MySQL® Client mysql ↓
This segment describes and illustrates the SQL*Plus environment that supports all command-line, Open Database Connectivity (ODBC), and Java Database Connectivity (JDBC) interactions.
The MySQL client is known as mysql, and it is a very basic interactive and batch processing tool. It lets you run commands and display results and lets you split standard out (stdout) so that it goes both to the console and a file. It doesn’t do much more than that.
The following illustrates the mysql environment and includes the enhancement of a stored procedure language (introduced in MySQL 5.0). As you can tell, there are many features missing from the Oracle equivalent. Most notably the inability to support external libraries in C, C++, C#, et cetera, and the ability to read and write external files through stored procedures.
The ability to read external tables is shown in the drawing as file input and output, and it is covered in more depth in Lesson 7 on outer join operations and aggregation. It’s there because that’s where you learn how to create external tables, and import comma separated value files to a MySQL database.
Unlike Oracle, MySQL only has one server-side process – mysqld. mysqld is also a daemon running in the background. The mysql client-side application sends a request to the mysqld listener, which in turn dispatches it to the database engine.
Beyond the client-server model, you have web interfaces. These typically aren’t connected via an Internet socket in the same way. They typically work through an Apache server connection, which is pipelined and limited to a 15 second duration socket. That means what happens, happens quickly; and it may or may not succeed. This is an optimistic connection because you’ve no guarantee of behavior. When you process a DML statement, you must query to see it completed before you perform the next step. In the meantime, other users can see what you’ve done and alter it. This is fine if all you wanted to accomplish was change a single table but generally transactions include more than one DML statement and more than one table. Optimistic connections like these can’t lock data without elaborate application software written on the server to preserve state and lock data to mimic a 2PC protocol. The communication travels across what is called a stateless connection, often on an HTTP/HTTPS protocol. This model is a three-tier computing model. However, it’s generally more than three tiers because the number of components between the client and server tend to multiply. It is therefore more frequently called n-tier computing.
This is just a summary that leverages Web 2 by using links. The benefit of the summary increases as you explore the material on your own.


On your list of items in the DDL, at the top you do not have TRUCATE included, but then when you expand it it does include TRUNCATE.
John Draper
5 Apr 10 at 12:20 pm
I’ve put it in the prefacing paragraph.
michaelmclaughlin
5 Apr 10 at 7:53 pm
The link for pointers, appears to not go anywhere.
spencer
10 Jan 11 at 4:55 pm
It does go where it should go now.
michaelmclaughlin
16 Mar 11 at 2:26 pm