1 – Intro to Databases
Modern database management systems are constantly evolving because the market demands new features. Current trends in database technology focus on two areas – Online Transactional Processing (OLTP) and Online Analytical Processing (OLAP) databases.
An OLTP database supports what we know as the “day-to-day” tasks of business. OLTP database support highly concurrent systems that add, change, and delete data. An OLAP database supports the underlying data that Data Analyst use to make decisions. Business Intelligence (BI) describes the data analysts’ world, and BI relies on OLAP databases to support and house data. OLAP supports concurrent systems that only query data.
An OLAP database is a data warehouse when it contains large amounts of data to support a variety of decision-making. An OLAP database is a data mart when it contains a highly specialized data to help identify specific decision-making.
Day-to-day business operations typically run on relational database management systems (RDBMS). That’s because they’re designed to support high concurrency with and frequent data changes. Data changes come from INSERT
, UPDATE
, and DELETE
statements, which respectively add, change, and remove data from a database.
The RDBMS is not the only solution available. Some companies choose to implement distributed systems, like Cassandra. Other companies choose to implement document-oriented databases, like MongoDB. You can describe a document-centered database as a content management database.
You also can implement an XML databases as a content management database. Mark Logic is an XML database. XML databases offer highly concurrent and query-only databases. They are effective alternatives to traditional OLAP solutions. XML databases are ideally suited solutions when we don’t know what we want to discover because they use XPath to find new relationships between data.
A database management system is generally suited to a single purpose. Unfortunately, sometimes organizations don’t have enough money to fund separate OLTP and OLAP databases. These companies often implement an RDBMS database to handle both the OLTP and OLAP needs. The database generally under performs its OLTP to deliver a qualified OLAP system and vice versa.
History of Database Management Systems (DBMS):
Databases were once no more than a collection of files. Each file contained something unique, like a single subject (but not always). The expense of managing different file structures led to the development of a Hierarchical Linear Model of storing information, often called a hierarchical database (very much like an XML structure). An inverted node tree aptly describes a hierarchical database.
Hierarchical databases work best when you compare data between sibling (side-by-side) nodes. The cost of comparing data between nodes becomes more expensive as they become farther apart. The collection of programs that manage a Hierarchical database are called a Database Management System (DBMS).
The problem with hierarchical databases is that the organization of information never quite represents how users need to access the data. That means some queries cost much more than others when the nodes are far apart. While everyone likes to think their design is correct, they most often only represent how you add, update, and remove data. A design that focus on adding, updating, and removing data is a transactional design.
Queries work differently than transactions because they assemble data from multiple nodes. Users also access data differently because they have different perspectives about what’s important. The differences in how they see data leads to different ways they consume it, much like preferences in food lead to eating different meals. Users of data are consumers.
These consumers are known as knowledge workers thanks to the seminal work of Alvin Toffler – PowerShift: Knowledge, Wealth, and Violence at the Edge of the 21st Century. It covers his Third Wave or his third take on the future. You may be more familiar with its repackaged and economic focus in The World is Flat by Thomas Friedman.
The Networked Model replaced the hierarchical model by introducing pointers. Pointers allow an optimized join between a node in one part of the hierarchy with another. These pointers redirected the search tree when matching data wasn’t found in sibling nodes of a hierarchy.
The Relational Model abstracted these pointers into indexes (a list of available nodes) and put them in the data catalog as a list of tables in a Relational Database Management System. This list of tables allows SQL to create relationship through matching values (equijoin) from one column with values in another column. It also can compare a value of one column against a pair of range values (non-equijoin) by using the BETWEEN
operator. A non-equijoin also applies to comparisons with inequality operators.
The object model was proposed as a replacement to the relational model because it would group data into related and meaningful human objects. The problem with the object model was the database had to store the data between actions in a serialized (or collapsed) format. These databases were and are highly memory intensive and therefore don’t scale as well as the relational model.
Oracle had committed to the object model but back peddled when they found it impractical. They developed an extended relational model, which quickly was renamed to the object relational model. This model only marshals (places into memory) those objects required at run time, and is the cornerstone of the Object Relational Database Management System (ORDBMS).
The basic architecture of these products is simple, while the implementation details can take months and years to master as a Database Administrator (DBA). They work on the concept of a server engine and client interface. The engine is known as a Database Management System (DBMS), and it generally has subsystems that run Structured Query Language (SQL) and a database catalog.
SQL is actually a shortened form of IBM’s SEQUEL (Structured English Query Language), and is generally accepted as meaning Structured Query Language. Alan Beaulieu makes the point that SQL really doesn’t “stand for anything” (Learning SQL), and while he may be technically right more people now understand it as such. Therefore, I’d accept its meaning as is because it simply makes sense to do so.
The interface is a typical client-side interactive and batch programming interface, and it lets you access the engine. Both the client-side interface and DBMS engine are deployed on each server. More or less, the client-side component (a) enables the DBA to administer the database and (b) supports server-side development.
The database catalog contains the metadata, or a master index to all the structures and algorithms that make a DBMS work. Database catalogs are implementation specific and therefore quite product specific. In the folding sections, you’ll cover the database engine and SQL, then the shell interface that provides you interactive and batch access to the database.
The Database Engine →
A description of the working components of a DBMS.
The DBMS subsystems manage Data Definition Language (DDL), Data Manipulation Language (DML), Data Control Language (DCL), and Transaction Control Language (TCL) commands. Occasionally, SELECT
statements (queries) are removed from the DML group and placed in a Data Query Language (DQL). The argument goes that a query that doesn’t lock data for transactions isn’t manipulating it, it is only viewing it. The argument against that logic is that queries manipulate data to look at results. While an interesting topic, most certification material treats a SELECT
statement as a DML statement.
Data Manipulation Language (DML) →
This gives you a brief description of the SELECT
, INSERT
, DELETE
, UPDATE
, and MERGE
statements. These statements let you query data and transact against data in the database.
- SELECT – Queries data.
- INSERT – Inserts data into tables or objects.
- UPDATE – Updates data in tables or objects.
- DELETE – Deletes data from tables or objects.
- MERGE – Inserts or updates data based on conditions within the statement into tables or objects. It is part of the ANSI SQL:2003 standard. MySQL implements it as the
ON DUPLICATE KEY
clause, and you can find a comparison of the OracleMERGE
to it in this blog post.
Data Definition Language (DDL) →
This gives you a brief description of the CREATE
, ALTER
, DROP
, RENAME
, COMMENT
, and TRUNCATE
statements. These statements let you create, modify, and remove structures from the database.
- CREATE – Creates an object, which can be a table, view, or other structure supported by the DBMS implementation.
- ALTER – Alters the structure of an object, which can be a table, view, or other structure supported by the DBMS implementation.
- DROP – Drops a structure from the database, which may be constrained when there are other objects with a dependency on the target object.
- RENAME – Renames a structure, which can be a table, view, or subcomponent like a column name or data type of a table.
- TRUNCATE – Truncate deletes data like the
DELETE
statement with a twist. It preserves the structure of the table but deletes the contents without logging the deletes for recovery. Implementations vary between products, and Oracle’s flashback technology does provide for recovery of truncated objects.
- COMMENT – Lets you add a comment about a table, or a comment about a column in a specific table.
Data Control Language (DCL) →
This gives you a brief description of the GRANT
and REVOKE
statements. These statements let you manage permissions to access and control database resources.
- GRANT – Grants a permission to user to access, transact against, or run an object in the database. In some implementations, like Oracle, you can package grants into sets known as roles, and grant roles. This minimizes the work in granting access but comes with some restrictions, which change from release to release.
- REVOKE – Revokes a grant made previously.
Transaction Control Language (TCL) →
This gives you a brief description of the COMMIT
, ROLLBACK
, and SAVEPOINT
statements. They let you group collections of DML statements into cohesive units known as transactions. A sample page on how to manage transactions in Oracle, MySQL, and SQL Server shows you how to implement them.
- COMMIT – The commit makes permanent any changes that an individual makes while using DML commands. Individual changes before a
COMMIT;
are visible to the individual making them but aren’t visible to others working in the database. This makes the DML command the first phase of a change, and theCOMMIT;
statement the second phase of a change. This process is known as a Two-Phase Commit (2PC) model. It is a common paradigm in multiple-user systems that adhere to the ACID transaction principle. The acronym ACID stands for Atomic, Consistent, Isolated, and Durable. ACID compliant systems manage concurrent operations. This allows a series of DML statements to process as a transaction, which is no more than a related set of DML statements.
- ROLLBACK – Rolls back transactions made but uncommitted. You have three options with a
ROLLBACK
statement. You can roll back to the lastCOMMIT
statement or a namedSAVEPOINT
statement, or you can roll back to the point where you connected to the system with a pessimistic connection (Internet socket). Sometimes products provide a similar function by maintaining a transactional and object-state management componenets to let you mimic an Internet socket across a browser connection (HTTP/HTTPS). Browser connections are typically optimistic because they don’t wait for confirmation of the event, only confirmation that the event has begun. They’re known as optimistic connections in that context.
- SAVEPOINT – A
SAVEPOINT
lets you name a point inside a transaction, and effectively lets you break a transaction into subtransactions. If everything works throughout the transaction, yourCOMMIT
statement commits the transaction and all substransactions. However, if there is a failure in a subsequent subtransaction, you can roll back to theSAVEPOINT
with the following command:
ROLLBACK savepoint_name; |
The Database Interface →
A description of the database interface, often called the database client.
The client-side interface is frequently called the Oracle client, MySQL client, et cetera. It is always deployed on the server-tier but can be deployed as a standalone application on a separate client- or middle-tier server. This type of deployment lets you connect to work on server-side components or administer the server from a remote machine.
The client-side application software provides the interface, and allows you to create an Internet socket between two computers. This type of connection is often labeled persistent or pessimistic. It is persistent because it’s like a traditional handshake, unbroken until the reply and acknowledgment transaction completes. It is pessimistic because you remain connected until you confirm the completion of an action, which is known as a transaction. The supporting Internet socket is often described as state-aware. This type of communication allows interactive and batch communication to lock data changes until you want to make them publicly available in a multiple-user system (all databases are designed to be multiple-user systems).
While transactions may involve inserting, updating, or deleting data from a table or view, they may also involve transactions against multiple tables or views in the database. The perspective of a change or multiple changes to several objects is known as a transaction. Individual table changes in the context of a series of changes may be considered subtransaction. Likewise, a transaction may be broken up into smaller pieces, which are known as subtransactions. Subtransactions may contain one or more DML actions.
Anytime you have a client-side application communicating across an Internet socket, you have what’s known as client-server computing or two-tier computing. The best definition I’ve run across for client-server computing is: “Two computers sharing resources across a network.” While you can implement this without TCP/IP, most applications implement it on top of TCP/IP.
Oracle® Client SQL*Plus →
This segment describes and illustrates the SQL*Plus environment that supports all command-line, Open Database Connectivity (ODBC), and Java Database Connectivity (JDBC) interactions.
The Oracle client is known as sqlplus
, and it is an elaborate interactive and batch shell. sqlplus
supports customizable environment variables and a set of very useful reporting tools. The complete documentation on the SQL*Plus environment can be found in Oracle SQL*Plus User’s Guide and Reference.
The basic architecture of the Oracle SQL*Plus environment is rather complex, which is why they’ve built such a competitive edge over the last two decades in the database market. The following drawing illustrates the flow of control for the Oracle SQL*Plus environment.
The ability to read and write external tables is not shown in the drawing and will be covered in Lesson 7 on Aggregation. It’s there because that’s where you learn how to create external tables, and import comma separated value files. Oracle also provides the ability to read and write proprietary files through a utility known as Oracle Data Pump, which is beyond the scope of the course.
There are two Oracle server-side components. They are the oracle
daemon process that runs the RDBMS, and tnslsnr
daemon process that runs the Oracle Listener.
You can find the documentation on the Oracle listener service in the Oracle Database Net Services Administrator’s Guide. The tnslsnr
relies on the listener.ora
server-side file to start and stop itself. The sqlplus
executable uses the tnsnames.ora
client- and server-side file to locate a database listener. The listener hears the sqlplus
request and connects it to the oracle
engine.
MySQL® Client mysql
→
This segment describes and illustrates the SQL*Plus environment that supports all command-line, Open Database Connectivity (ODBC), and Java Database Connectivity (JDBC) interactions.
The MySQL client is known as mysql
, and it is a very basic interactive and batch processing tool. It lets you run commands and display results and lets you split standard out (stdout) so that it goes both to the console and a file. It doesn’t do much more than that.
The following illustrates the mysql
environment and includes the enhancement of a stored procedure language (introduced in MySQL 5.0). As you can tell, there are many features missing from the Oracle equivalent. Most notably the inability to support external libraries in C, C++, C#, et cetera, and the ability to read and write external files through stored procedures.
The ability to read external tables is shown in the drawing as file input and output, and it is covered in more depth in Lesson 7 on outer join operations and aggregation. It’s there because that’s where you learn how to create external tables, and import comma separated value files to a MySQL database.
Unlike Oracle, MySQL only has one server-side process – mysqld
. mysqld
is also a daemon running in the background. The mysql
client-side application sends a request to the mysqld
listener, which in turn dispatches it to the database engine.
Beyond the client-server model, you have web interfaces. These typically aren’t connected via an Internet socket in the same way. They typically work through an Apache server connection, which is pipelined and limited to a 15 second duration socket. That means what happens, happens quickly; and it may or may not succeed. This is an optimistic connection because you’ve no guarantee of behavior. When you process a DML statement, you must query to see it completed before you perform the next step. In the meantime, other users can see what you’ve done and alter it. This is fine if all you wanted to accomplish was change a single table but generally transactions include more than one DML statement and more than one table. Optimistic connections like these can’t lock data without elaborate application software written on the server to preserve state and lock data to mimic a 2PC protocol. The communication travels across what is called a stateless connection, often on an HTTP/HTTPS protocol. This model is a three-tier computing model. However, it’s generally more than three tiers because the number of components between the client and server tend to multiply. It is therefore more frequently called n-tier computing.
This is just a summary that leverages Web 2 by using links. The benefit of the summary increases as you explore the material on your own.
On your list of items in the DDL, at the top you do not have TRUCATE included, but then when you expand it it does include TRUNCATE.
John Draper
5 Apr 10 at 12:20 pm
I’ve put it in the prefacing paragraph.
michaelmclaughlin
5 Apr 10 at 7:53 pm
The link for pointers, appears to not go anywhere.
spencer
10 Jan 11 at 4:55 pm
It does go where it should go now.
michaelmclaughlin
16 Mar 11 at 2:26 pm
Under “The Database Engine” it says DMBS, but do you mean DBMS or am I mistaken?
Craig W. Christensen
7 Jan 12 at 9:41 am
Juxtaposed seems an odd word in context here. It means to place (on purpose) to show comparison and difference. It seems you mean that they are dissimilar or have a hard time with synergism.
Joe Mller
9 Jan 12 at 12:31 pm
in the first paragraph of the database engine roll-down, i think the acronym wants to be ‘DBMS’ not ‘DMBS’ subsystem.
james
23 Apr 12 at 10:43 pm
You’ve got a typo. Right at the top of the “Database Engine” section:
“A description of the working components of a DBMS.
The DMBS subsystems…”
I believe you meant “DBMS” both times, yes?
Brandon MacGregor
11 Jun 12 at 10:20 am
The link to Oracle SQL*Plus User’s Guide and Reference is broken
Willis Barton
1 May 13 at 2:29 pm
Willis, I’ve fixed the broken link.
michaelmclaughlin
18 Mar 14 at 11:15 am
Brandon, yes. It’s fixed.
michaelmclaughlin
22 Oct 14 at 1:17 am
Joe, I concur and changed juxtaposed to quite the opposite.
michaelmclaughlin
22 Oct 14 at 1:38 am
Third paragraph, the way it’s written is confusing and makes me think there is an error. Both sentences start with “An OLAP” and I think one is supposed to start with “An OLTP”
TJ
6 Jan 16 at 5:08 pm
There are actually two principal uses of OLAP databases. The OLTP case was defined in the second paragraph.
michaelmclaughlin
20 Jan 16 at 12:05 am
the link for the oltp is broken, but here’s the new page location https://en.wikipedia.org/wiki/Online_transaction_processing
Dennis Paulsen
14 Sep 17 at 10:37 am