Database Tutorial

Course Tutorial Site

Site Admin

1 – Intro to Databases

with 15 comments

Modern database management systems are constantly evolving because the market demands new features. Current trends in database technology focus on two areas – Online Transactional Processing (OLTP) and Online Analytical Processing (OLAP) databases.

An OLTP database supports what we know as the “day-to-day” tasks of business. OLTP database support highly concurrent systems that add, change, and delete data. An OLAP database supports the underlying data that Data Analyst use to make decisions. Business Intelligence (BI) describes the data analysts’ world, and BI relies on OLAP databases to support and house data. OLAP supports concurrent systems that only query data.

An OLAP database is a data warehouse when it contains large amounts of data to support a variety of decision-making. An OLAP database is a data mart when it contains a highly specialized data to help identify specific decision-making.

Day-to-day business operations typically run on relational database management systems (RDBMS). That’s because they’re designed to support high concurrency with and frequent data changes. Data changes come from INSERT, UPDATE, and DELETE statements, which respectively add, change, and remove data from a database.

The RDBMS is not the only solution available. Some companies choose to implement distributed systems, like Cassandra. Other companies choose to implement document-oriented databases, like MongoDB. You can describe a document-centered database as a content management database.

You also can implement an XML databases as a content management database. Mark Logic is an XML database. XML databases offer highly concurrent and query-only databases. They are effective alternatives to traditional OLAP solutions. XML databases are ideally suited solutions when we don’t know what we want to discover because they use XPath to find new relationships between data.

A database management system is generally suited to a single purpose. Unfortunately, sometimes organizations don’t have enough money to fund separate OLTP and OLAP databases. These companies often implement an RDBMS database to handle both the OLTP and OLAP needs. The database generally under performs its OLTP to deliver a qualified OLAP system and vice versa.

History of Database Management Systems (DBMS):

Databases were once no more than a collection of files. Each file contained something unique, like a single subject (but not always). The expense of managing different file structures led to the development of a Hierarchical Linear Model of storing information, often called a hierarchical database (very much like an XML structure). An inverted node tree aptly describes a hierarchical database.

Hierarchical databases work best when you compare data between sibling (side-by-side) nodes. The cost of comparing data between nodes becomes more expensive as they become farther apart. The collection of programs that manage a Hierarchical database are called a Database Management System (DBMS).

The problem with hierarchical databases is that the organization of information never quite represents how users need to access the data. That means some queries cost much more than others when the nodes are far apart. While everyone likes to think their design is correct, they most often only represent how you add, update, and remove data. A design that focus on adding, updating, and removing data is a transactional design.

Queries work differently than transactions because they assemble data from multiple nodes. Users also access data differently because they have different perspectives about what’s important. The differences in how they see data leads to different ways they consume it, much like preferences in food lead to eating different meals. Users of data are consumers.

These consumers are known as knowledge workers thanks to the seminal work of Alvin Toffler – PowerShift: Knowledge, Wealth, and Violence at the Edge of the 21st Century. It covers his Third Wave or his third take on the future. You may be more familiar with its repackaged and economic focus in The World is Flat by Thomas Friedman.

The Networked Model replaced the hierarchical model by introducing pointers. Pointers allow an optimized join between a node in one part of the hierarchy with another. These pointers redirected the search tree when matching data wasn’t found in sibling nodes of a hierarchy.

The Relational Model abstracted these pointers into indexes (a list of available nodes) and put them in the data catalog as a list of tables in a Relational Database Management System. This list of tables allows SQL to create relationship through matching values (equijoin) from one column with values in another column. It also can compare a value of one column against a pair of range values (non-equijoin) by using the BETWEEN operator. A non-equijoin also applies to comparisons with inequality operators.

The object model was proposed as a replacement to the relational model because it would group data into related and meaningful human objects. The problem with the object model was the database had to store the data between actions in a serialized (or collapsed) format. These databases were and are highly memory intensive and therefore don’t scale as well as the relational model.

Oracle had committed to the object model but back peddled when they found it impractical. They developed an extended relational model, which quickly was renamed to the object relational model. This model only marshals (places into memory) those objects required at run time, and is the cornerstone of the Object Relational Database Management System (ORDBMS).

The basic architecture of these products is simple, while the implementation details can take months and years to master as a Database Administrator (DBA). They work on the concept of a server engine and client interface. The engine is known as a Database Management System (DBMS), and it generally has subsystems that run Structured Query Language (SQL) and a database catalog.

SQL is actually a shortened form of IBM’s SEQUEL (Structured English Query Language), and is generally accepted as meaning Structured Query Language. Alan Beaulieu makes the point that SQL really doesn’t “stand for anything” (Learning SQL), and while he may be technically right more people now understand it as such. Therefore, I’d accept its meaning as is because it simply makes sense to do so.

The interface is a typical client-side interactive and batch programming interface, and it lets you access the engine. Both the client-side interface and DBMS engine are deployed on each server. More or less, the client-side component (a) enables the DBA to administer the database and (b) supports server-side development.

The database catalog contains the metadata, or a master index to all the structures and algorithms that make a DBMS work. Database catalogs are implementation specific and therefore quite product specific. In the folding sections, you’ll cover the database engine and SQL, then the shell interface that provides you interactive and batch access to the database.

This is just a summary that leverages Web 2 by using links. The benefit of the summary increases as you explore the material on your own.

Written by michaelmclaughlin

September 19th, 2009 at 11:50 pm

Posted in

15 Responses to '1 – Intro to Databases'

Subscribe to comments with RSS or TrackBack to '1 – Intro to Databases'.

  1. On your list of items in the DDL, at the top you do not have TRUCATE included, but then when you expand it it does include TRUNCATE.

    John Draper

    5 Apr 10 at 12:20 pm

  2. I’ve put it in the prefacing paragraph.

    michaelmclaughlin

    5 Apr 10 at 7:53 pm

  3. The link for pointers, appears to not go anywhere.

    spencer

    10 Jan 11 at 4:55 pm

  4. It does go where it should go now.

    michaelmclaughlin

    16 Mar 11 at 2:26 pm

  5. Under “The Database Engine” it says DMBS, but do you mean DBMS or am I mistaken?

    Craig W. Christensen

    7 Jan 12 at 9:41 am

  6. Juxtaposed seems an odd word in context here. It means to place (on purpose) to show comparison and difference. It seems you mean that they are dissimilar or have a hard time with synergism.

    Joe Mller

    9 Jan 12 at 12:31 pm

  7. in the first paragraph of the database engine roll-down, i think the acronym wants to be ‘DBMS’ not ‘DMBS’ subsystem.

    james

    23 Apr 12 at 10:43 pm

  8. You’ve got a typo. Right at the top of the “Database Engine” section:

    “A description of the working components of a DBMS.
    The DMBS subsystems…”

    I believe you meant “DBMS” both times, yes?

    Brandon MacGregor

    11 Jun 12 at 10:20 am

  9. The link to Oracle SQL*Plus User’s Guide and Reference is broken

    Willis Barton

    1 May 13 at 2:29 pm

  10. Willis, I’ve fixed the broken link.

    michaelmclaughlin

    18 Mar 14 at 11:15 am

  11. Brandon, yes. It’s fixed.

    michaelmclaughlin

    22 Oct 14 at 1:17 am

  12. Joe, I concur and changed juxtaposed to quite the opposite.

    michaelmclaughlin

    22 Oct 14 at 1:38 am

  13. Third paragraph, the way it’s written is confusing and makes me think there is an error. Both sentences start with “An OLAP” and I think one is supposed to start with “An OLTP”

    TJ

    6 Jan 16 at 5:08 pm

  14. There are actually two principal uses of OLAP databases. The OLTP case was defined in the second paragraph.

    michaelmclaughlin

    20 Jan 16 at 12:05 am

  15. the link for the oltp is broken, but here’s the new page location https://en.wikipedia.org/wiki/Online_transaction_processing

    Dennis Paulsen

    14 Sep 17 at 10:37 am

Leave a Reply