SQL Joins
The following wikipedia article covers SQL joins:
The strength of the relational model and relational databases is the ability to join data from different tables and views. This explains the various join possibilities.
Definitions
Before jumping into the SQL join statements, setting up some basic information seems like a good idea. Here are some basic definitions:
SQL Join Concepts →
This describes the basics concepts about SQL joins.
- A table is an array of arrays, which means a row in any table contains an array. The definition of a table defines the data structure of a row or array.
- Selection is the process of identifying, accessing, and joining data into return data sets. The basic elements of selection in a SQL statement are the
FROM
andWHERE
clauses. -
- The ANSI 89 syntax uses a cross join or comma-delimited list of tables, and places the join in the
WHERE
clause. It doesn’t support outer joins but Oracle’s Proprietary syntax extends ANSI 89 and provides left and right outer joins but not full outer joins, unless you combine results with a union set operator. - The ANSI 92 syntax uses a set of key words like
INNER JOIN
,LEFT JOIN
, et cetera, and puts the join in anON
orUSING
clause, which are often referred to as subclauses. It supports left, right, and full outer joins.
- The ANSI 89 syntax uses a cross join or comma-delimited list of tables, and places the join in the
-
- Projection is the process of paring columns from filtered data sets. Projection is determined by the list of columns found in a
SELECT
clause.
- Projection is the process of paring columns from filtered data sets. Projection is determined by the list of columns found in a
SQL JOIN Logic: →
This describes the types of joins in context to how and what they match. It describes how two sets of rows are joined together through value matches and range/filtered comparisons, which are known respectively as equijoins and non-equijoins.
- A SQL join without a relationship creates what’s known as a Cartesian Product, which is a result set that consists of all rows from one table matched with all row in another table. This type of join is also known as a
CROSS JOIN
. - A SQL join with a relationship may be one of two types:
-
- A SQL equijoin (also known as a natural or semijoin) creates a relationship between two tables based on a comparison of values found in one or a set of columns in one table and one or an equal set of columns in another table.
- A SQL non-equijoin (also known as a θ-join) effects a relation between two tables based on a filtered
CROSS JOIN
between two tables. This type of join can be a range comparison using theBETWEEN
operator or a comparison of column values that uses an inequality operator.
- A SQL antijoin may be one of two things, at least typically, and they are:
-
- A relative complement, which is everything in another table not found in a table.
- A symmetric difference, which is the relative complement of both tables in a join.
A sample equijoin, or value match between two sets is shown in the illustration below. The values of the primary key match the values of the foreign key typcially but in this example you’re working with ordinal numbers in different languages. Therefore, the value match is between two primary keys.
The example highlights an inner join by highlighting the match in a yellow overlay. The rows left without highlighting are excluded from the join. The excluded rows in one or the other table would be the relative complement in an outer join, and taken together both relative complements are the symmetric difference of the join. The symmetric difference is an anti-join because it excludes anything that the two sets would match on.
Unlike a Venn Diagram, this illustration exhibits the behavior of database tables in an equijoin. A column or series of columns constitutes the JOIN
or intersection between the two sets and a WHERE
clause can filter the rows returned based on a value or range comparison. Both of these processes serve to filter the data and collectively are known as the selection process. Once you have selected the data, you can have access to the rest of the columns in a the nested set or row of a table. You project through the data when you filter the columns to display only some of them, which is known as the projection process.
There are three types of typical joins. They are nested loop, or a variation known as block nested loop, merged join, and hash join. Nested loops are exhaustive searches and don’t require presorting of keys. Merged joins rely on presorting of keys and that’s why cost-based optimizers collect data to improve join speed. Hash joins are more complex and limited to only equijoin resolutions.
Table Name: ENGLISH | |
---|---|
English_ID | English_Text |
Table Name: FRENCH | |
---|---|
French_ID | French_Text |
RESULT SET: | ||||
---|---|---|---|---|
ID | English_ID | English_Text | French_ID | French_Text |
SQL JOIN Implementations →
This describes the six basics join patterns and the anti-join pattern. It provides a summary of join types.
- A
CROSS JOIN
implements a Cartesian Product, which returns rows combined from both tables. It matches every row in one table against every row in another table. For example, if you have 3 rows in one table and 2 rows in another you’ll have 2 times 3, or 6, rows in the result set. - An
INNER JOIN
implements an equijoin, which returns rows combined from both tables when the values in the join condition match. Join conditions can include one or more columns. - A
NATURAL JOIN
implements an equijoin, which returns rows combined from both tables when the values in the join condition match. While the join conditions can include one or more columns, you have no choice in selecting the join columns. ANATURAL JOIN
checks the definition of both tables (in the data catalog or metadata) and looks for like named columns. It then joins the table based on the values of all like named columns from both tables. One might say it’s an unnatural behavior because a developer has no way to override it. I’d concur with calling it unnatural and for that matter wrote much more about it in this older post. - A
LEFT JOIN
implements an equijoin and relative complement, which returns rows combined from both tables when the values in the join condition match and the rows in the left table that aren’t found in the right table. This makes aLEFT JOIN
a combination of anINNER JOIN
and a relative complement of the table on the right of the join. This nuance becomes important when we examine the(+)
semantic implemented by Oracle. - A
RIGHT JOIN
implements an equijoin and relative complement, which returns rows combined from both tables when the values in the join condition match and the rows in the right table that aren’t found in the left table. This makes aRIGHT JOIN
the opposite of aLEFT JOIN
. Naturally, it is also a combination of anINNER JOIN
and a relative complement of the left table. Like theLEFT JOIN
, this nuance becomes important when we examine the(+)
semantic implemented by Oracle. - A
FULL JOIN
implements an equijoin and both relative complements, which returns rows combined from both tables when the values in the join condition match and the rows in both the left and right tables that aren’t found in their respective other tables. This makes aFULL JOIN
a combination of anINNER JOIN
and relative complements of the both tables. This type of join can’t be done with the(+)
alone in the older syntax. It requires left and right outer queries glued together by aUNION
set operator. The UNION set operator eliminates duplicate from the intersection of the two tables, which are created by gluing together two outer joins. - An Anti-join implements an outer join minus an equijoin, which returns a relative complement when the first query is a
LEFT JOIN
orRIGHT JOIN
and a symmetric difference when the first query is aFULL JOIN
.
JOIN Inheritance Tree →
Joins between tables are nothing more than algorithms that match values between two sets, like a nested for loop. The idea of matching everything in one table with everything in another is the most generalized behavior, which is known as a CROSS JOIN
in SQL and a Cartesian Join in mathematics. The rest of the joins are specialization of the CROSS JOIN
, and this section contains an abstract UML diagram showing that inheritance relationship.
You can read more about the Unified Modeling Language (UML) before reading this but the idea is that a INNER JOIN
is a specialization of a CROSS JOIN
, which means it does everything the CROSS JOIN
does and more. Each speicalization provides more to the joining of sets of data.
The diagram is borrowed from the SQL Primer (Appendix B) in my Oracle Database 11g PL/SQL Programming book, and duly footnoted.
Setup Script →
The drop down items, unfold with source code to seed the examples.
Oracle SQL Join Supporting Script →
This has the setup script for the example tables.
-- ------------------------------------------------------------------ -- This part seeds joins, except the range non-equijoin section. -- ------------------------------------------------------------------ -- Conditionally drop tables when they're found. BEGIN FOR i IN (SELECT TABLE_NAME FROM user_tables WHERE TABLE_NAME IN ('ORDINAL_ENGLISH' ,'ORDINAL_FRENCH' ,'ORDINAL_OLD_ENGLISH')) LOOP EXECUTE IMMEDIATE 'DROP TABLE '||i.table_name||' CASCADE CONSTRAINTS'; END LOOP; END; / -- Create tables. CREATE TABLE ordinal_english ( ordinal_english_id NUMBER , ordinal_english_text VARCHAR2(26)); CREATE TABLE ordinal_french ( ordinal_french_id NUMBER , ordinal_french_text VARCHAR2(26)); CREATE TABLE ordinal_old_english ( ordinal_english_id NUMBER , ordinal_old_english_text VARCHAR2(26)); -- Seed tables. DECLARE -- Define local collection types. TYPE number_list IS TABLE OF NUMBER; TYPE string_list IS TABLE OF VARCHAR2(26); -- Declare variables. ordinal_left NUMBER_LIST := number_list(1,2,3,4,5,6); ordinal_right NUMBER_LIST := number_list(1,3,4,5,6,7,8); english_text1 STRING_LIST := string_list('One','Two','Three','Four','Five','Six'); english_text2 STRING_LIST := string_list('Fyrmest','Pridda','Feorda','Fifta','Siexta','Scofoda','Eahtoda'); french_text STRING_LIST := string_list('Un','Trois','Quatre','Cinq','Six','Sept','Huit'); BEGIN FOR i IN 1..ordinal_left.COUNT LOOP INSERT INTO ordinal_english VALUES (ordinal_left(i),english_text1(i)); END LOOP; FOR i IN 1..ordinal_right.COUNT LOOP INSERT INTO ordinal_french VALUES (ordinal_right(i),french_text(i)); INSERT INTO ordinal_old_english VALUES (ordinal_right(i),english_text2(i)); END LOOP; -- Commit the work. COMMIT; END; / -- ------------------------------------------------------------------ -- This part seeds for the self referencing query section. -- ------------------------------------------------------------------ -- Conditionally drop objects. BEGIN FOR i IN (SELECT NULL FROM user_tables WHERE TABLE_NAME = 'SYSTEM_USER') LOOP EXECUTE IMMEDIATE 'DROP TABLE system_user CASCADE CONSTRAINTS'; END LOOP; FOR i IN (SELECT NULL FROM user_sequences WHERE sequence_name = 'SYSTEM_USER_S1') LOOP EXECUTE IMMEDIATE 'DROP SEQUENCE system_user_s1'; END LOOP; END; / -- Create SYSTEM_USER table and sequence and seed data. CREATE TABLE system_user ( system_user_id NUMBER CONSTRAINT pk_system_user1 PRIMARY KEY , system_user_name VARCHAR2(20) CONSTRAINT nn_system_user1 NOT NULL , system_user_group_id NUMBER CONSTRAINT nn_system_user2 NOT NULL , system_user_type NUMBER CONSTRAINT nn_system_user3 NOT NULL , last_name VARCHAR2(20) , first_name VARCHAR2(20) , middle_initial VARCHAR2(1) , created_by NUMBER CONSTRAINT nn_system_user4 NOT NULL , creation_date DATE CONSTRAINT nn_system_user5 NOT NULL , last_updated_by NUMBER CONSTRAINT nn_system_user6 NOT NULL , last_update_date DATE CONSTRAINT nn_system_user7 NOT NULL , CONSTRAINT fk_system_user1 FOREIGN KEY(created_by) REFERENCES system_user(system_user_id) , CONSTRAINT fk_system_user2 FOREIGN KEY(last_updated_by) REFERENCES system_user(system_user_id)); INSERT INTO system_user VALUES ( 1,'SYSADMIN', 1, 1, '', '', '', 1, SYSDATE, 1, SYSDATE); INSERT INTO system_user VALUES ( 2,'DBA', 2, 1,'Adams','Samuel', '', 1, SYSDATE, 2, SYSDATE); INSERT INTO system_user VALUES ( 3,'DBA', 2, 1,'Henry','Patrick', '', 1, SYSDATE, 1, SYSDATE); INSERT INTO system_user VALUES ( 4,'DBA', 2, 1,'Puri','Manmohan', '', 1, SYSDATE, 3, SYSDATE); INSERT INTO system_user VALUES ( 5,'APPS DBA', 2, 1,'Bremen','Gunther', '', 4, SYSDATE, 4, SYSDATE); -- Create the constraint to leave room for seeding new application default values. CREATE SEQUENCE system_user_s1 START WITH 1001; -- ------------------------------------------------------------------ -- This part seeds the range non-equijoin section. -- ------------------------------------------------------------------ -- Conditionally drop tables when they're found. BEGIN FOR i IN (SELECT TABLE_NAME FROM user_tables WHERE TABLE_NAME IN ('CALENDAR_JOIN' ,'TRANSACTION_JOIN')) LOOP EXECUTE IMMEDIATE 'DROP TABLE '||i.table_name||' CASCADE CONSTRAINTS'; END LOOP; END; / -- Conditionally drop sequences when they're found. BEGIN FOR i IN (SELECT sequence_name FROM user_sequences WHERE sequence_name IN ('CALENDAR_JOIN_S1' ,'TRANSACTION_JOIN_S1')) LOOP EXECUTE IMMEDIATE 'DROP SEQUENCE '||i.sequence_name; END LOOP; END; / -- Create tables and sequences. CREATE TABLE calendar_join ( calendar_id NUMBER CONSTRAINT pk_calendar1 PRIMARY KEY , month_short_name VARCHAR2(3) CONSTRAINT nn_calendar1 NOT NULL , month_long_name VARCHAR2(10) CONSTRAINT nn_calendar2 NOT NULL , start_date DATE CONSTRAINT nn_calendar3 NOT NULL , end_date DATE CONSTRAINT nn_calendar4 NOT NULL); CREATE SEQUENCE calendar_join_s1; CREATE TABLE transaction_join ( transaction_id NUMBER CONSTRAINT pk_transaction1 PRIMARY KEY , transaction_amount NUMBER CONSTRAINT nn_transaction1 NOT NULL , transaction_date DATE CONSTRAINT nn_transaction2 NOT NULL); CREATE SEQUENCE transaction_join_s1; -- Seed the CALENDAR table. INSERT INTO calendar_join VALUES (calendar_join_s1.nextval,'JAN','January','01-JAN-09','31-JAN-09'); INSERT INTO calendar_join VALUES (calendar_join_s1.nextval,'FEB','February','01-FEB-09','28-FEB-09'); INSERT INTO calendar_join VALUES (calendar_join_s1.nextval,'MAR','March','01-MAR-09','31-MAR-09'); INSERT INTO calendar_join VALUES (calendar_join_s1.nextval,'APR','April','01-APR-09','30-APR-09'); INSERT INTO calendar_join VALUES (calendar_join_s1.nextval,'MAY','May','01-MAY-09','31-MAY-09'); INSERT INTO calendar_join VALUES (calendar_join_s1.nextval,'JUN','June','01-JUN-09','30-JUN-09'); INSERT INTO calendar_join VALUES (calendar_join_s1.nextval,'JUL','July','01-JUL-09','31-JUL-09'); INSERT INTO calendar_join VALUES (calendar_join_s1.nextval,'AUG','August','01-AUG-09','31-AUG-09'); INSERT INTO calendar_join VALUES (calendar_join_s1.nextval,'SEP','September','01-SEP-09','30-SEP-09'); INSERT INTO calendar_join VALUES (calendar_join_s1.nextval,'OCT','October','01-OCT-09','31-OCT-09'); INSERT INTO calendar_join VALUES (calendar_join_s1.nextval,'NOV','November','01-NOV-09','30-NOV-09'); INSERT INTO calendar_join VALUES (calendar_join_s1.nextval,'DEC','Decenber','01-DEC-09','31-DEC-09'); -- Seed the TRANSACTION table. INSERT INTO transaction_join VALUES (transaction_join_s1.nextval,38.99,'16-JAN-09'); INSERT INTO transaction_join VALUES (transaction_join_s1.nextval,32.87,'21-JAN-09'); INSERT INTO transaction_join VALUES (transaction_join_s1.nextval,9.99,'25-MAR-09'); INSERT INTO transaction_join VALUES (transaction_join_s1.nextval,43.19,'13-APR-09'); -- Commit the inserts. COMMIT; |
MySQL SQL Join Supporting Script →
This has the setup script for the example tables.
-- ------------------------------------------------------------------ -- This part seeds joins, except the range non-equijoin section. -- ------------------------------------------------------------------ -- Conditionally drop tables when they're found. SELECT 'DROP TABLE IF EXISTS ordinal_english, ordinal_french, ordinal_old_english' AS "Statement"; DROP TABLE IF EXISTS ordinal_english, ordinal_french, ordinal_old_english; -- Create tables. SELECT 'CREATE TABLE ordinal_english' AS "Statement"; CREATE TABLE ordinal_english ( ordinal_english_id INT UNSIGNED , ordinal_english_text VARCHAR(26)); SELECT 'CREATE TABLE ordinal_french' AS "Statement"; CREATE TABLE ordinal_french ( ordinal_french_id INT UNSIGNED , ordinal_french_text VARCHAR(26)); SELECT 'CREATE TABLE ordinal_old_english' AS "Statement"; CREATE TABLE ordinal_old_english ( ordinal_english_id INT UNSIGNED , ordinal_old_english_text VARCHAR(26)); -- Insert into three tables. SELECT 'INSERT INTO ordinal_english' AS "Statement"; INSERT INTO ordinal_english VALUES (1,'One'),(2,'Two'),(3,'Three'),(4,'Four'),(5,'Five'),(6,'Six'); SELECT 'INSERT INTO ordinal_french' AS "Statement"; INSERT INTO ordinal_french VALUES (1,'Un'),(3,'Trois'),(4,'Quatre'),(5,'Cinq'),(6,'Six'),(7,'Sept'),(8,'Huit'); SELECT 'INSERT INTO ordinal_old_english' AS "Statement"; INSERT INTO ordinal_old_english VALUES (1,'Fyrmest'),(3,'Pridda'),(4,'Feorda'),(5,'Fifta'),(6,'Siexta'),(7,'Scofoda'),(8,'Eahtoda'); -- Conditionally drop tables when they're found. SELECT 'DROP TABLE IF EXISTS system_user' AS "Statement"; DROP TABLE IF EXISTS system_user; -- Create SYSTEM_USER table and sequence and seed data. SELECT 'CREATE TABLE system_user' AS "Statement"; CREATE TABLE system_user ( system_user_id INT UNSIGNED PRIMARY KEY , system_user_name VARCHAR(20) NOT NULL , system_user_group_id INT UNSIGNED NOT NULL , system_user_type INT UNSIGNED NOT NULL , last_name VARCHAR(20) , first_name VARCHAR(20) , middle_initial VARCHAR(1) , created_by INT UNSIGNED NOT NULL , creation_date DATE NOT NULL , last_updated_by INT UNSIGNED NOT NULL , last_update_date DATE NOT NULL , CONSTRAINT fk_system_user1 FOREIGN KEY(created_by) REFERENCES system_user(system_user_id) , CONSTRAINT fk_system_user2 FOREIGN KEY(last_updated_by) REFERENCES system_user(system_user_id)); -- Seed self-referencing table. SELECT 'INSERT INTO system_user' AS "Statement"; INSERT INTO system_user VALUES ( 1,'SYSADMIN', 1, 1, '', '', '', 1, UTC_DATE(), 1, UTC_DATE()) ,( 2,'DBA', 2, 1,'Adams','Samuel', '', 1, UTC_DATE(), 2, UTC_DATE()) ,( 3,'DBA', 2, 1,'Henry','Patrick', '', 1, UTC_DATE(), 1, UTC_DATE()) ,( 4,'DBA', 2, 1,'Puri','Manmohan', '', 1, UTC_DATE(), 3, UTC_DATE()) ,( 5,'APPS DBA', 2, 1,'Bremen','Gunther', '', 4, UTC_DATE(), 4, UTC_DATE()); -- ------------------------------------------------------------------ -- This part seeds the range non-equijoin section. -- ------------------------------------------------------------------ -- Conditionally drop tables when they're found. SELECT 'DROP TABLE IF EXISTS calendar_join, transaction_join' AS "Statement"; DROP TABLE IF EXISTS calendar_join, transaction_join; -- Create tables and sequences. SELECT 'CREATE TABLE calendar_join' AS "Statement"; CREATE TABLE calendar_join ( calendar_id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT , month_short_name VARCHAR(3) NOT NULL , month_long_name VARCHAR(10) NOT NULL , start_date DATE NOT NULL , end_date DATE NOT NULL); SELECT 'CREATE TABLE transaction_join' AS "Statement"; CREATE TABLE transaction_join ( transaction_id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT , transaction_amount INT UNSIGNED NOT NULL , transaction_date DATE NOT NULL); -- Seed the CALENDAR table. SELECT 'INSERT INTO calendar_join' AS "Statement"; INSERT INTO calendar_join VALUES (NULL,'JAN','January','2009-01-01','2009-01-31') ,(NULL,'FEB','February','2009-02-01','2009-02-28') ,(NULL,'MAR','March','2009-03-01','2009-03-31') ,(NULL,'APR','April','2009-04-01','2009-04-30') ,(NULL,'MAY','May','2009-05-01','2009-05-31') ,(NULL,'JUN','June','2009-06-01','2009-06-30') ,(NULL,'JUL','July','2009-07-01','2009-07-31') ,(NULL,'AUG','August','2009-08-01','2009-08-31') ,(NULL,'SEP','September','2009-09-01','2009-09-30') ,(NULL,'OCT','October','2009-10-01','2009-10-31') ,(NULL,'NOV','November','2009-11-01','2009-11-30') ,(NULL,'DEC','Decenber','2009-12-01','2009-12-31'); -- Seed the TRANSACTION table. SELECT 'INSERT INTO transaction_join' AS "Statement"; INSERT INTO transaction_join VALUES (NULL,38.99,'2009-01-16') ,(NULL,32.87,'2009-01-21') ,(NULL,9.99,'2009-03-25') ,(NULL,43.19,'2009-04-13'); -- Commit the inserts. COMMIT; |