Subqueries
Subqueries are the bread and butter of making SQL effective and powerful as a programming language. They enable you to find lookup values, like primary keys. They let you validate whether values exist or don’t exist, and thereby serve to filter the result set of the master or outer query.
Subqueries are queries that run inside other queries or statements, like data manipulation statements – INSERT
, UPDATE
, and DELETE
. The query or statement that contains a subquery is also known as an outer query, and subqueries are inner queries.
You can nest subqueries within subqueries. I’ve never found the upward limit (in practice or the documentation) but there may be one. The logic of many nested subqueries can be daunting. I’d say three or four is probably a limit that should lead you to consider alternative solutions. Some developers overtly rely on subqueries to avoid complex joins semantics.
Subquery Definitions
Summary
There are three basic types of subqueries. The simplest and sometimes known as ordinary subqueries are those that return results independent of the outer query. A correlated query runs for each row of the outer query. Correlated queries act as audits of the existence or non-existence of a row based on a join between the outer and inner query. Last, there is an inline view. Inline views are queries that produce filtered result sets, and they function as run-time views.
Ordinary subqueries come in a general form as a multiple row subqueries. A single row subquery is specialized form of a multiple row subquery because it must resolve to a single row. The most specialized ordinary subquery is a SQL expression, which is a single row subquery that returns only one column. The SQL expression is more commonly referred to as a scalar subquery.
Correlated subqueries let you validate whether values exist or don’t exist, and thereby serve to filter the result set of the master or outer query. They also let you perform complex associations to conditionally update or delete multiple rows in other tables. You should have unique indexes on these join columns because correlated subqueries run for each row returned by the outer query. Unique indexes let correlated queries perform faster with fewer computing resources.
Inline views let you create filtered data sets that aren’t defined as permanent catalog views. Inline views, like catalog views, become virtual tables that let you join their contents to other tables and catalog views.
Dictionary ↓
Word | Definition |
Base table |
A base table is any table qualified in the FROM of a query. The term implies that the data comes from a base location, which is the table. |
Catalog | A catalog is the data dictionary or metadata of a database management system. |
Catalog view | A catalog view is a query assigned a name and saved inside a database catalog. Most views join data from one or more tables to provide broader perspective of data. Catalog views don’t copy data from the sources, they only facilitate querying a filtered data set through a catalog name (view name). Many catalog views are updateable, which means you can read data from them and update it to their native tables where data are stored. Views that aren’t updateable contain aggregation expressions, set operators, or sorting operations. |
Correlated subquery | A correlated subquery is a subquery that contains a join to an outer query. The join may be to the immediate outer query or the outermost query. |
Data dictionary | A catalog is the data dictionary or metadata of a database management system. |
Inline view | A query that becomes a run time filtered result set or view of data. You can join an inline view to other tables, catalog views, or other inline views, provided they are in the scope of operation. |
Intermediate node query |
An intermediate node query is a query found in a SELECT or DML statement that contains one or more subqueries. |
Leaf node query | A subquery of another query or statement that contains no other subquery. |
Metadata | Metadata is data about data. Metadata is organized as a taxonomy to support structures that implement hierarchical, networked, relational, or object relational database. Metadata is stored in the data dictionary or catalog of a database. The patterns are similar but different in various implementations of database management systems. The database management system creates databases by building the data catalog first. |
Named view |
A named view is a view that is a catalog view, which means it was defined by a CREATE VIEW statement. Named views contain a query that marshals data from one or more base tables. |
Non-updateable view |
A non-updateable view is a view that: (1) contains a collection, set operator, subquery in a SELECT list, any aggregation syntax, or any analytic syntax (e.g., MODEL ; or (2) excludes a preserved key for any base table. |
Root node query |
A root node query is a SELECT statement or DML statement that contains one or more subqueries. |
Run time view |
A query that returns a filtered result set or view of data that exists in the FROM clause of a SELECT statement. A run time view is also known as an inline view. |
Select list |
A select list is the collection of columns and string or numeric literals returned in the SELECT clause of a query. |
Subquery | A subquery runs independently of the containing query. Subqueries are also known as ordinary subqueries to differentiate them from correlated subqueries. A subquery without any dependent subqueries is known as a leaf node. Subqueries always return a value, list of values, or table of values to their respective outer query. |
Updateable view |
An updateable view is a view that: (1) provides one key preserved table, which means a set of columns that includes the primary key of a base table; (2) supports a single DML (an INSERT , UPDATE , or DELETE ) statement that only affects a single base table; and (3) doesn’t contain a collection, set operator, subquery in a SELECT list, any aggregation syntax, or any analytic syntax (e.g., MODEL . Oracle also disallows an updateable view when it includes a WITH READ ONLY subquery, WITH CHECK OPTION , or any hierarchical join syntax. Updateable views are very dependent on the database management systems, and you should pay close attention to the supporting documentation. |
View | A view is a structure that is stored in the catalog view and contain a query that marshals data from one or more base tables. |
In short, subqueries fall into three categories – inside out, outside in, and bi-directional. Ordinary subqueries, or subqueries, are inside out, which means they run before the outer query and return the results to the outer query. Correlated subqueries are outside in because they run for each row returned by the outer query. Correlated subqueries match one or more outer query row’s column values against any equal number of the innner query row’s column values. Inline views are bi-directional, which means the way they’re joined is determined by the SQL cost optimizer. SQL cost optimizers are product specific to database management systems, and you can refer to this SQL join semantic blog post for more information.
The drawing illustrates the relationships between queries and subqueries in the various scenarios. More or less subqueries return a value, a row of values, or a table of values, while a correlated subquery returns a Boolean acknowledgment of existence or non-existence. Inline views create filtered result sets that substitutes for a table or catalog view in a relational join pattern.
The next sections examine the specifics of the subqueries. You can find scalar subqueries in an earlier post.
Single row subquery ↓
A single row subquery returns all columns for a single row. You can use a single row subquery as the select list of arguments for an INSERT
statement. You can also use a single row subquery in the WHERE
clause of a SELECT
, UPDATE
, and DELETE
statement. Like the scalar subquery, you use an equality, =
, comparison operator, but unlike a scalar subquery, you compare record structures not singular column values.
Rules ↓
A single row subquery returns a set of variables, or a row of data. The values are comma delimited and generally they’re scalar variables. Scalar variables are number, date, or string data types. Scalar variables are also known as primitives in some languages, like Java. Single row subqueries typically uses a SQL statement to look for a collection of things in the database. Single row subqueries rely on business rules because they are more human friendly than matching keys. You use an equality, =
, comparison operator to ensure only one row is returned. The equality comparison operator throws an ORA-01427
error when a subquery returns more than one row. This type of structure comparison lets you compare two comma separated lists of columns or values that are enclosed by parentheses.
Description
A single row subquery is a close relative in SQL to a pass-by-value function. Unlike a scalar subquery that returns a single value, a single row subquery returns a structure. The record structure of a single row subquery is a comma delimited list of columns or values. Pass-by-value functions are black boxes. Black boxes let you put some raw material in, process it, and get some related series of processed products out. The difference between a scalar and single row subquery is the return data type.
A single row subquery is independent of its outer query. Consider that a query with subqueries is like an inverted tree. The outermost query is the root node of the inverted tree. Any subqueries of the root node are branches, and subqueries of the first generation of subqueries are branches from branches. The lowest subquery in any sequence of branches is also known as a leaf node. Leaf nodes resolve first, followed by their siblings. After leaf nodes, the branch resolves that is their parent, the parent of the parent and so forth until all values are passed to the outermost subquery. If there’s another branch with leaf nodes, the behavior is repeated until all branches resolve with values to the outermost query. At this point the outermost query runs. This is why scalar subqueries are considered inside out.
The following demonstrates a single row subquery. The outer query depends on the result returned by the subquery. The subquery uses a first name to filter the result set, and hopefully returns a single row of data. Naturally, this is just a sample program and the question could be resolved with a simple WHERE
clause and doesn’t require a subquery.
SELECT c.first_name||' '||c.last_name AS full_name FROM contact c WHERE (c.first_name, c.last_name) = (SELECT ic.first_name, ic.last_name -- Single row subquery. FROM contact ic WHERE ic.first_name = 'Randi'); |
If the subquery returns two rows, the query would fail with an ORA-01427
error. The error means that a single row subquery returns more than one row. This type of query architecture is risky because it is data centric. The occurrence of two first names that are alike would cause this subquery architecture to fail. It relies on the subquery returning a row of data that is only found once in the result set of a subquery. A multiple row subquery is typically a better solution when the number of row returns is uncertain because it can inspect multiple row returns with an IN
or other multiple row comparison operator.
In some implementations, like Oracle, you can also return a User-Defined Type (UDT) as part of the record structure. UDTs are typically objects, arrays, or lists, and they are specific to object-relational technology – more or less an Oracle database.
INSERT
statement ↓
An INSERT
statement can use a single row subquery when it provides all required values for a single row insertion. A subquery in an INSERT
statement can be a subquery, nested subquery, or an inline view. Since nested subqueries and inline views are discussed elsewhere, this section focuses on single row subqueries in INSERT
statements.
You eliminate the VALUES
clause when you use a subquery in an INSERT
statement. Parentheses around the subquery are optional in Oracle and precluded in MySQL. While a single row subquery works inside an INSERT
statement, so do multiple row subqueries. There is no real difference in the syntax between the two, and a multiple row subquery raises no exception. A multiple row subquery merely inserts more than one row.
Like the scalar query, a single row subquery lets you look for something in the database to insert into another table. The single row subquery relies on business rules to find the record set for the insertion. Business rules are typically more human friendly than mapping surrogate keys between tables. They also let you develop effective queries to find data.
The following INSERT
statement demonstrates a single row subquery that provides the required columns for a row insert into the item
table.
INSERT INTO item ( SELECT item_s1.nextval -- Single row subquery. , 'ASIN: B0001US8F8' , (SELECT common_lookup_id -- Leaf node scalar subquery. FROM common_lookup WHERE common_lookup_type = 'DVD_WIDE_SCREEN') , 'Around the World in 80 Days' , 'Two-Disc Special Edition' , 'NR' , 'MPAA' , '18-MAY-2004' , 3 , SYSDATE , 3 , SYSDATE FROM dual); |
The single row subquery above creates a row of data that also includes a nested scalar subquery or SQL expression. The SQL expression looks up the surrogate key based on the human friendly definition of DVD_WIDE_SCREEN
, and returns a foreign key value. The returned values becomes part of the record structure of the root query. You should note that the list of return columns in the SELECT
clause includes only string and numeric literals along with the SQL expression. While this could just as easily be written with a VALUES
clause, it illustrates the use of a subquery in an INSERT
statement.
As mentioned, the scalar subquery runs first as the leaf node, and it’s value is added to the list of values in the outer query. The outer query result set is then passed to the INSERT
statement for processing.
In some implementations, like Oracle, you can also return a User-Defined Type (UDT) as part of the record structure. UDTs are typically objects, arrays, or lists, and they are specific to object-relational technology – more or less an Oracle database.
UPDATE
statement ↓
An UPDATE
statement can use a single row subquery in the SET
or WHERE
clauses. Like the scalar subquery, it uses SQL statements to look for a set of related things in the database based. Single row subqueries typically use business rules to find the filtered result set. Like their name implies, single row subqueries return a single row of data, which is also known as a structure. A subquery makes finding a row of data easier because business rules are more human friendly.
The following demonstrates a single row subquery in the context of an UPDATE
statement.
UPDATE contact c SET (c.first_name,c.last_name) = (SELECT 'Lori','Winn' FROM dual) WHERE (c.first_name,c.last_name) = (SELECT 'Randi','Winn' FROM dual); |
There are two single row subqueries in the sample UPDATE
statement. One supports the assignment of new values in the SET
clause, while the other filters the data in the WHERE
clause. Both subqueries work with assigning or comparing a set of two column values.
You could rewrite the single row subquery statement with scalar subqueries, but it means each single row subquery activity becomes two scalar subqueries. The next example shows how you could implement scalar subqueries to do it. Remember, it’s only an example of what you shouldn’t do. It sole purpose is to show you the power of single-row subqueries.
UPDATE contact c SET c.first_name = (SELECT 'Lori' FROM dual) , c.last_name = (SELECT 'Winn' FROM dual) WHERE c.first_name = (SELECT 'Randi' FROM dual) AND c.last_name = (SELECT 'Winn' FROM dual); |
The advantage of comparing comma separated lists is a tremendous feature of single row subqueries. It is a technique often not fully understood, and seldom exploited frequently enough in real world SQL code.
In some implementations, like Oracle, you can also return a User-Defined Type (UDT) as part of the record structure. UDTs are typically objects, arrays, or lists, and they are specific to object-relational technology – more or less an Oracle database.
DELETE
statement ↓
A DELETE
statement can only use a single row subquery in the WHERE
clause. DELETE
statements use single-row subqueries SQL statements to look up a related row of things. The returned row becomes a structure that you can compare against a comma delimited set of columns in the WHERE
clause. A single-row subquery lets you use business rules to find data sets rather than manually inspecting the data.
You typically use a single row subquery when you lookup a record structure, as a set of column values from another table or collections of tables in a filtered join. The lookup follows a business rule that can’t easily be done by using manual inspection of the data. Below is an example of a DELETE
statement that compares a record structure returned by a subquery.
DELETE FROM contact c WHERE (c.first_name, c.last_name) = (SELECT 'Randi','Winn' FROM dual); |
The preceding example uses a single-row subquery in an DELETE
statement follows below. The subquery fabricates a row for one that may not exist in a database table.
Like the single-row subquery in the foregoing UPDATE
statement section, you can rewrite this row selection into two scalar subqueries. As mentioned in that section, there’s no purpose in doing so except for demonstrating the concept. A single-row subquery is a better relative solution between the two.
DELETE FROM contact c WHERE c.first_name = (SELECT 'Randi' FROM dual) AND c.last_name = (SELECT 'Winn' FROM dual); |
Both scalar subqueries must complete before the comparisons can be made in the WHERE
clause. If both single comparison conditions are met, the query returns qualified rows. It is the equivalent of writing two scalar subqueries.
In some implementations, like Oracle, you can also return a User-Defined Type (UDT) as part of the record structure. UDTs are typically objects, arrays, or lists, and they are specific to object-relational technology – more or less an Oracle database.
I think I found a “dyslexic” word in the def. for CATALOG VIEW: “sotring” rather than “sorting”. Also, a misspell for the METADATA def.: where you have ” . . . by building the data catalof first.” rather than ” . . . by building the data ‘catalog’ first.” Both of these are at the end of the definitions.
Jade Rigby
5 Feb 11 at 11:02 am
Thanks, fixed.
michaelmclaughlin
16 Mar 11 at 2:13 pm
“Last, there is an inline view. Inline views are queries that produce filtered result sets, and they as run time tables.”
I believe in this reference above in the first paragraph under ‘subquery definitions’ in regards to the inline view, that in the second line, You mean to say “…and they function as run-time tables.”
student / student
18 Oct 17 at 10:20 am
Yes, that’s a great catch but I choose run-time views because they have no stored structure.
michaelmclaughlin
23 Dec 17 at 2:57 pm