This is the sixth in a series of articles aimed at introducing the basics of SQL to Excel users. This article shows how to combine data from one or more data sets using the SQL UNION operator. UNION is one of a number of ‘set operators’ in SQL that are used to join the results of two (or more) SELECT statements. If you’re interested in the other articles, check them out here:
- Introduction to SQL for Excel Users
- What is the equivalent to VLOOKUP in SQL?
- How to create Pivot Tables in SQL Server
- Simple Data Validation in SQL
- What is the equivalent of the IF THEN function in SQL?
There’ll be more articles in this series, so keep an eye on the blog in the coming weeks!
What are Set operators in SQL?
Set operators are used to join the results of two (or more) SELECT statements. They are basically keywords that tell SQL how to merge the results from the different SELECT statements. There are 4 set operators that can be used to combine data – each has a different focus:
Operator | Description |
---|---|
UNION | Combines the results of two or more queries into a distinct single result, with any duplicates being removed |
UNION ALL | Combines the results of two or more queries into a distinct single result, with any duplicates being retained |
INTERSECT | Keeps the results that are common to all queries |
EXCEPT | Returns the results that are in the first query and not in the second |
The basic UNION statement is the most commonly used, so we’ll look at this first.
How to use the UNION operator
The UNION operator is used to combine data from two or more queries. Here is a simple example that shows how it works.
SELECT A.ID,
A.Name
FROM [UnionDemo].[dbo].[GroupA] AS A
UNION
SELECT B.ID,
B.Name
FROM [UnionDemo].[dbo].[GroupB] AS B
Some things to note about the UNION operator:
- the number and the order of the columns must be the same in all queries
- the data types must be compatible
- any duplicates that may exist in the two tables are excluded
- duplicates occur when records are the same across ALL the columns (e.g. in the above, if the name in Group B for ID 1005 had been Clare1, it would have been included in the result set)
If we do want to include any duplicates in the records that are returned by the select statements then we can use the UNION ALL operator:
SELECT A.ID,
A.Name
FROM [UnionDemo].[dbo].[GroupA] AS A
UNION ALL
SELECT B.ID,
B.Name
FROM [UnionDemo].[dbo].[GroupB] AS B
As you can see, the UNION operator is a simple, but powerful addition to your SQL skill set. The most common use cases for needing it are when you have multiple tables of the ‘same’ data – e.g. sales data from different regions. As long as the basic rules are adhered to, then the UNION statement is a good option.
When would you need to use UNION ALL?
As we’ve seen in the example above, UNION ALL will retain any duplicates in the result data set. What is a use case for this behavior? There are not many times when we can accommodate duplicates, although I’m sure there are some situations when the presence of duplicates doesn’t affect the analysis.
The main reason that you would use UNION ALL instead of UNION is performance-related. UNION ALL is much quicker than UNION as it does not have to check for duplicates, so it should be used when you know that there are no duplicates in the source tables (for example, sales data from region A can’t appear in the table for region B).
How to use the INTERSECT and EXCEPT operators
The INTERSECT operator
As the name implies, the INTERSECT operator will combine the results of two queries and return only rows that appear in both result sets. This is a useful way of finding duplicates in tables.
SELECT A.ID,
A.Name
FROM [UnionDemo].[dbo].[GroupA] AS A
INTERSECT
SELECT B.ID,
B.Name
FROM [UnionDemo].[dbo].[GroupB] AS B
The EXCEPT operator
The EXCEPT operator will return records from a select statement that don’t appear in a second select statement. The syntax is exactly the same as our UNION and INTERSECT examples, with the EXCEPT operator simply used instead.
SELECT A.ID,
A.Name
FROM [UnionDemo].[dbo].[GroupA] AS A
EXCEPT
SELECT B.ID,
B.Name
FROM [UnionDemo].[dbo].[GroupB] AS B
An example use case for the EXCEPT operator is when you want to find out which customers have no related sales recorded for them in a sales order table. In the tables below, you can see that there are no sales in the orders table for Clare.
The query to return the person with no orders recorded (i.e. Clare) is:
SELECT
A.ID, A.Name, O.Date, O.Amount
FROM [UnionDemo].[dbo].[GroupA] AS A
LEFT OUTER JOIN [UnionDemo].[dbo].[Orders] AS O on A.ID = O.ID
EXCEPT
SELECT
A.ID, A.Name, O.Date, O.Amount
FROM [UnionDemo].[dbo].[GroupA] AS A
RIGHT OUTER JOIN [UnionDemo].[dbo].[Orders] AS O on A.ID = O.ID
UNION vs JOIN
Both UNION and JOIN combine data from different tables. There is, however, a fundamental difference between the two.
The key difference is that in the JOIN statement, we are combining COLUMNS from different tables (based on a related column), whereas with UNION we are combining ROWS from tables with identical columns.
Summary
The SQL UNION operator is used to combine the results of two or more queries into a distinct single result set. A typical use case for this is when we have data split up across multiple tables, and we want to merge it into one single set of results.
The INTERSECT and EXCEPT operators are other types of set operators which allow us to find duplicates in the results returned by two queries (INTERSECT) or results from one query that don’t appear in a second (EXCEPT).
If your organization uses both SQL Server and Excel, then check out the SQL Spreads Add-In for Excel; it will greatly simplify how you manage your data environment.