• SQL Spreads Blog
  • SQL Union: How to Combine Result Sets

    Andy McDonald - Sep 02, 2022

    This is the sixth in a series of articles aimed at introducing the basics of SQL to Excel users. This article shows how to combine data from one or more data sets using the SQL UNION operator. UNION is one of a number of ‘set operators’ in SQL that are used to join the results of two (or more) SELECT statements. If you’re interested in the other articles, check them out here:

    There’ll be more articles in this series, so keep an eye on the blog in the coming weeks!

    What are Set operators in SQL?

    Set operators are used to join the results of two (or more) SELECT statements. They are basically keywords that tell SQL how to merge the results from the different SELECT statements. There are 4 set operators that can be used to combine data – each has a different focus:

    Operator Description
    UNION Combines the results of two or more queries into a distinct single result, with any duplicates being removed
    UNION ALL Combines the results of two or more queries into a distinct single result, with any duplicates being retained
    INTERSECT Keeps the results that are common to all queries
    EXCEPT Returns the results that are in the first query and not in the second

    The basic UNION statement is the most commonly used, so we’ll look at this first.

    How to use the UNION operator

    The UNION operator is used to combine data from two or more queries. Here is a simple example that shows how it works.

    SELECT A.ID,
    A.Name
    FROM [UnionDemo].[dbo].[GroupA] AS A
    UNION
    SELECT B.ID,
    B.Name
    FROM [UnionDemo].[dbo].[GroupB] AS B

    SQL UNION basic

    Some things to note about the UNION operator:

    • the number and the order of the columns must be the same in all queries
    • the data types must be compatible
    • any duplicates that may exist in the two tables are excluded
    • duplicates occur when records are the same across ALL the columns (e.g. in the above, if the name in Group B for ID 1005 had been Clare1, it would have been included in the result set)

    If we do want to include any duplicates in the records that are returned by the select statements then we can use the UNION ALL operator:

    SELECT A.ID,
    A.Name
    FROM [UnionDemo].[dbo].[GroupA] AS A
    UNION ALL
    SELECT B.ID,
    B.Name
    FROM [UnionDemo].[dbo].[GroupB] AS B

    SQL UNION ALL

    As you can see, the UNION operator is a simple, but powerful addition to your SQL skill set. The most common use cases for needing it are when you have multiple tables of the ‘same’ data – e.g. sales data from different regions. As long as the basic rules are adhered to, then the UNION statement is a good option.

    When would you need to use UNION ALL?

    As we’ve seen in the example above, UNION ALL will retain any duplicates in the result data set. What is a use case for this behavior? There are not many times when we can accommodate duplicates, although I’m sure there are some situations when the presence of duplicates doesn’t affect the analysis.

    The main reason that you would use UNION ALL instead of UNION is performance-related. UNION ALL is much quicker than UNION as it does not have to check for duplicates, so it should be used when you know that there are no duplicates in the source tables (for example, sales data from region A can’t appear in the table for region B).

    How to use the INTERSECT and EXCEPT operators

    The INTERSECT operator

    As the name implies, the INTERSECT operator will combine the results of two queries and return only rows that appear in both result sets. This is a useful way of finding duplicates in tables.

    SELECT A.ID,
    A.Name
    FROM [UnionDemo].[dbo].[GroupA] AS A
    INTERSECT
    SELECT B.ID,
    B.Name
    FROM [UnionDemo].[dbo].[GroupB] AS B

    SQL INTERSECT

    The EXCEPT operator

    The EXCEPT operator will return records from a select statement that don’t appear in a second select statement. The syntax is exactly the same as our UNION and INTERSECT examples, with the EXCEPT operator simply used instead.

    SELECT A.ID,
    A.Name
    FROM [UnionDemo].[dbo].[GroupA] AS A
    EXCEPT
    SELECT B.ID,
    B.Name
    FROM [UnionDemo].[dbo].[GroupB] AS B

    SQL EXCEPT

    An example use case for the EXCEPT operator is when you want to find out which customers have no related sales recorded for them in a sales order table. In the tables below, you can see that there are no sales in the orders table for Clare.

    SQL EXCEPT example

    The query to return the person with no orders recorded (i.e. Clare) is:

    SELECT
    A.ID, A.Name, O.Date, O.Amount
    FROM [UnionDemo].[dbo].[GroupA] AS A
    LEFT OUTER JOIN [UnionDemo].[dbo].[Orders] AS O on A.ID = O.ID
    EXCEPT
    SELECT
    A.ID, A.Name, O.Date, O.Amount
    FROM [UnionDemo].[dbo].[GroupA] AS A
    RIGHT OUTER JOIN [UnionDemo].[dbo].[Orders] AS O on A.ID = O.ID

    SQL EXCEPT example result

    UNION vs JOIN

    Both UNION and JOIN combine data from different tables. There is, however, a fundamental difference between the two.

    The key difference is that in the JOIN statement, we are combining COLUMNS from different tables (based on a related column), whereas with UNION we are combining ROWS from tables with identical columns.

    Summary

    The SQL UNION operator is used to combine the results of two or more queries into a distinct single result set. A typical use case for this is when we have data split up across multiple tables, and we want to merge it into one single set of results.

    The INTERSECT and EXCEPT operators are other types of set operators which allow us to find duplicates in the results returned by two queries (INTERSECT) or results from one query that don’t appear in a second (EXCEPT).

    If your organization uses both SQL Server and Excel, then check out the SQL Spreads Add-In for Excel; it will greatly simplify how you manage your data environment. 

    Leave a Reply


    No comments yet. Be the first!

Use Excel to update your data in SQL Server?

Try the SQL Spreads data management solution to update and manage your SQL Server data from within Excel.