Microsoft recently released two new aggregation functions in Excel – GROUPBY and PIVOTBY. Both these functions are useful when you want to quickly summarize data in your Excel spreadsheet. In this article we’re going to look at how to use the GROUPBY function, and in the next article we’ll cover the more powerful PIVOTBY function.
NOTE : these functions are currently only available to users in the current (i.e. Preview) channel of the Microsoft 365 Insider program – keep an eye out for the formal release date to all users, which should be announced soon!
What is the GROUPBY function?
As the name implies, the GROUPBY function is used to group or aggregate a range of data. A typical use case is when you have a table of sales data that you want to quickly group by region and see the totals.
The concept behind the GROUPBY function is therefore quite simple, and most times you’ll only need to use it for basic groupings and aggregations. However, as we’ll see, it has some additional features that make it a powerful addition to your Excel toolbox.
BUT WAIT! I can just use a pivot table in Excel to achieve the same thing; why should I use this new GROUPBY function instead? Yes, you can use a pivot table to do exactly the same thing – we’ll look at the difference between the two approaches later in the article.
How to use the GROUPBY function
Let’s look at the syntax for the basic usage of the GROUPBY function:
GROUPBY(row_fields,values,function)
As you can, the function takes 3 arguments (there are some more optional):
- row_fields: the range that contains the values which are used to group the data
- values: the range that contains the values that you want to aggregate
- function: the type of aggregation that you want to perform on the values (eg sum, count, average)
Let’s look at our example again to see more clearly how the syntax works. Here is the formula:
=GROUPBY(Table3[[#All],[Region]],Table3[[#All],[Amount (USD)]],SUM)
Note the following:
- When you update a value in the source table, the results in the GROUPBY table are automatically updated – there is no need to perform a refresh as you would with a pivot table
- The source data has been formatted as an Excel table – this is good practice when handling data as you can specify columns by their names, which makes it easier to read formulas
- The formula has been entered in cell H2 in the spreadsheet, and when you press ‘Enter’ after writing the formula, the results spill down and across the sheet This behavior occurs because the GROUPBY function is a dynamic array formula.
Before we look at a worked example, let’s introduce the remaining optional arguments:
We’ll look at these extra optional arguments in the following examples.
A worked example using the GROUPBY function
For our worked example we’re going to look at a simple invoice capturing system set up in Excel that uses the SQL Spreads Add-In to write the data into a back-end SQL database.
The invoice capturing system includes a data entry table where admin staff can capture and update supplier invoices.
Here is the formula in cell K3:
=GROUPBY(Table1[[#All],[Cost Center]],Table1[[#All],[Amount]],SUM,3,,,Table1[[#All],[Status]]="Pending")
You can see that the formula is like the example that we looked at earlier, with the exception that we’ve added a filter to only show pending invoices.
Now, when users add new invoices or change the status of an existing record, the summary GROUPBY table will automatically update.
You’ll notice that the results returned by the GROUPBY function are not nicely formatted in the way that a pivot table is displayed. We can add some formatting manually, however.
Using GROUPBY to show sub-totals
Now let’s add a more detailed summary of the pending invoices. Here is the formula in cell K9
=GROUPBY(CHOOSECOLS(Table1[#All],6,8,4),Table1[[#All],[Amount]],SUM,3,2,,Table1[[#All],[Status]]="Pending")
You can see that instead of selecting a single column for the row_fields argument, I’ve used the CHOOSECOLS function, which means I can simply select the whole table as an array, and then specify the columns that I want to include as the row_fields in the GROUPBY function. This gives me the flexibility to quickly and easily change the rows to be used in the GROUPBY formula, as well as the order in which they’ll appear.
Also note that I specified a value for the total_depth argument. In this case I entered ‘2’, to indicate that I want to include 2 levels of grouping – a sub-total as well as a grand total.
To get the formatting to work for the results returned from this more detailed GROUPBY formula, I used Conditional Formatting, as the number of rows returned will vary depending on the number of invoices returned by the query. Here are the conditional formatting rules:
Note that as I add each invoice, the results in the GROUPBY table update automatically. Once I have finished adding invoices for the time being, I can click on Save to Database in the SQL Spreads ribbon tab, and the changes will be written back to the financial system database in SQL Server.
Using GROUPBY with a Tree Filter
Finally, let’s look at an example which uses the Tree Filter feature in SQL Spreads to filter results in conjunction with an aggregation table created using the GROUPBY function.
With this solution, which uses SQL Spreads to connect to SQL Server to view and manage customer invoices, users can
- filter the results that are displayed by selecting items in the tree filter on the left hand side
- view data in the CustomerInvoices table and make updates to the Status column, or add new invoices to the table
- view summary aggregated results to see the total pending amount owed by each Customer
- save any data changes back to the financial system database in SQL Server
GROUPBY vs Pivot Table
Those of you that use the pivot table feature in Excel will be saying “I can use a pivot table to do exactly the same thing – why use GROUPBY?”
The main advantage that GROUPBY has over normal pivot tables is that the results update dynamically as the source date changes. This happens because GROUPBY is simply a normal Excel formula. In a pivot table, on the other hand, if you make changes to the source data, you need to refresh the pivot table to see the changes. This automatic updating of the data in the GROUPBY is useful when you are working with source data and an aggregated view in the same sheet – as in the examples that we worked through earlier.
For those who use them, it is just a matter of clicks to create a pivot table to aggregate some source data. But, if you are not familiar with pivot tables, the process of inserting one into a spreadsheet may not be very intuitive. Excel users that are familiar with formulas, but not necessarily pivot tables will likely find the GROUPBY function straightforward to use.
The GROUPBY function (and it’s slightly more advanced cousin, PIVOTBY) won’t replace the use of pivot tables in Excel (which you’ll definitely still need to use), but they offer a nice way of creating quick and simple summaries, In conjunction with other functions, you can create powerful formulas that go beyond the capabilities of standard pivot tables.
We’ll look at the PIVOTBY function as well as some more advanced usage of GROUPBY in some upcoming articles. In the meantime, check out the SQL Spreads Add-In for Excel to see how it can empower your team to update SQL Server data directly from Excel.