With Excel Power Query you can connect to data sources, import data, clean and shape it and then load it into Excel or a data model so that you can create charts and reports. In this article, we’re going to look at some examples of where you’d use Power Query and give a quick overview of how to use it.
What is Power Query? An Overview
Power Query first appeared as a free add-in for Excel 2010 – this added a Power Query tab to the ribbon. In recent versions of Excel, Power Query has been hidden away somewhat. You now access the Power Query wizards and tools from the Get & Transform Data group on the Data tab of the Excel ribbon. This makes sense, as getting and transforming data is what Power Query is all about.
In almost all cases, you’ll use the Power Query tools to perform the normal ETL functions:
- Extract: connect and get data from a source
- Transform: clean or modify the data so that it meets your analysis or reporting requirements
- Load: bring the transformed data into Excel or load it into an in-memory data model
The main focus for Power Query is the transform functions that it offers. These functions will be familiar to Excel users, as they do things that Excel users do on a day-to-day basis, like removing a column or filtering rows, but there are also advanced options such as merge, append, group by, pivot, and unpivot.
In addition to offering a range of functions available for transforming your source data, Power Query has a neat trick up its sleeve: you can save the transformation steps that you apply and re-run them whenever you need to.
Why Use Power Query?
Power Query is all about getting and transforming data. These are crucial steps, and no data-related endeavor can be completed without going through what is often a tedious process. Based on a recent survey, data scientists spend close to half of their time getting data ready (i.e. loading and cleansing) before they can use it to develop models and visualizations.
Some of the typical use cases for where data preparation is needed are:
- You need to produce a monthly sales report which aggregates sales data in the form of CSV files from multiple regions.
- You receive a file from a third-party system containing statistical data with attributes as columns (e.g. months of the year) – in order to process the data, you need to rotate the columns to rows, or unpivot it, first.
- You need to produce a weekly report which uses data from multiple locations and where the data is inconsistent and ‘dirty’, requiring lots of cleaning up.
- You want to produce a report which is based on a simple table in SQL Server, but you want to augment the basic data with additional calculated columns.
- You need to analyze a very large dataset that is too big to use in Excel
So, it’s clear that in most cases you need to prepare your data before you can use it, but why use Power Query in particular? The main reason is that the Power Query tools and wizards are embedded into everyone’s favorite business application, Excel. Even if you don’t intend to use Excel for your data analysis, it’s often convenient to use Power Query as a quick ‘first look’ at your data, before moving on to other data analysis tools.
The bottom line is, that if you use Excel and/or Power BI for your data analysis or reporting, or if you just need to quickly take a look at a data source, you’re going to want to use the Power Query tools to some extent.
Getting Started with Power Query: Step by Step Guide for Beginners
We’re going to look at a simple example that shows the basics of Power Query.
First, let’s take a look at the sample file – it simply contains sales and expense transactions from an online accounting system; we’re going to clean it up so that we can use it in some other reporting system.
Step (1): Extract – Get the data
In Excel, click on Data > Get Data > From File > From Text/CSV, and then select the file you want to import – in our example it’s a file called ‘Region A export.csv’.
Excel will now connect to the data source (CSV file in our example) and present a preview of the data. From here you can either load the data directly into Excel (or a data model) or transform the data first. We need to do some cleaning up and shaping of the data, so need to click on the ‘Transform’ button.
(2) Transform – clean and shape the data
When you click on the ‘Transform Data’ button, the Power Query Editor is launched. This is where you’ll work on an in-memory copy of the data you are importing.
The Power Query Editor has 4 main areas:
- Ribbon: the various transformation and clean-up tasks are grouped into different tab headings
- Queries pane: this is a view of all your available queries
- Current view: this is the main working view, that displays a preview of the data for your query
- Query settings: this shows the query steps for the currently selected query
Working in the Power Query editor is a bit like recording a macro in Excel – as you complete the transformation tasks, they are recorded step-by-step in the Query settings view. If you make mistakes, you can simply remove a step, or re-order them.
Here are the transformation steps we’re going to carry out for our example:
- Format the following columns correctly
– Date
– Amount - Remove some columns that we don’t need
- Perform a lookup to categorize the payment selection type
Format the date column
Dates can be tricky when you are handling them in data sets. Problems typically arise because the application into which you are importing the dates (e.g. Excel) often sees them as text fields, and also because of locale mismatches between the source data and the target.
We can see in our example that during the load of our CSV data, Power Query has decided that the ‘Date’ column is text.
To fix this, it’s normally simply a case of changing the Data Type in the drop-down to be Date, but in this case, we get some errors:
The errors are because the dates in the input file are in DD/MM/YYYY format, and Excel is expecting MM/DD/YYYY format due to its US locale settings. To fix this, right-click on the Date column heading and click on Change Types > Using Locale.
In the ‘Change Type with Locale’ dialog, select Date and English (United Kingdom).
Now is a good time to point out the query steps that have been recorded so. You can see the steps in the Query Settings view. There are 3 so far. The first two steps were created automatically by Power Query, the one labeled ‘Source’; is simply the details of the CSV import (the file location, delimiter, and encoding settings); the second is a default step that sets the first row in the file or table to be the column headers.
The third step is the one we just created, ‘Change Type with Locale’. You can see the function behind that step in the formula bar. The function is written in the Power Query M Formula Language. We’ll cover more on this in a future article.
Format the Amount column
The Amount column was imported as text. We need to change it to Currency (or decimal) by simply selecting the column header and changing the selection in the Data Type dropdown.
Remove columns
There are 3 columns in the CSV file that we don’t need in our final table, so we can remove them. Select each column to be removed (Ctrl-click to select multiple columns) and then click on Remove Columns > Remove Columns.
Lookup a Category column
In Excel, we’ll often use VLOOKUP (or the new XLOOKUP function) to retrieve a value from another table that helps add information or context to our data. The classic example is when there is an ID value in a table, and we need to look up the corresponding label or description from another table.
There are a couple of ways to do this in Power Query. The one described here uses the merge transformation function.
First, we need to create a query for the data in our lookup table. The lookup table is in an Excel workbook and looks like this:
In our original workbook, close the Power Query Editor by clicking on Close & Load > Close & Load to… and then select the ‘Connection Only’ option. We are selecting this option because we don’t want to load the data into Excel just yet.
Click on Get Data > From File > From Excel Workbook and select the Excel file that contains the lookup table. In the Navigator dialog, select the table, not the sheet that contains the data. Once again, we’re going to choose the Transform option in the dialog, instead of the Load option so that we can do the merge in Power Query.
A new query will now be created in the Power Query editor for the lookup data. We can now perform the merge step.
Select the first query that we created (this is the source end of the lookup, or left side of our join), and then click on Merge Queries > Merge Queries.
In the Merge dialog we now need to select the matching columns to be used in the lookup – in this case, it is the ‘Selection’ column.
When you click on ‘Ok’, the merge is completed, and the contents of the merged table are added as an extra column with a data type of ‘Table’. When you click on a cell in this column, the values in the merged table (i.e. the lookup table) for the selected row are displayed at the bottom of the preview table.
We now need to select which column from the merged table we want to display in the source table. To do this, click on the column expand button and select the columns to display – in this case, we want to see the Category Type.
Step (3): Load – load the data into Excel
Now that we’ve finished transforming the data, we can either load it into Excel and work with it there directly, or load it into a data model and create some tables and charts using Power Pivot. In this example, we’re just going to load into Excel.
In the Power Query Editor, click on Home > Close & Load > Close & Load. This will load the transformed data into a table in Excel. Note that we could do the same by clicking the Close & Load To option, and then selecting the relevant options in the Import Data dialog.
Here is the data loaded into the Excel table:
Step (4): Re-run the Query
Let’s say that we need to perform this extraction, transform and load every month. We can simply export the latest CSV from the online accounting system and store it in the same file location as before and refresh the query in Excel. This will re-run all the steps that we saved when we built the first transformation.
Summary
In this article, we’ve shown an example of how to use Power Query to connect to a data source, perform some simple clean up and then load the results into Excel. There is much more to Power Query than this, however, and we’ll dive a little deeper in a later article to look at some different data sources like SQL Server and some more advanced transformation steps.
If you just need a simple way to import data from SQL Server into Excel, then check out the SQL Spreads Add-In for Excel. SQL Spreads also allows users to make updates to your SQL tables directly from Excel, making it quicker and easier for you to manage your SQL data.