Excel: Clean Data with Get & TransformBy
One of the more powerful but overlooked data-cleansing tools in Excel can be found in the Get & Transform Data group on the Data tab in Excel. Each of these icons leads to the Power Query Editor, a data cleansing tool used in Power BI, Power Automate, and Excel.
Any cleansing steps that you perform with the Power Query Editor are saved as a procedure that can be run again the next time that you have similar data. For example, you might download a report each month that requires a half dozen cleaning steps. If you do those steps with this month’s file and save the list of steps in a workbook, you can easily run those same steps next month.
The Power Query tools weren’t developed by the Excel team. They were developed by a SQL Server team at Microsoft. When you first use the Power Query tools, it might feel like you’re in an alternate universe. Instead of “Text to Columns,” the equivalent command in the Power Query Editor is called “Split Column.” But if you open the Split Column drop-down in the Power Query Editor, you’ll see many extra options that are better than those in Excel.
This month’s article highlights some of the extra features that are in the Power Query Editor but not easy to do in Excel.
“LOADING” YOUR DATA
All transformations that happen in Power Query are nondestructive. The original data in the Excel worksheet remains unchanged. The “changes” that you make using Power Query are saved as a set of instructions. When you finish cleaning data in Power Query, the instructions are run in memory, and the output is a new worksheet with the cleaned data.
To get started, format your data as a table either using Ctrl+T or going to Home, Format as Table. Alternatively, you can use a named range for your data. Power Query only works with either a table or a named range.
On the Data tab, look for the Get & Transform Data group. In older versions of Excel, choose From Table/Range. In the latest version of Microsoft 365, this command was recently renamed as From Sheet (see Figure 1). Select one cell in your table (or the entire range if you’re using a named range) and select From Sheet on the Data tab.
The first 1,000 rows of your data are shown in the Power Query Editor. As shown in Figure 2, the Power Query Editor has a variety of tools on a ribbon, with tabs for Home, Transform, Add Column, and View.
POWER QUERY TOOLS AND FEATURES
Here are a few of the tools and features in Power Query that are better than their counterparts in Excel.
Fill Down: If you have a column that’s in outline form—for example, “East Region” is in cell A2 followed by blank cells in that column until the start of “Central Region” in row 200, followed by more blanks until “West Region” in row 500—you can quickly fill in the blanks using Fill Down.
Remove Rows: If the incoming data is double-spaced, the Remove Rows menu has an option to Remove Alternate Rows. The command is flexible, allowing you to specify something such as keep row 2, delete the next three rows, then keep one row, and follow that pattern through the data.
Split Column: This command allows you to specify that the split should happen at the change from text to numeric characters or vice versa. If you have part numbers like A123, BC234, or DEF111, this built-in split command will handle it easily.
Extract menu: This menu lets you choose all text after a delimiter, before a delimiter, or between two delimiters.
Information menu: Excel has functions to detect ISEVEN, ISODD, and SIGN, but you’ll find easier-to-use tools in the Information menu to perform the same actions.
Index column: You can quickly add an index column starting from either 1 or 0.
Some other useful commands not shown in Figure 2 are the unpivot command as well as the abilities to split a cell at each delimiter and to split the data to new rows.
Just like in Excel, many of the popular commands can be found by right-clicking on a column heading. If you’re used to Excel, using the Power Query Editor should be easy to figure out.
As you clean your data, a list of the applied steps you’re taking is created on the right side of the screen. Keep an eye on this list. If you make a mistake, use the “X” next to any step to delete it and try again. This panel provides an audit trail and the ability to look back at the interim results after any step.
When you’ve finished cleaning your data in Power Query, use the Close & Load command on the Home tab. Excel will insert a new worksheet with the cleaned version of your data.
The real benefit is the next time you update the original data, you can use the Refresh All command on the Data tab to clean the data again.