Excel: Fuzzy MatchingBy
There’s a research and development team at Microsoft known as Microsoft Research Labs. Almost 10 years ago, it invented a free Fuzzy Lookup add-in for Excel. The fuzzy matching algorithm looks for words that share a percentage of characters in common. That functionality is now built into Windows versions of Microsoft 365.
Figure 1 shows two data sets that need to be matched. Columns A and B contain the list of employees. Columns D and E contain the names of the employees who filled out a required form. You need to identify the people who haven’t yet returned the form. Unfortunately, a VLOOKUP or XLOOKUP won’t work since column A uses a “last name, first name” format while column D contains a nickname and last name. These two data sets can be matched using the fuzzy matching option.
Columns G and H show a translation table that will be used by the fuzzy match to help match full first names with their nicknames. The translation table requires two columns, labeled “From” and “To.” The fuzzy match will likely match Kris and Kristy because they share many letters, but it will need an entry in the translation table for Bill and William or Nathan and Nate.
Before you can perform the match, all three ranges of data have to be converted to a table by selecting each individual range and pressing Ctrl+T. Then rename each table: Select one cell in a table. Go to the Table Tools tab in the Ribbon and type a meaningful name such as “Census,” “Forms,” and “Nicknames.”
DEFINING CONNECTIONS TO EACH TABLE
You need to convert each of the three tables to a connection in Excel. From cell A1, select, Data, From Table/Range (as shown at the red arrow in Figure 1). Excel will open the Power Query Editor. The first icon on the Home tab says “Close & Load.” Click the drop-down menu below it and choose “Close & Load To…” to open the Import Data dialog box. Choose the fourth item, called Only Create Connection.
Repeat the process of creating a connection for the other two tables, starting in cell D1 and cell G1, respectively. If you created all three connections correctly, you should see three queries listed as “Connection Only” in the Queries & Connections pane on the right side of the Excel window.
PERFORMING THE FUZZY MATCH
Select a blank cell in your worksheet. From the Data tab, select Get Data, Combine Queries, Merge to open the Merge dialog. There are many subtle settings in this dialog that aren’t intuitive. Figure 2 shows the eight steps:
- From the top drop-down menu, select the Census table.
- In the small data preview, click on the heading(s) of the fields to be used for the matching. In this case, it’s the Employee Name heading.
- From the second drop-down menu, choose the name of the lookup table. In this case, Forms.
- In the data preview, click on the heading(s) of the fields to be used for matching, such as Name.
- Check the box for “Use fuzzy matching to perform the merge.”
- Several special settings are hidden behind the drop-down menu for the fuzzy matching options. Click the triangle to reveal this section.
- Scroll to the bottom of the section and set Nicknames as the Transformation Table.
- Verify that the number of matches found is the same as the number of records in the Forms table.
Click OK to perform the merge. The grid in the Power Query Editor will show columns for Employee Name, Department, and then a column called Forms. The value in each row for Forms simply says “Table” in each row. To the right of the “Forms” heading is an Expand icon with two arrows pointing left and right. Click this icon to choose which fields from the Forms table to return.
Once you have the preview shown in the Power Query Editor, go to Home, Close & Load to deliver the results to a new table on a new worksheet. You could optionally use Close & Load To… and specify a location on an existing worksheet for the table.
At this point, review the results to make sure no false matches were found. If everything looks good, you can sort or filter to remove the records that show a match, leaving the people who haven’t turned in the form.
Ever since my Accounting 101 class, I was taught that “close” is never acceptable in accounting. This leads to a reluctance to trust the fuzzy matching algorithm. Yet there are cases where the fuzzy match tool is the only solution short of manually matching records.
As more people turn in their forms, choosing Data, Refresh All will automatically perform the fuzzy match again.