Fuzzy merge is a smart data preparation feature you can use to apply fuzzy matching algorithms when comparing columns, to try to find matches across the tables that are being merged.
You can enable fuzzy matching at the bottom of the Merge dialog box by selecting the Use fuzzy matching to perform the merge check box. More information: Merge operations overview
After you select Use fuzzy matching to perform the merge, select Fuzzy matching options, and then choose from the available options. The most important one is the similarity threshold, which indicates how similar two values need to be in order to match. The minimum value of 0.00 causes all values to match each other. The maximum value of 1.00 allows only exact matches.
Note
Fuzzy matching is only supported on merge operations over text columns. Power Query uses the Jaccard similarity algorithm to measure the similarity between pairs of instances.
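To make the idea of a Jaccard score concrete, here is a minimal sketch in Python. It assumes the two values are compared as sets of two-character fragments (bigrams). That tokenization is an assumption made for illustration; this article does not say how Power Query splits text, so the scores below will not necessarily equal the ones Power Query computes.

```python
def bigrams(text: str) -> set:
    """Return the set of overlapping two-character fragments in text."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity: size of the intersection divided by size of the union."""
    set_a, set_b = bigrams(a), bigrams(b)
    if not set_a and not set_b:
        # Neither value is long enough to produce a bigram; fall back to equality.
        return 1.0 if a == b else 0.0
    return len(set_a & set_b) / len(set_a | set_b)


print(jaccard("apple", "apple"))   # identical values score 1.0
print(jaccard("apple", "banana"))  # no shared fragments, so the score is 0.0
print(jaccard("apple", "apples"))  # four of five fragments are shared
```

A score of 1.00 means the two sets are identical and 0.00 means they share nothing, which is the same scale the similarity threshold uses.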
Sample scenario
A common use case for fuzzy matching is with freeform text fields, such as in a survey. For this article, the sample table was taken directly from an online survey sent to a group with only one question: What is your favorite fruit?
The results of that survey are shown in the following image.
Sample survey output table containing the column distribution graph showing nine distinct answers with all answers unique, and the answers to the survey with all the typos, plural or singular, and case problems.
The nine records reflect the survey submissions. The problem with the survey submissions is that some have typos, some are plural, some are singular, some are uppercase, and some are lowercase.
To help standardize these values, in this example you have a Fruits reference table.
Fruits reference table containing column distribution graph showing four distinct fruits with all fruits unique, and the list of fruits: apple, pineapple, watermelon, and banana.
Note
For simplicity, this Fruits reference table only includes the name of the fruits that will be needed for this scenario. Your reference table can have as many rows as you need.
The goal is to create a table like the following, where you've standardized all these values so you can do more analysis.
Sample survey output table with the Question column containing the column distribution graph showing nine distinct answers with all answers unique, and the answers to the survey with all the typos, plural or singular, and case problems, and also contains the Fruit column containing the column distribution graph showing four distinct answers with one unique answer and lists all of the fruits properly spelled, singular, and proper case.
Fuzzy merge
To do the fuzzy merge, you start by doing a merge. In this case, you'll use a left outer join, where the left table is the one from the survey and the right table is the Fruits reference table. At the bottom of the dialog box, select the Use fuzzy matching to perform the merge check box.
After you select OK, you can see a new column in your table because of this merge operation. If you expand it, you'll notice that there's one row that doesn't have any values in it. That's exactly what the dialog box message in the previous image stated when it said 'The selection matches 8 of 9 rows from the first table.'
Fruit column added to the Survey table, with all rows in the Question column expanded, except for row 9, which could not expand and the Fruit column contains null.
Fuzzy matching options
You can modify the Fuzzy matching options to tweak how the approximate match should be done. First, select the Merge queries command, and then in the Merge dialog box, expand Fuzzy matching options.
The available options are:
- Similarity threshold (optional): A value between 0.00 and 1.00 that lets you match records whose similarity score is at or above the given value. A threshold of 1.00 is the same as specifying an exact match criterion. For example, Grapes matches with Graes (missing the letter p) only if the threshold is set to less than 0.90. By default, this value is set to 0.80.
- Ignore case: Allows matching records no matter what the case of the text.
- Match by combining text parts: Allows combining text parts to find matches. For example, Micro soft is matched with Microsoft if this option is enabled.
- Number of matches (optional): Specifies the maximum number of matching rows that can be returned for every input row.
- Transformation table (optional): Allows matching records based on custom value mappings. For example, Grapes is matched with Raisins if a transformation table is provided where the From column contains Grapes and the To column contains Raisins.
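The following Python sketch shows how three of these options (similarity threshold, ignore case, and number of matches) could interact in a left outer join. It is an illustration only, not Power Query's implementation: the `fuzzy_left_join` and `similarity` functions are hypothetical names, the score is a Jaccard ratio over two-character fragments (an assumed tokenization), and the Match by combining text parts option isn't modeled.

```python
def similarity(a: str, b: str) -> float:
    """Jaccard similarity over character bigrams (an assumed tokenization)."""
    grams_a = {a[i:i + 2] for i in range(len(a) - 1)}
    grams_b = {b[i:i + 2] for i in range(len(b) - 1)}
    if not grams_a or not grams_b:
        return 1.0 if a == b else 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def fuzzy_left_join(left, right, threshold=0.80, ignore_case=True, max_matches=None):
    """Return (left_value, right_value) pairs; unmatched left rows get None."""
    rows = []
    for value in left:
        key = value.lower() if ignore_case else value
        scored = []
        for candidate in right:
            cand_key = candidate.lower() if ignore_case else candidate
            score = similarity(key, cand_key)
            if score >= threshold:
                scored.append((score, candidate))
        # Best matches first; max_matches=None keeps every match.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        matches = [candidate for _, candidate in scored[:max_matches]]
        if matches:
            rows.extend((value, match) for match in matches)
        else:
            rows.append((value, None))  # left outer join keeps the row
    return rows


fruits = ["Apple", "Pineapple", "Watermelon", "Banana"]
for row in fuzzy_left_join(["Apples", "WATERMELON", "apls"], fruits):
    print(row)
```

With the defaults, `apls` stays unmatched in this sketch, which mirrors the row that returned null in the merge above. Lowering `threshold` lets more candidates through, and `max_matches=1` then keeps only the highest-scoring one for each input row.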
Transformation table
For the example in this article, you can use a transformation table to map the value that has a missing pair. That value is apls, which needs to be mapped to Apple. Your transformation table has two columns:
- From contains the values to find.
- To contains the values that will be used to replace the values found by using the From column.
For this article, the transformation table will look as follows:
From | To |
---|---|
apls | Apple |
You can go back to the Merge dialog box, and in Fuzzy matching options under Number of matches (optional), enter 1. Under Transformation table (optional), select Transform Table from the drop-down menu.
After you select OK, you'll create a table that looks like the following image, with all values mapped correctly. Note how the example started with nine distinct values, but after the fuzzy merge, there are only four distinct values.
Fuzzy merge survey output table with the Question column containing the column distribution graph showing nine distinct answers with all answers unique, and the answers to the survey with all the typos, plural or singular, and case problems. Also contains the Fruit column with the column distribution graph showing four distinct answers with one unique answer and lists all of the fruits properly spelled, singular, and proper case.
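The effect of the transformation table can be sketched in Python as a replacement step that runs before the comparison. This is a simplified model based on the From and To description above, not Power Query's internal logic. The `best_match` and `bigram_jaccard` functions are hypothetical, and the bigram-based score is an assumption.

```python
def bigram_jaccard(a: str, b: str) -> float:
    """Jaccard similarity over character bigrams (an assumed tokenization)."""
    grams_a = {a[i:i + 2] for i in range(len(a) - 1)}
    grams_b = {b[i:i + 2] for i in range(len(b) - 1)}
    if not grams_a or not grams_b:
        return 1.0 if a == b else 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def best_match(value, reference, transformations, threshold=0.80):
    """Return the single best reference value, or None if nothing reaches the threshold."""
    # If the value appears in the From column, compare its To value instead.
    # Keys in `transformations` are expected to be lowercase.
    mapped = transformations.get(value.lower(), value)
    best, best_score = None, 0.0
    for candidate in reference:
        score = bigram_jaccard(mapped.lower(), candidate.lower())
        if score >= threshold and score > best_score:
            best, best_score = candidate, score
    return best


fruits = ["Apple", "Pineapple", "Watermelon", "Banana"]
transform_table = {"apls": "Apple"}  # From -> To, as in the table above

print(best_match("apls", fruits, {}))               # None
print(best_match("apls", fruits, transform_table))  # Apple
```

Returning only the best candidate corresponds to entering 1 under Number of matches.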
There’s a research and development team at Microsoft known as Microsoft Research Labs. Almost 10 years ago, it invented a free Fuzzy Lookup add-in for Excel. The fuzzy matching algorithm looks for words that share a percentage of characters in common. That functionality is now built into Windows versions of Microsoft 365.
Figure 1 shows two data sets that need to be matched. Columns A and B contain the list of employees. Columns D and E contain the names of the employees who filled out a required form. You need to identify the people who haven’t yet returned the form. Unfortunately, a VLOOKUP or XLOOKUP won’t work since column A uses a “last name, first name” format while column D contains a nickname and last name. These two data sets can be matched using the fuzzy matching option.
Columns G and H show a translation table that will be used by the fuzzy match to help match full first names with their nicknames. The translation table requires two columns, labeled “From” and “To.” The fuzzy match will likely match Kris and Kristy because they share many letters, but it will need an entry in the translation table for Bill and William or Nathan and Nate.
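To see why the translation table matters, consider this small Python sketch. It compares names as sets of whole words, so word order and the comma don't matter, and it replaces nicknames before comparing. The names and the direction of the mapping (From holds the nickname, To holds the full first name) are assumptions made for illustration, and Excel's own scoring is more forgiving than this whole-word comparison: a pair such as Kris and Kristy would score as different words here.

```python
import re


def name_tokens(name: str, nicknames: dict) -> set:
    """Lowercase the name, split it into words, and replace known nicknames."""
    words = re.findall(r"[a-z]+", name.lower())
    return {nicknames.get(word, word) for word in words}


def name_similarity(a: str, b: str, nicknames: dict) -> float:
    """Jaccard similarity over the two sets of (translated) words."""
    tokens_a, tokens_b = name_tokens(a, nicknames), name_tokens(b, nicknames)
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Hypothetical translation table: From (nickname) -> To (full first name).
nicknames = {"bill": "william", "nate": "nathan"}

# "Jones, William" and "Bill Jones" are made-up sample names.
print(name_similarity("Jones, William", "Bill Jones", nicknames))  # 1.0
print(name_similarity("Jones, William", "Bill Jones", {}))         # only "jones" is shared
```

Without the translation entry, only the last name overlaps, so the pair would fall well below a typical threshold.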
Before you can perform the match, each of the three ranges of data has to be converted to a table by selecting the range and pressing Ctrl+T. Then rename each table: Select one cell in the table, go to the Table Tools tab in the Ribbon, and type a meaningful name such as “Census,” “Forms,” or “Nicknames.”
DEFINING CONNECTIONS TO EACH TABLE
You need to convert each of the three tables to a connection in Excel. From cell A1, select Data, From Table/Range (as shown at the red arrow in Figure 1). Excel will open the Power Query Editor. The first icon on the Home tab says “Close & Load.” Click the drop-down menu below it and choose “Close & Load To…” to open the Import Data dialog box. Choose the fourth item, called Only Create Connection.
Repeat the process of creating a connection for the other two tables, starting in cell D1 and cell G1, respectively. If you created all three connections correctly, you should see three queries listed as “Connection Only” in the Queries & Connections pane on the right side of the Excel window.
PERFORMING THE FUZZY MATCH
Select a blank cell in your worksheet. From the Data tab, select Get Data, Combine Queries, Merge to open the Merge dialog. There are many subtle settings in this dialog that aren’t intuitive. Figure 2 shows the eight steps:
- From the top drop-down menu, select the Census table.
- In the small data preview, click on the heading(s) of the fields to be used for the matching. In this case, it’s the Employee Name heading.
- From the second drop-down menu, choose the name of the lookup table. In this case, Forms.
- In the data preview, click on the heading(s) of the fields to be used for matching, such as Name.
- Check the box for “Use fuzzy matching to perform the merge.”
- Several special settings are hidden behind the drop-down menu for the fuzzy matching options. Click the triangle to reveal this section.
- Scroll to the bottom of the section and set Nicknames as the Transformation Table.
- Verify that the number of matches found is the same as the number of records in the Forms table.
Click OK to perform the merge. The grid in the Power Query Editor will show columns for Employee Name, Department, and then a column called Forms. The value in every row of the Forms column simply says “Table.” To the right of the “Forms” heading is an Expand icon with two arrows pointing left and right. Click this icon to choose which fields from the Forms table to return.
Once you have the preview shown in the Power Query Editor, go to Home, Close & Load to deliver the results to a new table on a new worksheet. You could optionally use Close & Load To… and specify a location on an existing worksheet for the table.
At this point, review the results to make sure no false matches were found. If everything looks good, you can sort or filter to remove the records that show a match, leaving the people who haven’t turned in the form.
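The filtering step amounts to keeping the rows where the merge returned nothing from the Forms table. A minimal Python sketch of that logic follows, using made-up sample rows and a hypothetical `employees_without_form` helper. In Excel you would do the same thing with the filter drop-down on the expanded column.

```python
def employees_without_form(merged_rows):
    """Keep employees whose merged Forms value is empty (None)."""
    return [employee for employee, form_name in merged_rows if form_name is None]


# Hypothetical merge output: (Employee Name, matched Name from Forms or None).
merged = [
    ("Jones, William", "Bill Jones"),
    ("Smith, Nathan", "Nate Smith"),
    ("Lee, Kristy", None),
]

print(employees_without_form(merged))  # ['Lee, Kristy']
```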
Ever since my Accounting 101 class, I was taught that “close” is never acceptable in accounting. This leads to a reluctance to trust the fuzzy matching algorithm. Yet there are cases where the fuzzy match tool is the only solution short of manually matching records.
SF Says:
As more people turn in their forms, choosing Data, Refresh All will automatically perform the fuzzy match again.