TransWikia.com

How to match variations of text strings in a table to standardized versions of those text strings in another table with formulas in Excel?

Super User Asked by brandonb on February 7, 2021

I have a table with the names of classes; however, these names have not been typed consistently. This has resulted in multiple variations of the same class name (See Variations table below). For example, "Learning to Sew", "Learning Sewing", and "Learn to Sew." I also have a table of the standard names for these classes (See Standard Names below). I want to match the variations with the standard name, so the standard name appears in a column next to the variation name (See Result table). To further complicate things, one class in particular had its name changed, so any search will need to look for both versions and return the current version (See Experiencing Biology below). I have been able to use COUNTIF to count the occurrences of each standard name across the variations using search terms that have the key words of the names combined with wildcards and multiple criteria when needed. I have not been able to figure out how to perform the match however. Is there a formula in Excel that can accomplish this?

Note:

  • Spelling errors are not of concern.
  • There are some classes which do not have a standard name and can be ignored.

Variations

Class
Learning to Sew
Learning Sewing
Learn to Sew
Experiencing Biology
Cycles of Biology   

Standard Names

Class                   Search Term 1             Search Term 2        Count
Learning to Sew         *Learn*Sew*                                    3
Experiencing Biology    *Experiencing*Biology*    *Cycles*Biology*     2

Result

Class                  Standard Title
Learning to Sew        Learning to Sew
Learning Sewing        Learning to Sew
Learn to Sew           Learning to Sew
Experiencing Biology   Experiencing Biology 
Cycles of Biology      Experiencing Biology

One Answer

I created two named tables (Ctrl+T then name them on the Table Design tab):

  1. variations (containing your list of variations)
  2. standard (containing the two standard names)

enter image description here

On each table, I used Data>Get & Transform Data>From Table/Range to create a query. In the power query editor, I used Home>Close & Load>Close & Load To>Only Create Connection such that I had two query connections, one to each table:

enter image description here

I then used Data>Get & Tranform Data>Get Data>Combine Queries>Merge and configured the merge dialog like this:

enter image description here

Depending on the rest of your data, you may need to experiment with an optimal value for the similarity threshold. You can also explore using a transformation table to make the job easier if you find that you're getting false positives.

After clicking OK on the merge, then using the double-arrow at the top of the column with the word Table in each row and expanding the joined data, I got this:

enter image description here

To put this resulting dataset back into the workbook, I would just use Home>Close & Load.

This is not foolproof, but with the right parameters in the fuzzy matching, you may find something that works for your entire dataset.

Answered by FlexYourData on February 7, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP