Use Fuzzy Search to Improve Data Quality
Your marketing data is dirty and will always be dirty. Let’s just get that fact out of the way. Solutions like Openprise can help you scrub and normalize a great percentage of your data, but it will never be pristine, not for more than a fleeting moment, if your data is dynamic and generated by more than a few people. Your data will always be dirty. It is just a question of how dirty. That doesn’t mean you shouldn’t bother to clean up and normalize it. Data quality is critical if you want to have dashboards and analysis that make sense, automations that function, and business rules that operate as intended.
Data quality is definitely one area where the 80/20 rule applies. Cleaning up and normalizing data such as:
- Country and state names
- Phone number format
- Email and URL
- Capitalization of names
is relatively straightforward and can be done with a high rate of success. Cleaning up and normalizing less standardized data like:
- Company name
- Job title
- Seniority level
- Part number
- Failure and repair code
is more difficult due to the large number of variations and the lack of an agreed-upon list of values to normalize to. For example, cleaning up and normalizing customer company names is relatively simple, but doing it for the much larger number of prospect company names is significantly more difficult.
This is where a little fuzzy logic can help a great deal.
Whether it is search, reporting, or business rules, it all starts with searching for the data to operate on. This search has to deal with data variations introduced by:
- Common variations, e.g. “Account Executive” vs. “Sales Manager”
- Abbreviations, e.g. “Vice President” vs. “VP”
- Regional differences, e.g. “Analyze” vs. “Analyse”
- Spelling errors, e.g. “Massachussettes” or “Mississipi”
- Partials, e.g. “Disney” vs. “The Walt Disney Company”
- Changes over time, e.g. “Apple Computer” vs. “Apple”
Using fuzzy search instead of discrete search can solve a number of these problems without having to first normalize or cleanse your data. It is also a great way to deal with that last 20% of hard-to-clean data. Here is an example using an actual Marketo leads database.
If we do a pie chart analysis in Openprise searching on company names containing the word “toyota”, we get 87 results with the top 10 variations shown in the chart.
If we do a search on “Toyota Motor Sales USA” using exact match, we get 7 results, which captures less than 10% of the total records containing the word “toyota”.
Now if we perform the same search using the fuzzy match operator, we get 78 records, nearly 90% of the records containing the word “toyota”. You can see in the search results table the variations of company names that were found. The only records excluded are variations on the name “Toyota” that are too short compared to the search term “Toyota Motor Sales USA”.
Let’s try the search term “Toyoda motor sale usa”. This search term contains a typo in the word Toyota and uses the singular form of the word sale. The fuzzy search still returned the same 78 records.
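To get a feel for why a fuzzy match tolerates typos but still skips very short names, here is a minimal sketch in Python. It is not how Openprise implements its fuzzy match operator; it simply scores similarity with the standard library's difflib. The company names, the helper functions, and the 0.7 threshold are all made up for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and collapse repeated whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", "", name.lower())
    return " ".join(cleaned.split())


def similarity(query: str, candidate: str) -> float:
    """Return a 0..1 similarity score between two normalized strings."""
    return SequenceMatcher(None, normalize(query), normalize(candidate)).ratio()


def fuzzy_filter(query: str, names: list[str], threshold: float = 0.7) -> list[str]:
    """Keep the names whose score meets the (assumed) threshold."""
    return [name for name in names if similarity(query, name) >= threshold]


# Hypothetical sample data, not taken from the Marketo database above.
company_names = [
    "Toyota Motor Sales USA",
    "Toyota Motor Sales, U.S.A., Inc.",
    "Toyota Motor Sales",
    "Toyota",
    "Honda Motor Co",
]

for query in ("Toyota Motor Sales USA", "Toyoda motor sale usa"):
    print(query, "->", fuzzy_filter(query, company_names))
```

In this sketch, both the clean and the misspelled search term keep the three “Toyota Motor Sales” variations, while the bare “Toyota” falls below the threshold because it is much shorter than the search term. Raising or lowering the threshold trades precision against recall.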
Fuzzy search can be a powerful tool to ensure your reports, analysis, and business rules still work when the data is anything short of perfect. It can greatly simplify report and rule configuration, because you no longer need to include an exhaustive list of every variation and misspelling of your search terms. If you spend the effort to clean and normalize 80% of your data, fuzzy search can make that remaining 20% tolerable and save you a ton of money.
Give fuzzy search a try in Openprise and let us know what you think.