Data enrichment part VII: match test
This is part 7 of our blog series on data enrichment. If you missed the first few, you can catch up starting with Introducing the Data Enrichment 101 Blog Series.
Once you have your requirements clearly defined and have assembled a shortlist of data providers to evaluate, it’s time to run a database match test. Here are some recommendations and tips on how to run an efficient and effective multi-vendor match test.
The match test should be run on a “representative” sample of your database that is large enough to yield meaningful results. We recommend a sample of at least 10% of your database or 10,000 records.
A “representative” sample is not just a random sample of your entire database. It should be a random sample of the part of the database that you most want to improve, according to the process you want to support, as discussed previously. This usually means the worst part of your database. For example, if you’re looking to enrich leads during the list loading process and you want to spend money enriching only leads that are missing a phone number and job title, then that scope should be the baseline for your “representative” sample: pull a sample of leads that have no phone number and no job title. This matters because it correctly evaluates the vendor’s performance on the part of the database that you want to enrich.
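As a rough illustration of the example above, here is a minimal Python sketch that filters leads down to those missing both phone number and job title, then draws a random sample from that scope. The field names `phone` and `job_title`, the function name, and the list-of-dicts layout are assumptions for illustration; adapt them to however your leads are actually exported.

```python
import random


def representative_sample(leads, sample_size, seed=None):
    """Randomly sample leads that are missing BOTH phone and job title.

    `leads` is assumed to be a list of dicts; the keys "phone" and
    "job_title" are hypothetical field names.
    """
    def is_blank(value):
        # Treat None, empty strings, and whitespace-only strings as missing.
        return value is None or str(value).strip() == ""

    # Restrict to the part of the database we actually want to enrich.
    in_scope = [
        lead for lead in leads
        if is_blank(lead.get("phone")) and is_blank(lead.get("job_title"))
    ]

    # Sample without replacement; never ask for more than is available.
    rng = random.Random(seed)
    k = min(sample_size, len(in_scope))
    return rng.sample(in_scope, k)
```

Passing a fixed `seed` makes the sample reproducible, which helps if you want to send the identical sample to every vendor on your shortlist.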
Specify Your Critical Data Fields
Make sure you tell the vendor which data fields are critical to your evaluation, whether it’s direct dial phone number, email, job title, company size, industry, or DUNS number. Ask the vendor to provide match rate statistics on these fields explicitly.
Ask Vendors About Preparation Work
Each vendor’s matching algorithm works differently, and its performance can vary drastically based on how the input data is formatted and structured. There is nothing wrong with a vendor doing manual preparation work on your data before they attempt the match, but you should ask the vendor what preparation work they performed to achieve the match result. This is important because, to achieve a match rate comparable to the one they quoted you during the test, you will need to do the same preparation work in your production environment. Here are some examples we have seen from the database match tests we have run for our customers:
- Match rate is much better using separate first and last name data fields compared to using a single full name field
- Website data must begin with “www” or else it won’t match
- Email syntax errors and invalid suffixes must be corrected before matching; for example, email@example.com must be corrected to firstname.lastname@example.org
- Phone number data cannot contain any letters; for example, “(800) Flowers”, “650.555.1212 (FAX)”, and “4155551212ext1234” won’t match
- Multiple values in one data field won’t match, such as “(415) 555-1212, (650) 555-1212”
- Using fewer fields for matching can produce better results
- Certain data fields are absolutely required, such as a business email address
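To give a sense of what this kind of preparation work can look like in your own environment, here is a minimal Python sketch covering three of the examples above: splitting a full name, separating multiple values in one field, and cleaning phone numbers. The function names and cleaning rules are assumptions for illustration only; the preparation a given vendor actually needs is something you have to ask them about.

```python
import re


def split_full_name(full_name):
    """Naively split "First Last" into (first, last) on the first space."""
    parts = full_name.strip().split(None, 1)
    first = parts[0] if parts else ""
    last = parts[1] if len(parts) > 1 else ""
    return first, last


def split_multi_values(value):
    """Split a field that holds several comma-separated values."""
    return [v.strip() for v in value.split(",") if v.strip()]


def clean_phone(raw):
    """Return (digits, extension), or (None, None) if letters remain.

    Assumed rules: drop parenthesized labels such as "(FAX)", peel off
    a trailing extension, and flag anything still containing letters.
    """
    # Remove parenthesized text that contains letters; "(800)" is kept.
    text = re.sub(r"\([^)]*[A-Za-z][^)]*\)", "", raw)

    # Separate a trailing extension such as "ext1234" or "x 1234".
    ext = None
    m = re.search(r"(?:ext\.?|x)\s*(\d+)\s*$", text, flags=re.IGNORECASE)
    if m:
        ext = m.group(1)
        text = text[:m.start()]

    # Vanity numbers like "(800) Flowers" need manual review.
    if re.search(r"[A-Za-z]", text):
        return None, None

    digits = re.sub(r"\D", "", text)
    return (digits or None), ext
```

With these rules, “650.555.1212 (FAX)” becomes the digits 6505551212, “4155551212ext1234” is split into a number and the extension 1234, and “(800) Flowers” is flagged rather than guessed at.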
Get Samples of Full Data Set
It’s not realistic or fair to expect the vendor to return the full data set for all your matched records. After all, when you go to Costco and taste the samples, you don’t expect them to give you the whole bag or bottle, do you? Expect to get back a report on:
- How many records were successfully matched
- How many matched records contain the critical data fields you specified
- When the matched records were last updated
- The match confidence score
In addition to the report, it is fair to request a small sample of enriched records so you can see what the data looks like and do your own quality validation. What each vendor will agree to depends heavily on the size of your potential contract.
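Because each vendor may present these numbers differently, it can help to reduce every report to the same few rates before comparing. The Python sketch below does that from raw counts; the function name, argument names, and output layout are assumptions for illustration, not a format any vendor provides.

```python
def summarize_vendor_report(submitted, matched, field_counts):
    """Turn one vendor's reported counts into comparable rates.

    submitted    -- number of records you sent in the test sample
    matched      -- number of records the vendor reports as matched
    field_counts -- dict mapping a critical field name to the number of
                    matched records that contain that field
    """
    match_rate = matched / submitted if submitted else 0.0

    fields = {}
    for field, count in field_counts.items():
        fields[field] = {
            # Share of matched records that carry this field.
            "of_matched": count / matched if matched else 0.0,
            # Share of ALL submitted records that end up with this field.
            "of_submitted": count / submitted if submitted else 0.0,
        }

    return {"match_rate": match_rate, "fields": fields}
```

The `of_submitted` figure is worth looking at alongside the headline match rate: a vendor that matches more records but fills the critical fields less often can still deliver fewer usable records overall.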
Validate the Results
Even if a data provider tells you a match is found, it simply means they have a record. It doesn’t mean the record is any good. If you have spent any time talking to data providers, you have no doubt heard the quality vs. quantity argument. Vendors whose match rate is lower than the others almost always claim the others have lower quality data. The only way to know how good a data provider’s data is is to validate it yourself. Include a list of contacts from your own company, your customers, and your friends: data for which you know what is accurate and what is not. This doesn’t have to be a large set of data; 100 records would do just fine. Use this sample data set to evaluate the quality of the matched records.
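One way to make this validation concrete is to compare what the vendor returned against your known-good contacts, field by field. The Python sketch below assumes both data sets are lists of dicts keyed by a shared `email` field and that a case- and whitespace-insensitive comparison is good enough; those choices, and the function names, are illustrative assumptions.

```python
def normalize(value):
    """Lowercase and collapse whitespace so trivial differences don't count."""
    return " ".join(str(value).lower().split())


def field_accuracy(truth, vendor, field, key="email"):
    """Return (correct, checked) for one field on the known-good set.

    Records the vendor did not match, or matched without a value for
    `field`, are skipped: they affect match rate, not accuracy.
    """
    vendor_by_key = {normalize(r[key]): r for r in vendor}

    checked = 0
    correct = 0
    for known in truth:
        match = vendor_by_key.get(normalize(known[key]))
        if match is None or not match.get(field):
            continue  # nothing returned to judge for this contact
        checked += 1
        if normalize(match[field]) == normalize(known[field]):
            correct += 1
    return correct, checked
```

Reporting the two counts separately, rather than a single percentage, keeps it visible how many of your known contacts the vendor could be judged on at all.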