This is part 6 of our blog series on data enrichment. Feel free to catch up if you missed the first five posts: Introducing the Data Enrichment 101 Series, Determining the Processes the Data Will Support, Determining Your Target Market, Selecting Data Vendors, and GDPR Compliance.
More often than not, the data quality problem is not a complete absence of data, but data that is incomplete, non-uniform, or unstructured. For example:
- Incomplete data
  - Address contains ZIP code but no city and state
  - Address contains state but no country
  - Phone number is missing country code
- Non-uniform data
  - State contains: California, CA, Calif
  - Country contains: United States, US, USA, U.S.A.
  - Puerto Rico is sometimes a country and sometimes a state
- Unstructured data
  - Has job title, but no job function or job level
  - Too many variations on industry data
  - Annual revenue and employee count are sometimes numbers and sometimes ranges, and the ranges differ from one data source to the next
Most of these data quality problems can be resolved without paying a third-party provider. Reserve your precious enrichment budget for acquiring missing data or validating existing data. Here are some methods you can use to improve your data quality without spending money with a data provider. As we will discuss later, there is also great benefit to “pre-cleaning” your data before sending it to an enrichment provider.
1. Leverage Open Data
Many data fields are related, especially address, phone number, email, and website data. The missing data fields can often be filled in easily if you have the right reference data. Here are some examples:
- If state = California, country can be inferred as US
- If country = US and ZIP code = 94403, city and state can be inferred as San Mateo, CA
- If country = Ireland, then the country code can be inferred as +353
- If area code = 510, then the metro area can be inferred as the San Francisco Bay Area
This type of inference and filling in the blanks can be done with a combination of automation technology and open data. Automation technology can be as simple as Excel’s VLOOKUP formula, or something more advanced like the Openprise Data Orchestration solution. For open data, check out the Marketing Open Data Project at https://marketingopendata.org. It is a great source for the various lists you will need, such as city, state, county, country, area code, postal code, stock ticker symbols, domain suffixes, free and disposable email providers, and urban and metro areas.
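To make the lookup idea concrete, here is a minimal Python sketch of the same kind of inference a VLOOKUP would do. The tiny reference tables, the field names (`state`, `country`, `zip`, `city`, `dial_code`), and the `fill_blanks` function are all hypothetical and only for illustration; in practice you would load full reference lists from an open data source rather than hard-code them.

```python
# Illustrative reference tables (assumed, hand-typed samples). Real tables
# would be loaded from open-data files instead of being hard-coded.
STATE_TO_COUNTRY = {"California": "US", "CA": "US"}
ZIP_TO_CITY_STATE = {("US", "94403"): ("San Mateo", "CA")}
COUNTRY_TO_DIAL_CODE = {"Ireland": "+353", "US": "+1"}


def fill_blanks(record: dict) -> dict:
    """Return a copy of the record with inferable fields filled in.

    Existing non-empty values are never overwritten.
    """
    out = dict(record)

    # A known state implies the country.
    if not out.get("country") and out.get("state") in STATE_TO_COUNTRY:
        out["country"] = STATE_TO_COUNTRY[out["state"]]

    # Country plus ZIP code implies city and state.
    key = (out.get("country"), out.get("zip"))
    if key in ZIP_TO_CITY_STATE:
        city, state = ZIP_TO_CITY_STATE[key]
        if not out.get("city"):
            out["city"] = city
        if not out.get("state"):
            out["state"] = state

    # Country implies the phone country code.
    if not out.get("dial_code") and out.get("country") in COUNTRY_TO_DIAL_CODE:
        out["dial_code"] = COUNTRY_TO_DIAL_CODE[out["country"]]

    return out


if __name__ == "__main__":
    print(fill_blanks({"country": "US", "zip": "94403"}))
    print(fill_blanks({"country": "Ireland"}))
```

The rules run in order, so a value inferred by one rule (country from state) can feed the next (city from country and ZIP code).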
2. Leverage Freemium Services
There are plenty of free resources on the web that can be leveraged to improve your data quality. Google Maps is a great example. Most of us have used Google Maps and know how powerful the service is. It can:
- Provide geolocation data given an address
- Return a complete address given a partial address
- Return business name, business type, and phone number
- Standardize any address text to a standard format and break it down into its component parts
Many Google services, such as Google Maps / Places, Google Search, and Google Translate, are available as APIs and come with a free usage tier. So if you have the right automation solution, you can leverage these free services and take advantage of your free daily quota.
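As a rough illustration, the sketch below builds a request URL for the Google Maps Geocoding API and flattens one result into a simple address record. The helper names (`build_geocode_url`, `parse_result`) and the output field names are hypothetical. The response fields used here (`formatted_address`, `address_components`, `geometry.location`) reflect the Geocoding API's JSON format as we understand it, so verify them against the current documentation, along with the quota and pricing terms, before relying on this.

```python
from urllib.parse import urlencode

# Geocoding API JSON endpoint; an API key from your own Google account is required.
GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(address: str, api_key: str) -> str:
    """Build the request URL for a free-text (possibly partial) address."""
    return GEOCODE_ENDPOINT + "?" + urlencode({"address": address, "key": api_key})


def parse_result(result: dict) -> dict:
    """Flatten one entry of the response's "results" list into a flat record."""
    parts = {}
    for comp in result.get("address_components", []):
        # Each component carries one or more type tags, e.g. "locality".
        for kind in comp.get("types", []):
            parts[kind] = comp.get("short_name")

    location = result.get("geometry", {}).get("location", {})
    return {
        "formatted_address": result.get("formatted_address"),
        "city": parts.get("locality"),
        "state": parts.get("administrative_area_level_1"),
        "country": parts.get("country"),
        "postal_code": parts.get("postal_code"),
        "lat": location.get("lat"),
        "lng": location.get("lng"),
    }


if __name__ == "__main__":
    # Fetch this URL with any HTTP client, then pass each item of the
    # response's "results" list to parse_result().
    print(build_geocode_url("San Mateo, CA", "YOUR_API_KEY"))
```

Keeping the URL building and the parsing separate from the HTTP call makes it easier to throttle requests so that you stay within whatever free quota applies to your account.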
3. Leverage Your Own Database
In many organizations, the Account and Contact data (using Salesforce terminology here) is of much higher quality than the Lead data. Account and Contact data has been scrutinized by the sales team and often contains valuable data from manual information gathering and research processes. You can improve your Lead data by matching it to your Account and Contact data using any number of matching schemes. For example, if a Lead record contains only the company name and state, and it can be matched to an Account or Contact record that has the full address, then the Lead’s address data can be filled in.
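One simple matching scheme is sketched below: normalize the company name, pair it with the state, and use that pair as a lookup key into the Account data. The field names (`company`, `state`, `street`, `city`, `zip`, `country`), the suffix list, and the functions `match_key` and `enrich_leads` are assumptions made for illustration; a real matching scheme would likely need fuzzier comparison and a rule for handling multiple candidate matches.

```python
import re

# Common legal suffixes to ignore when comparing company names (assumed list).
_SUFFIXES = {"inc", "llc", "ltd", "corp", "co", "corporation", "incorporated"}


def match_key(company: str, state: str) -> tuple:
    """Build a normalized (company, state) key for exact matching."""
    words = re.sub(r"[^a-z0-9 ]", " ", company.lower()).split()
    core = [w for w in words if w not in _SUFFIXES]
    return (" ".join(core), state.strip().upper())


def enrich_leads(leads, accounts, fields=("street", "city", "zip", "country")):
    """Fill empty Lead address fields from the matching Account, if any."""
    index = {}
    for acct in accounts:
        # If several Accounts share a key, the first one wins.
        index.setdefault(match_key(acct["company"], acct["state"]), acct)

    enriched = []
    for lead in leads:
        out = dict(lead)
        acct = index.get(match_key(lead["company"], lead["state"]))
        if acct:
            for field in fields:
                # Never overwrite a value the Lead already has.
                if not out.get(field) and acct.get(field):
                    out[field] = acct[field]
        enriched.append(out)
    return enriched


if __name__ == "__main__":
    accounts = [{"company": "Acme, Inc.", "state": "CA", "street": "1 Main St",
                 "city": "San Mateo", "zip": "94403", "country": "US"}]
    leads = [{"company": "ACME", "state": "ca"}]
    print(enrich_leads(leads, accounts))
```

Because the key ignores case, punctuation, and legal suffixes, "Acme, Inc." and "ACME" in the same state are treated as the same company.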