Automated Data Onboarding: Data Preparation Tasks

Welcome back to our blog series on automated data onboarding and list loading. If you missed the introduction, you can find it here. If you missed the second blog on setting up the input, you can find it here.

Step 2: Set Up Automated Data Preparation Tasks

Automated tasks for list loading, or for any data-driven process, generally fall into three categories: data preparation, decision-making, and action execution.

A. Data Preparation Tasks

A.1    Clean

Any process you automate should start with data cleansing. Typical cleansing tasks include:

  • Fix bad Email syntax, e.g., [email protected] → jdoe@acme.com
  • Fix invalid Email suffix, e.g., jdoe@acme.con → jdoe@acme.com
  • Fix name capitalization, e.g., John DOE → John Doe
  • Clean and truncate Company Name, e.g., Acme, Inc. (NYSE ACM) → Acme
  • Fix and truncate Website URL, e.g., https://www.acme.com/478khghf7q65tr → www.acme.com
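
To make these tasks concrete, here is a minimal Python sketch of four of them. The function names, the typo table, and the list of legal suffixes are our own illustrative assumptions, not the rules of any particular product, and real data will need more cases than these.

```python
import re
from urllib.parse import urlparse

# Hypothetical table of common top-level-domain typos; extend to suit your data.
TLD_FIXES = {"con": "com", "cmo": "com", "ogr": "org", "nte": "net"}


def fix_email_suffix(email: str) -> str:
    """Lowercase the address and repair a mistyped top-level domain."""
    email = email.strip().lower()
    local, sep, domain = email.rpartition("@")
    if not sep or "." not in domain:
        return email  # not repairable here; leave it for a later validity check
    host, _, tld = domain.rpartition(".")
    return f"{local}@{host}.{TLD_FIXES.get(tld, tld)}"


def fix_name_case(name: str) -> str:
    """Capitalize each word. Simplistic: names like 'McDonald' need extra rules."""
    return " ".join(part.capitalize() for part in name.split())


def clean_company(name: str) -> str:
    """Drop parenthetical notes, then a trailing legal suffix such as 'Inc.'."""
    name = re.sub(r"\s*\(.*?\)", "", name).strip()
    name = re.sub(r",?\s+(inc|llc|ltd|corp|co)\.?$", "", name, flags=re.IGNORECASE)
    return name.strip()


def clean_website(url: str) -> str:
    """Keep only the host part of a URL."""
    url = url.strip()
    if "://" not in url:
        url = "//" + url  # lets urlparse treat a bare host as the network location
    return urlparse(url).netloc.lower()


print(fix_email_suffix("jdoe@acme.con"))                     # jdoe@acme.com
print(fix_name_case("John DOE"))                             # John Doe
print(clean_company("Acme, Inc. (NYSE ACM)"))                # Acme
print(clean_website("https://www.acme.com/478khghf7q65tr"))  # www.acme.com
```

Each function handles one field, so you can apply them in any order and add rules to one without touching the others.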

A.2    Check for Missing Data

Not all missing data can be enriched or filled in automatically, so when an absolutely required field has no value, you should reject the record. For example, Contact Name, Email, and Campaign ID are often required fields.
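
As a rough sketch of this check, the Python snippet below splits a batch into accepted and rejected records. The field names (contact_name, email, campaign_id) and the function names are hypothetical; substitute whatever your own list uses.

```python
# Hypothetical field names standing in for Contact Name, Email, and Campaign ID.
REQUIRED_FIELDS = ("contact_name", "email", "campaign_id")


def missing_fields(record: dict) -> list:
    """Return the required fields that are absent or blank in a record."""
    gaps = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            gaps.append(field)
    return gaps


def split_records(records):
    """Separate records into (accepted, rejected); rejects keep their reasons."""
    accepted, rejected = [], []
    for record in records:
        gaps = missing_fields(record)
        if gaps:
            rejected.append((record, gaps))
        else:
            accepted.append(record)
    return accepted, rejected
```

Keeping the list of missing fields alongside each rejected record makes it easier to send the rejects back to whoever supplied the list.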

A.3    Standardize / Normalize

Normalize data to your data standards so that you don’t end up with 1,000 countries and 250 states in the United States. Typical normalization tasks include:

  • Normalize Country, e.g., US, U.S.A, USA → United States
  • Normalize State/Province, e.g., Calif, CA, Cal → California
  • Normalize Phone Number, e.g., 408.555.1234 → +1 (408) 555-1234
  • Standardize Lead Source
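
Here is a minimal Python sketch of two of these tasks, assuming a small alias table for Country and US-only phone numbers. The table contents and function names are illustrative assumptions; a production setup would cover far more variants.

```python
import re

# Hypothetical alias table, keyed by the value with punctuation and case removed.
COUNTRY_ALIASES = {
    "us": "United States",
    "usa": "United States",
    "unitedstates": "United States",
}


def normalize_country(value: str) -> str:
    """Map known aliases to the standard name; pass unknown values through."""
    key = re.sub(r"[^a-z]", "", value.lower())  # "U.S.A" becomes "usa"
    return COUNTRY_ALIASES.get(key, value.strip())


def normalize_us_phone(value: str) -> str:
    """Format a 10-digit US number as +1 (NNN) NNN-NNNN; leave others unchanged."""
    digits = re.sub(r"\D", "", value)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop a leading country code
    if len(digits) != 10:
        return value.strip()  # unrecognized format; leave for manual review
    return f"+1 ({digits[:3]}) {digits[3:6]}-{digits[6:]}"


print(normalize_country("U.S.A"))          # United States
print(normalize_us_phone("408.555.1234"))  # +1 (408) 555-1234
```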

To be efficient at normalization and segmentation tasks, the technology you use needs fuzzy matching or an equivalent capability, so that you can catch misspellings of Massachusetts such as Massachusettes, Masachusetts, Masachusetes, and Masssachusett.
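
To illustrate the idea, the sketch below uses get_close_matches from Python's standard difflib module as one simple form of fuzzy matching. The state list is a truncated sample and the 0.8 similarity cutoff is an assumption you would tune against your own data; dedicated tools may use different algorithms.

```python
from difflib import get_close_matches

# Truncated sample of standard values; a real list would hold every state.
US_STATES = ["California", "Massachusetts", "Michigan", "Mississippi", "Missouri"]


def fuzzy_state(value: str, cutoff: float = 0.8):
    """Return the closest standard state name, or None if nothing is close enough."""
    lookup = {state.lower(): state for state in US_STATES}
    hits = get_close_matches(value.strip().lower(), list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None


for spelling in ["Massachusettes", "Masachusetts", "Masachusetes", "Masssachusett"]:
    print(spelling, "->", fuzzy_state(spelling))
```

Note that this is a single rule driven by one reference list, rather than one rule per misspelling.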

The technology you use should be easy to set up and maintain. For example, in order to normalize the Country field, how many rules do you need to set up? If the answer is 200+, then you have the wrong technology. The answer should be one rule. This is why all the salesforce and marketing automation platforms do such a poor job at data cleansing tasks. Although theoretically you can perform all these tasks using those platforms, they were never designed to do these tasks well.

A.4    Fill in the Blanks & Enrich with Open Data

Data enrichment can be done in two steps. The first step is to leverage public/open data. The second step is to leverage third-party data providers. Open data are readily available from government agencies and other public Internet sources. The types of enrichment you can do with open data include:

  • Fill in missing address parts, e.g., ZIP = 94403 → City = San Mateo, State = California
  • Detect and correct inconsistent address, e.g., California, Canada
  • Add Region data, e.g., metro area, urban area, continent
  • Add Timezone data so your sales team knows the best time to call
  • Add Industry data, e.g., NPPES NPI Registry for US healthcare providers
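
The first two items can be sketched in Python as follows. The two lookup tables are tiny hypothetical samples that stand in for an open dataset you would load yourself, and the record field names (zip, city, state, country) are assumptions.

```python
# Hypothetical samples; in practice, load these tables from an open dataset.
ZIP_TABLE = {"94403": ("San Mateo", "California", "United States")}
STATE_COUNTRY = {"California": "United States", "Ontario": "Canada"}


def enrich_address(record: dict) -> dict:
    """Fill blank city/state/country from the ZIP and flag state/country conflicts."""
    out = dict(record)  # work on a copy so the input record is left untouched
    hit = ZIP_TABLE.get(str(out.get("zip", "")).strip())
    if hit:
        for field, value in zip(("city", "state", "country"), hit):
            if not out.get(field):  # only fill blanks, never overwrite
                out[field] = value
    expected = STATE_COUNTRY.get(out.get("state"))
    country = out.get("country")
    out["address_conflict"] = bool(expected and country and expected != country)
    return out


print(enrich_address({"zip": "94403"}))
print(enrich_address({"state": "California", "country": "Canada"}))
```

The sketch only flags a conflict such as California, Canada; deciding which of the two values to trust is a separate rule.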

With the pending enforcement of the General Data Protection Regulation (GDPR), this is a critical step. Open data can help you enrich records enough to identify whether a person is considered an EU data subject. For more on enrichment challenges associated with GDPR, see Openprise’s Complete Enrichment Survival Guide for Marketing and Sales.

There is more to cover in data preparation tasks. We will continue next time.
