You’re excited about getting rid of those pesky duplicate records in your Salesforce.com and marketing automation solutions (well, we get excited about this kind of stuff). You want to jump right in because those duplicate records have annoyed you for too long and you want them GONE! GONE! GONE!!! Before you get too worked up about getting your lead deduplication project started, a little bit of planning can save you a lot of frustration as you progress.

Here is a handy lead deduplication checklist to help with your project planning.

  1. Document your lead deduplication logic
  2. Verify your lead sync status and arrangement
  3. Check for data verification rules and other automation that may interfere
  4. Check for bad data and business processes that may interfere

Document Your Lead Deduplication Logic

Lead deduplication is one of those seemingly simple tasks that is actually fairly complex in execution. There are many nuances you’re probably unaware of unless you have done it many times before. The first thing you need to do is to think through and document your lead deduplication logic in both Salesforce.com and your marketing automation solution. There’s no such thing as “generic lead deduplication logic”. If a vendor tells you it has a proprietary algorithm that can magically dedupe your database, you should run away because your database will probably end up being ruined.

Your lead deduplication logic has to accommodate the following:

  1. The current state of your data
  2. The sources of duplicates
  3. The systems and syncing technologies involved
  4. The controls and automation that are in place
  5. The people and the process it touches (see our last blog on this topic: Lead Deduplication Project Considerations: People & Process)

There are three key parts to the lead deduplication logic you must figure out. We’ll cover each of them in detail in subsequent blog posts. They are:

  1. How to identify the duplicates
  2. How to select the surviving/winning record
  3. How to merge the non-surviving/losing records, accounting for system restrictions
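
The three parts above can be sketched roughly as follows. This is only a minimal illustration in Python, not a recommended algorithm. It assumes records are plain dictionaries with hypothetical `id`, `email`, and `last_modified` fields, that duplicates are identified by normalized email address, and that the most recently modified record wins. Your own rules will almost certainly differ, and the sketch ignores the system restrictions mentioned in step 3.

```python
from collections import defaultdict


def normalize_email(email):
    """Lowercase and trim an email so trivial variations still match."""
    return email.strip().lower()


def find_duplicates(records):
    """Step 1: group records by normalized email; keep groups of two or more."""
    groups = defaultdict(list)
    for rec in records:
        if rec.get("email"):
            groups[normalize_email(rec["email"])].append(rec)
    return [group for group in groups.values() if len(group) > 1]


def pick_survivor(group):
    """Step 2 (assumed rule): the most recently modified record wins."""
    return max(group, key=lambda rec: rec["last_modified"])


def merge_group(group):
    """Step 3: fill blank fields on a copy of the survivor from the losers."""
    survivor = dict(pick_survivor(group))
    losers = sorted(
        (rec for rec in group if rec["id"] != survivor["id"]),
        key=lambda rec: rec["last_modified"],
        reverse=True,  # prefer values from the newer losing records
    )
    for loser in losers:
        for field, value in loser.items():
            if value and not survivor.get(field):
                survivor[field] = value
    return survivor
```

Keeping the three steps as separate functions makes it easier to change one rule, such as how the survivor is chosen, without touching the others.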

Verify Your Data Sync Status Between Salesforce.com & Your Marketing Automation Solution

Whether you’re deduplicating leads, contacts, or accounts, chances are the data set you’re trying to dedupe exists in multiple systems. For B2B marketers looking to deduplicate leads and contacts, the data usually lives in both Salesforce.com and the marketing automation platform, and possibly in others such as help desk and customer success platforms. These systems are probably synchronized to some degree. Here are the scenarios we often see:

  1. Salesforce.com and marketing automation systems are fully synced, so every record exists in both systems.
  2. Salesforce.com has more leads than the marketing automation platform because only marketable leads that are CAN-SPAM compliant are in the marketing automation platform. Leads that the sales team has prospected but that cannot receive mass communication are kept only in Salesforce.com.
  3. The marketing automation platform has more leads than Salesforce.com because only marketing qualified leads (MQLs) are pushed into Salesforce.com, to focus the sales team on hot leads.
  4. A combination of scenarios 2 and 3.

Depending on how many systems are involved and how your syncing is currently implemented, you may be able to get away with deduping just one database and letting the sync update the other systems, or you may need to dedupe each database independently. If you have the option to dedupe only one database, which one should it be? In general, deduping should be done on the system of record for each data set. For example, leads should be deduped in your marketing automation platform, while accounts should be deduped in the sales system.

Just because your databases are supposed to be fully synced doesn’t mean they actually are. Many things can cause syncing algorithms to miss records, and the missed records can quickly add up to a substantial number. Here are typical reasons why your data syncs may not be perfect:

  1. Many syncing and data automation technologies only fire one time when the record is created. Subsequent changes to that record may not get synced.
  2. Many syncing technologies don’t handle deleted and merged records properly, so when one record is deleted in one system, the other systems are unaware that the record no longer exists.
  3. When syncing is interrupted, backed up, or runs into errors for whatever reason, not all syncing technologies can gracefully recover and resume, which can result in records being out of sync.
  4. If you have bi-directional syncing, the syncing logic may not be built or configured to be exactly the same in both directions. Sometimes syncing works better or is supposed to work only in one direction.

It’s worthwhile to audit your databases to verify that the ones that are supposed to be fully synced actually are.
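
One simple way to run such an audit is to export the record IDs from each system and compare the two sets. The sketch below is a minimal Python illustration. It assumes both exports share a common key (for example, the Salesforce.com record ID stored on the marketing automation record), and the function and key names are hypothetical.

```python
def audit_sync(crm_ids, map_ids):
    """Compare the record IDs held by each system and report the differences.

    crm_ids: IDs exported from the CRM (hypothetical export)
    map_ids: the same key exported from the marketing automation platform
    """
    crm_ids, map_ids = set(crm_ids), set(map_ids)
    return {
        "only_in_crm": sorted(crm_ids - map_ids),  # never synced, or deleted on the other side
        "only_in_map": sorted(map_ids - crm_ids),  # possible orphans
        "in_both": len(crm_ids & map_ids),
    }
```

If the systems are meant to be fully synced, both "only in" lists should be empty. In scenarios 2 through 4 above, some differences are expected, so the lists need to be reviewed against your sync rules rather than treated as errors.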

Check For Data Verification Rules and Other Automation that May Interfere

Your systems may have data verification rules and other features turned on that can interfere with removing duplicates. If they do, you may see these symptoms when you try to merge records:

  1. A record you deleted in System A is not deleted in System B and becomes orphaned in System B.
  2. A record you deleted in System A is not deleted in System B and is later recreated in System A by System B.
  3. A record you updated in System A is only partially updated in System B because some field updates were blocked by System B. The records remain out of sync forever, or get back in sync the next time System B initiates the sync.

Here are some examples of interfering automations:

  1. Your Salesforce.com Contact record may have required fields that the Lead record doesn’t have. In order to merge a Lead with a Contact record, the Lead record must be converted to a Contact first. The conversion will fail if required data is missing.
  2. If you have duplicate blocking turned on in SFDC, the conversion in the above scenario can also fail when you attempt to convert a Lead to a Contact.
  3. Salesforce.com is the system of record for Account data. So if you change the company information in Marketo for a Lead that is a Contact in SFDC, the change will not propagate to SFDC, and the Marketo company data will be reverted the next time SFDC initiates a sync.

If you are doing a one-time lead deduplication exercise, you can simply suspend some of these conflicting automations for the time being. If you are setting up a continuous lead deduping process, then you need to rationalize which technology does what so they don’t step on each other.
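
For the required-field example above, a quick pre-flight check can tell you which Leads are likely to fail conversion before you attempt any merge. The following is a minimal Python sketch that assumes leads are plain dictionaries exported from your system. The function name and the field names in the usage note are hypothetical, and the list of required Contact fields has to come from your own Salesforce.com configuration.

```python
def leads_blocked_from_conversion(leads, required_contact_fields):
    """Return {lead id: [missing fields]} for leads lacking a required Contact value."""
    blocked = {}
    for lead in leads:
        missing = [field for field in required_contact_fields if not lead.get(field)]
        if missing:
            blocked[lead["id"]] = missing
    return blocked
```

Calling it with `required_contact_fields=["last_name", "email"]`, for instance, returns only the leads missing one of those two values, which you can then fix or exclude before converting.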

Check For Bad Data & Business Processes That May Interfere

Your system may have bad data that can also prevent deduping from executing successfully. Some of these bad data situations may be quite a head-scratcher on how they came to be, but they are definitely out there. In some cases, they are explicitly allowed by your business processes. Here are some examples:

  1. A record owner may no longer be a valid user in the target system, so any attempt to update or merge such a record will be rejected.
  2. If you allow Contact records in Salesforce.com to have no Account affiliation (a.k.a. Private Contacts), then any attempt to merge a Contact without an Account into Contacts with Accounts will require additional logic for Contact-to-Account matching.
  3. A record may contain an old value in a field that has since been changed to a picklist. Any attempt to update such a record will require additional logic to replace the outdated value with a valid picklist option.
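
The three examples above can be screened for ahead of time. The sketch below is a minimal Python illustration that assumes records are plain dictionaries with hypothetical `id`, `type`, `owner_id`, and `account_id` fields, and that you can supply the set of active user IDs and the allowed values for each picklist field from your own system.

```python
def find_bad_data(records, active_owner_ids, picklist_values):
    """Return (record id, problem) tuples for records likely to block a merge or update.

    active_owner_ids: set of user IDs that are still valid owners
    picklist_values:  {field name: set of allowed values}
    """
    problems = []
    for rec in records:
        # Example 1: the owner is no longer a valid user.
        if rec.get("owner_id") not in active_owner_ids:
            problems.append((rec["id"], "inactive or unknown owner"))
        # Example 2: a Contact with no Account affiliation.
        if rec.get("type") == "contact" and not rec.get("account_id"):
            problems.append((rec["id"], "contact without account"))
        # Example 3: a stored value that is not a valid picklist option.
        for field, allowed in picklist_values.items():
            value = rec.get(field)
            if value and value not in allowed:
                problems.append((rec["id"], f"{field} value not in picklist: {value}"))
    return problems
```

Running a check like this before deduping gives you a list of records to repair or set aside, rather than discovering them one rejected merge at a time.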

Any skilled craftsman will tell you, “Measure twice and cut once”. Lead deduplication is no different. Proper planning upfront can save you a lot of pain, frustration, and damage remediation later on.
