Dedupe project considerations: people and process

Post by: Ed King in Cleanse and unify data

As with most topics in Marketing, you start with people and process before you get to data and technology. Eliminating duplicate data records is no different. Here are some of the key people and process questions you should answer before you set out to solve your dedupe problem.

Data Ownership

Dedupe efforts can cover Lead, Contact, and Account (using Salesforce.com terminology) data. These data often have different systems of record and owners. Lead data is often owned by Marketing, and its system of record is usually the marketing automation platform. Contacts and Accounts are generally owned by Sales, and their system of record is generally the sales force automation platform. If user data is part of the project scope, then add the product and customer success teams as potential owners, and your application and help desk platforms as additional systems of record. Data across these systems can be completely separate, partially synchronized, or fully synchronized.

The data owners must agree on the deduplication scheme and process. If the process requires significant time, effort, and budget commitment, then the data owners need to be fully committed for the project to be successful. For example, if Sales insists on manually reviewing every Account record merge, then the dedupe process must consider how to efficiently involve every account executive in the process.

Data ownership can extend beyond the departmental level to individual levels of the org hierarchy. For example, did you know it’s possible to have Contacts without Account affiliation in Salesforce? These “Private Contacts” are considered data private to account reps. If your CRM allows for Private Contacts, should they be included in the dedupe effort?

One Time or Continuous Dedupe

Is the dedupe process a one-time (periodic) batch process, a continuous process, or a combination of both? One-time processes involve a massive cleanup that can happen anywhere from once a quarter to once every few years. For one-time processes, a manual or semi-automated solution is perfectly acceptable as long as it fits the time and budget available. If deduplication is to continue as an ongoing process after the initial cleanup, then automated solutions must be part of the discussion from the start. A manual or even semi-manual process is simply not scalable or manageable on an ongoing basis.

Given the people and process constraints you have, decide if a continuous process is realistic. If not, determine how close to the ideal state is acceptable, and consider whether it can be supplemented by smaller-scale periodic batch cleanups. For example, if Sales insists on manually reviewing dedupe results and merging Account records, then in order to have a continuous dedupe process, you must get a Service Level Agreement from the Sales team on how quickly they can review the dupes.
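To make the Service Level Agreement idea concrete, here is a minimal sketch of how a review backlog might be checked against an agreed turnaround time. The two-day SLA, the field names, and the sample records are all assumptions for illustration, not values from any particular CRM.

```python
from datetime import datetime, timedelta

# Assumed value for illustration; use whatever turnaround Sales actually agrees to.
REVIEW_SLA = timedelta(days=2)


def overdue_reviews(pending, now):
    """Return pending merge reviews that have waited longer than the SLA."""
    return [r for r in pending if now - r["queued_at"] > REVIEW_SLA]


# Hypothetical review queue: one proposed Account merge per entry.
pending = [
    {"account": "Acme Corp", "reviewer": "rep_a", "queued_at": datetime(2024, 1, 1, 9, 0)},
    {"account": "Globex", "reviewer": "rep_b", "queued_at": datetime(2024, 1, 3, 9, 0)},
]

late = overdue_reviews(pending, now=datetime(2024, 1, 4, 9, 0))
for review in late:
    print(f"Overdue: {review['account']} (waiting on {review['reviewer']})")
```

A report like this gives both teams a shared view of whether the continuous process is keeping up, or whether a periodic batch cleanup is the more realistic option.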

Dedupe New Records, Some Records, or All Records

It’s a lot easier to prevent new duplicates from being created than to remove existing duplicate records. Often it makes the most sense to split prevention and removal into separate processes that work together to create a comprehensive dedupe solution. The new dupe prevention process can run continuously, supplemented by a full dedupe process that runs less frequently, such as quarterly. For example, if list loading is a major source of lead input for you, then simply preventing dupes from being created while loading a list can help tremendously in slowing the growth of dupes in your database. While a full dedupe of your database may involve more stakeholders and take longer to reach an acceptable process, you may be able to implement a dupe prevention process for list loading quickly, as Marketing has full control over that data and the process of list loading.
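As a rough illustration of dupe prevention during list loading, the sketch below splits an incoming list into rows that are safe to load and rows that would create duplicates. Matching on a normalized email address alone is an assumption made for brevity; the actual matching rules are something the data owners would need to agree on, and the function and field names are hypothetical.

```python
def normalize_email(email):
    """Lowercase and trim an email so trivial variants compare equal."""
    return email.strip().lower()


def split_new_and_duplicate(list_rows, existing_emails):
    """Separate rows that are safe to load from rows that would create dupes."""
    seen = {normalize_email(e) for e in existing_emails}
    new_rows, duplicate_rows = [], []
    for row in list_rows:
        key = normalize_email(row["email"])
        if key in seen:
            duplicate_rows.append(row)
        else:
            seen.add(key)  # also catches repeats within the list itself
            new_rows.append(row)
    return new_rows, duplicate_rows


# Hypothetical data: emails already in the database, and a list to be loaded.
existing = ["ann@example.com", "bob@example.com"]
incoming = [
    {"name": "Ann Lee", "email": " Ann@Example.com "},
    {"name": "Cy Diaz", "email": "cy@example.com"},
    {"name": "Cy Diaz", "email": "CY@example.com"},
]

to_load, skipped = split_new_and_duplicate(incoming, existing)
print(f"{len(to_load)} row(s) to load, {len(skipped)} held back as likely dupes")
```

Holding the skipped rows back for review, rather than discarding them, keeps the option of updating the existing record with any new information from the list.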

Some ways to cut the dedupe problem “down to size”:

If you understand the major sources of your dupes, then try to control those sources first to “stop the bleeding”.
If multiple databases are involved that are separate or only partially synchronized, consider first deduping each database, or parts of each database, separately, then combining them one at a time and deduping in phases.
If a certain subset of the data is trickier to dedupe due to data quality, people, or process issues, consider leaving it out of the first phase of the deduping effort. Deduping success doesn’t require achieving perfection. Having a 95% clean database this month is better than doing nothing because of the inability to get to a 100% clean database.
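One way to find the major sources of dupes mentioned in the list above is to count how many duplicate records each lead source has contributed. This is only a sketch under stated assumptions: records are matched on normalized email alone, they are processed in creation order so the later record counts as the dupe, and the `source` field and sample values are hypothetical.

```python
from collections import Counter


def duplicate_counts_by_source(records):
    """Count, per lead source, records whose email was already seen earlier."""
    seen = set()
    counts = Counter()
    for rec in records:
        key = rec["email"].strip().lower()
        if key in seen:
            counts[rec["source"]] += 1  # this record duplicates an earlier one
        else:
            seen.add(key)
    return counts


# Hypothetical records, assumed to be ordered by creation date.
records = [
    {"email": "ann@example.com", "source": "web form"},
    {"email": "ann@example.com", "source": "list upload"},
    {"email": "bob@example.com", "source": "list upload"},
    {"email": "BOB@example.com", "source": "list upload"},
    {"email": "cy@example.com", "source": "trade show"},
]

counts = duplicate_counts_by_source(records)
worst_source, worst_count = counts.most_common(1)[0]
print(f"Largest source of dupes: {worst_source} ({worst_count})")
```

The source at the top of this count is a reasonable first candidate for a prevention process.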

Which System to Dedupe Against

If the data you’re looking to dedupe exists in multiple systems, you need to decide where the dedupe should happen. In general, when data are synced between different systems, one of the systems is the master. This is the system we recommend you dedupe against in most cases. Note that the master system may not be where the data originated. One common example is Salesforce.com Contact vs. Marketo Lead. The lead record may have originated in Marketo and then been synced to Salesforce. As long as the person remains a Lead, Marketo is considered the system of record and dedupe should be done against Marketo. Once the Lead is converted to a Contact, Salesforce becomes the system of record. In this case, it’s best to dedupe the Contact directly against Salesforce.
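The rule above can be written down as a simple lookup from record type to the system to dedupe against. The mapping below is a sketch that follows the Marketo and Salesforce example; the system labels and the `dedupe_target` function are hypothetical, and your own mapping depends on how your systems are synchronized.

```python
# Assumed mapping for illustration, following the Lead/Contact example above.
SYSTEM_OF_RECORD = {
    "Lead": "marketing_automation",  # e.g. Marketo while the person is a Lead
    "Contact": "crm",                # e.g. Salesforce once the Lead is converted
    "Account": "crm",
}


def dedupe_target(record_type):
    """Return the system a record of this type should be deduped against."""
    try:
        return SYSTEM_OF_RECORD[record_type]
    except KeyError:
        # Fail loudly when no owner has agreed on a system of record yet.
        raise ValueError(f"No system of record agreed for {record_type!r}") from None


for record_type in ("Lead", "Contact"):
    print(record_type, "->", dedupe_target(record_type))
```

Raising an error for unmapped types is a deliberate choice here: a record type with no agreed system of record is a data ownership question to resolve, not something to guess at.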

There may be constraints that limit your choices. For example, you may not have purchased API access for the system you wish to dedupe (in Salesforce.com, for instance, it is available only in the Enterprise subscription). You may not own the data in certain systems, or may not be authorized to change it. In these situations, you may have to dedupe against the secondary system and let data synchronization propagate the changes to the system of record.