Merge Road Sign

How to merge duplicate records

Once you’ve identified the duplicate records and figured out which surviving records to keep, the last part of your deduplication logic is to merge the non-surviving records duplicate records into the surviving record. In some cases, you may want to simply discard or remove the non-surviving duplicate records. That simple scenario requires no further discussion.

First Establish a Default Logic, Then Exceptions

Chances are you have more than just a few data fields in the data you are looking to merge, perhaps even hundreds, and we have seen thousands. In order to scale, you should first establish a default merge logic that will be applied to all data fields. Once you have a default logic, then you can define exceptions for specific data fields. The most common default logis is “fill if empty”. We will discuss the various merge logics next.

Merge Logics

Fill If Empty

This is the most common merge logic, thus the most popular default merge logic. This logic says if any data field in the surviving record is empty, then attempt to fill it with a non-empty value from one of the non-surviving records. You also need to provide additional logic on what sequence to sort through the non-surviving records. Here is an example of 3 records in a duplicate set, with the non-surviving records sorted with more recently update record on top. The merge logic is fill if empty using latest modified record.

Surviving Original        John Doe

Non-surviving 1           J. Doe                                     VP Marketing

Non-surviving 2           John M. Doe         Acme Inc.        CMO


Surviving Merged        John Doe         Acme Inc.        VP Marketing

Always Replace

This is exactly the same logic as the one above, except it doesn’t require the surviving record data field to be empty. It applies the merge logic to all the records in the duplicate group, including the surviving record, pick the value that meets the requirement, then replace the value in the surviving record, empty or not. Common examples include:

  • Always take contact information from the last modified record
  • Always take lead source from the earliest created record

Here is an example of 3 records in a duplicate set sorted by latest modified date on top. The merge logic for email is to use the latest modified date. The merge logic for lead source is to use the earliest modified date. The default merge logic is fill if empty.

Non-surviving 1           J. Doe             Acme Inc.        Webinar

Surviving Original        John Doe                               Dreamforce 16

Non-surviving 2           John M. Doe         Tunes Corp.    Free Trial


Surviving Merged        John Doe         Acme Inc.        Free Trial


With most merge logic you are throwing away some data you believe is not as good as the ones you are keeping. In some cases, you want to keep them all. This is common with unstructured data like notes or multi-value categories and segmentation data. For these data fields, use the append logic. Here is the same example above, but instead of keeping only the earliest modified lead source, we want to append lead source.

Non-surviving 1           J. Doe             Webinar

Surviving Original        John Doe       Dreamforce 16

Non-surviving 2           John M. Doe         Free Trial


Surviving Merged        John Doe         Webinar, Dreamforce 16, Free Trial

Base on a Formula

For numerical or binary data fields, it often makes sense to apply a mathematical formula, such as:

  • Pick the maximum or minimum value
  • Calculate a sum or an average value
  • True if only all records are true

Here is the same example above with 2 numerical fields: behavior score and demographic score. The merge logic is to pick the highest demographic score, but sum the behavior score.

Behavior          Demographic

Non-surviving 1           J. Doe             15                    100

Surviving Original        John Doe       10                    10

Non-surviving 2           John M. Doe         50                    50


Surviving Merged        John Doe         75                    100

Do Not Merge

This one is simple, for some data fields you just do not want to merge.

Leave a comment