De-duplication and Merge – Identify Duplicates

 

Purpose

  • Find duplicate records within a single Data Source using a specified matching method.
  • From the duplicate records, choose one of the records as the surviving record.
  • Either merge or simply delete the non-surviving records.
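
To make the first step concrete, the following is a minimal Python sketch of grouping records by exact match on one or more attributes, plus a simple fuzzy comparison. It is an illustration only, not the product's implementation: the function names, the case and whitespace normalization, the choice to skip records with empty key values, and the 0.85 similarity threshold are all assumptions.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def exact_key(record, attributes):
    # Assumption: values are compared after trimming whitespace and lowercasing.
    return tuple((record.get(a) or "").strip().lower() for a in attributes)


def group_exact(records, attributes):
    """Return groups (lists) of records whose key attributes all match exactly."""
    groups = defaultdict(list)
    for record in records:
        key = exact_key(record, attributes)
        if all(key):  # assumption: records with an empty key value are not matched
            groups[key].append(record)
    return [group for group in groups.values() if len(group) > 1]


def is_fuzzy_match(a, b, threshold=0.85):
    # Hypothetical fuzziness factor: a similarity ratio between 0 and 1.
    ratio = SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()
    return ratio >= threshold


records = [
    {"Email Address": "a@x.com", "Company Name": "Acme"},
    {"Email Address": "A@X.com ", "Company Name": "acme"},
    {"Email Address": "b@x.com", "Company Name": "Acme"},
]
duplicate_groups = group_exact(records, ["Email Address", "Company Name"])
```

Here the first two records form one duplicate group, and the third record has no duplicate.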

 

Tips

  • You can use more than one attribute to determine if records are duplicates. For each attribute, use exact matching or fuzzy matching for duplicate identification.
  • For fuzzy matching, the fuzziness factor can be further fine-tuned via advanced configuration.
  • Identification of the surviving record within a group of duplicates uses an elimination methodology: you define a set of criteria that are applied in priority order to narrow the group down to a single surviving record.
  • The merge process is controlled by a default option, which can be overridden for individual attributes. There are four options for merging data from non-surviving records into the surviving record:
    • The “Fill only if empty from non-surviving records” option fills empty attributes of the surviving record with available values from non-surviving records to achieve maximum completeness. The non-surviving records are processed in order of a date attribute you select, according to your earliest / latest setting. Once all the available values are harvested from the first non-surviving record, the next non-surviving record in that order is harvested. This process repeats until the surviving record has no more empty attributes or all the non-surviving records have been harvested.
    • The “Always overwrite from non-surviving records” option overwrites the data in the surviving record with data from one record in the duplicate group. That record is selected in order of a date attribute you select, according to your earliest / latest setting. Note that the surviving record is itself part of the duplicate group, so your logic may select the surviving record; in that case, no data is overwritten.
    • “Append values from non-surviving records” option will append all the data from non-surviving records into the surviving record using a specified delimiter.
    • “Never merge values from non-surviving records” option will prevent any data merge into the surviving record.
  • Use the “Advanced Configuration” and “Add Exception” button to override the default merge logic at the per-attribute level. Each exception can use different merge logic, so even complicated merge requirements can be met.
  • You can use the “Manual review of de-duplication result” section to specify a data source that stores the de-duplication results, and then review them using the “Data -> Review De-dupe” option.
  • The de-duplication task supports multiple actions to perform sequential de-duplication. The results of each action are recorded, as are the final results. The results of each action are stored in the attributes “Duplicates1”, “Duplicates2”, etc. and “Merged Attributes1”, “Merged Attributes 2”, etc., where the number is the position of the corresponding action in top-to-bottom order. The final results are stored in the attributes “Duplicates” and “Merged Attributes”.
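
The four merge options described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the product's actual logic: the function name, the option strings, the default "; " delimiter, and the use of ISO-formatted date strings are hypothetical. The article does not say what happens when the record selected for overwriting has an empty value, so the sketch assumes the surviving record's value is kept in that case.

```python
def is_empty(value):
    return value is None or str(value).strip() == ""


def merge_attribute(survivor, non_survivors, attr, option,
                    date_attr="Modified Date", latest_first=True, delimiter="; "):
    """Return the merged value of `attr` for the surviving record."""
    def by_date(records):
        # ISO date strings (YYYY-MM-DD) sort chronologically as plain text.
        return sorted(records, key=lambda r: r[date_attr], reverse=latest_first)

    current = survivor.get(attr)

    if option == "never_merge":
        return current
    if option == "fill_if_empty":
        if not is_empty(current):
            return current
        # Harvest from non-surviving records in date order until a value is found.
        for record in by_date(non_survivors):
            if not is_empty(record.get(attr)):
                return record[attr]
        return current
    if option == "always_overwrite":
        # The surviving record is part of the group and may itself be selected.
        selected = by_date([survivor] + non_survivors)[0]
        if selected is survivor or is_empty(selected.get(attr)):  # assumption
            return current
        return selected[attr]
    if option == "append":
        values = [current] + [r.get(attr) for r in non_survivors]
        return delimiter.join(str(v) for v in values if not is_empty(v))
    raise ValueError(f"unknown merge option: {option}")


survivor = {"Phone": "", "City": "Austin", "Notes": "met at expo",
            "Modified Date": "2023-01-10"}
others = [
    {"Phone": "555-0100", "City": "Dallas", "Notes": "called twice",
     "Modified Date": "2023-03-01"},
    {"Phone": "555-0199", "City": "", "Notes": "", "Modified Date": "2023-02-01"},
]
phone = merge_attribute(survivor, others, "Phone", "fill_if_empty")
city = merge_attribute(survivor, others, "City", "always_overwrite")
notes = merge_attribute(survivor, others, "Notes", "append")
```

A per-attribute exception then amounts to calling the same function with a different option for that attribute, while all other attributes use the default option.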

 

Examples

Find duplicate records where Email Address and Company Name attributes are identical.

  • The surviving record is the one where both Lead Source and Job Title have a value. If multiple records remain after applying those criteria, use the record with the earliest Created Date.
  • By default, fill in the surviving record’s empty attributes from records within the duplicates group, using data from the record with the latest Modified Date first.
  • However, for the contact information attributes Address, City, State, Zip, Country, and Phone Number, always overwrite them with data from the record within the duplicates group that has the latest Modified Date.
  • Also, for the Notes attribute, always append all the data from the duplicates group into the surviving record.
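
The survivor selection in this example can be sketched as an elimination pass in Python. This is an illustration only: the function names and the "Id" attribute are hypothetical, dates are assumed to be ISO-formatted strings, and skipping a criterion that no remaining record satisfies is an assumption the article does not confirm.

```python
def has_value(attr):
    """Build a criterion that keeps records where `attr` is not empty."""
    return lambda record: record.get(attr) not in (None, "")


def pick_survivor(group, criteria, date_attr="Created Date"):
    candidates = list(group)
    for criterion in criteria:  # criteria are applied in priority order
        if len(candidates) == 1:
            break
        narrowed = [r for r in candidates if criterion(r)]
        if narrowed:  # assumption: a criterion that nobody meets is skipped
            candidates = narrowed
    # Tie-break: the earliest Created Date among the remaining candidates.
    return min(candidates, key=lambda r: r[date_attr])


group = [
    {"Id": 1, "Lead Source": "", "Job Title": "CTO", "Created Date": "2021-01-01"},
    {"Id": 2, "Lead Source": "Web", "Job Title": "VP", "Created Date": "2022-05-01"},
    {"Id": 3, "Lead Source": "Event", "Job Title": "Director", "Created Date": "2021-09-15"},
]
survivor = pick_survivor(group, [has_value("Lead Source"), has_value("Job Title")])
```

Record 1 is eliminated because it has no Lead Source; records 2 and 3 both pass the two criteria, so the one with the earlier Created Date (record 3) survives.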

 

Support Contacts

If you have any additional questions, please feel free to contact us at support@openprisetech.com.

 
