5 key considerations for effective deduping
To wrap up this blog series on deduplication, let’s bring it all together with five key takeaways for a data deduplication project.
1. The Hard Parts are the Surviving Logic & Merge
Deduplication is not a simple or trivial task, despite what your solution vendors may say. Vendors like to talk about how comprehensive their duplicate identification algorithm is, but remember that identification is the easiest of the three parts. The surviving logic and the merge process are far more difficult. Too many deduplication projects start with a bang but end with a whimper because they run into system and process issues when it comes to executing these two later steps.
Make sure you select a vendor and a technology that can actually handle the entire end-to-end process, and not just the identification part. You will need not only a tool, but also deep system knowledge of the applications you use.
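To make the three parts concrete, here is a minimal Python sketch of identification, surviving logic, and merge working together. Everything in it is an assumption made for illustration: the record fields, the match rule (normalized email), and the survivorship rule (most recently updated record wins) are hypothetical and not taken from any particular vendor or system. A real merge also has to respect the constraints of the applications involved, which this sketch ignores.

```python
from collections import defaultdict

# Hypothetical contact records; the field names are illustrative only.
records = [
    {"id": 1, "email": "Ann@Example.com", "phone": "", "updated": "2023-01-05"},
    {"id": 2, "email": "ann@example.com ", "phone": "555-0100", "updated": "2022-11-20"},
    {"id": 3, "email": "bob@example.com", "phone": "555-0199", "updated": "2023-02-01"},
]


def match_key(record):
    # Part 1, identification: a deliberately naive rule (normalized email).
    return record["email"].strip().lower()


def pick_survivor(group):
    # Part 2, surviving logic: assumed rule, the most recently updated record wins.
    # ISO dates compare correctly as strings.
    return max(group, key=lambda r: r["updated"])


def merge(group):
    # Part 3, merge: start from the survivor and fill its blank fields
    # with values from the records that lose.
    survivor = pick_survivor(group)
    merged = dict(survivor)
    for other in group:
        for field, value in other.items():
            if not merged.get(field):
                merged[field] = value
    # Keep a trail of which records were folded into the survivor.
    merged["merged_ids"] = sorted(r["id"] for r in group if r["id"] != survivor["id"])
    return merged


def dedupe(rows):
    groups = defaultdict(list)
    for row in rows:
        groups[match_key(row)].append(row)
    return [merge(group) for group in groups.values()]


result = dedupe(records)
```

Notice that the identification step is a single line, while the survivor and merge rules are where the business decisions live: which record wins, which fields get carried over, and what trail is left behind.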
2. One-Time Project vs. Continuous Process
Deduplication often ends up being a one-time project. This is usually because of the complications and effort required to handle the surviving logic and merge steps. Unfortunately, when you don’t have the right technology and vendor for the job, these two steps frequently end up being manual, which makes deduplication feasible only as a one-time project. The benefit of making deduplication a continuous process is obvious, and making it a reality is quite doable. If you pick the right solution and vendor, put the effort into the first bulk deduplication project, and properly capture and implement your business logic and process, then you can simply keep the automated process running. However, if you don’t put in the effort to make the process automatable and instead take the shortcut of manual resolution, you are applying only a short-term fix. You will watch your data deteriorate and will have to do another large deduplication project down the road.
3. People > Process > Data > Technology…and in That Order
Duplicate records are caused by gaps and poor alignment between people, process, and technology. To solve a duplication problem, you first have to understand the root causes thoroughly, so you can design and implement the appropriate remediation. Otherwise, you will fail to solve the problem, and may even make it worse.
4. Stop the Bleeding First
Proper resolution of duplicates may require the involvement of teams and changes to processes that can take a long time to execute. Before you try to fix the existing bad data, look at what it will take to prevent the problem from getting worse; in other words, “stop the bleeding.” Prevention is often much easier and quicker to implement, and it involves fewer stakeholders. It also lets you produce quick and valuable results that help rally the troops behind the bigger project of cleaning up existing data.
Procrastination and analysis paralysis only let the duplicate problem build up and become increasingly expensive and time-consuming to fix.
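One way to picture stopping the bleeding is a check at the point of entry: before a new record is created, look for an existing match and update it instead. The sketch below is only an illustration under stated assumptions. `ContactStore` is a hypothetical in-memory stand-in for a CRM or marketing database, and matching on a normalized email address is an assumed rule, not a recommendation for every data set.

```python
def normalize_email(email):
    # Assumed match rule: trim whitespace and ignore case.
    return email.strip().lower()


class ContactStore:
    """Hypothetical in-memory store standing in for a real system."""

    def __init__(self):
        self._by_email = {}

    def upsert(self, email, **fields):
        key = normalize_email(email)
        existing = self._by_email.get(key)
        if existing is None:
            # No match found, so creating a new record is safe.
            self._by_email[key] = {"email": key, **fields}
            return "created"
        # A match exists: fill in blank fields instead of creating a duplicate.
        for name, value in fields.items():
            if value and not existing.get(name):
                existing[name] = value
        return "matched"

    def get(self, email):
        return self._by_email.get(normalize_email(email))

    def count(self):
        return len(self._by_email)


store = ContactStore()
first = store.upsert("Ann@Example.com", name="Ann")
second = store.upsert(" ann@example.com", phone="555-0100")
```

Here the second submission is matched to the first rather than stored as a new record, so the store still holds one contact, now with both a name and a phone number. A gate like this does nothing for the duplicates already in the system, which is exactly why it can be put in place before the larger cleanup.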
5. It Involves More Than One System
Most readers of this blog are likely working with CRM and Marketing Automation platforms. These systems are joined at the hip. Data is synchronized at some level, and interactions between these systems can be complex. For such integrated systems, deduplication is fundamentally a cross-system problem. You must have stakeholders from both systems buy in, and your solution has to address both systems simultaneously, taking into account the constraints on both sides as well as the interactions and side effects between them.