Every Dirty Data Problem is Caused by a Process Problem: Why You Need to Fix the Root Cause
People often ask us at Openprise why we don’t just call our platform “Data Management” like everyone else. Isn’t “Data Orchestration” just fancy marketing jargon to sound differentiated? The answer is no. There are very important distinctions between Data Orchestration and Data Management that can be summed up like this:
- Every dirty data problem is caused by a process problem
- Process problems are the root cause; data problems are just the symptom
- Data Management only treats the symptoms, whereas Data Orchestration cures the root cause AND treats the symptoms
Let’s explain this in more detail.
Every Dirty Data Problem is Caused by a Process Problem
Data doesn’t just appear out of thin air. Data is created by people and machines. When you have dirty data—data that’s bad quality or not in a usable form—the processes associated with the generation, governance, and maintenance of that data are simply inadequate. This is probably best illustrated with a few common examples.
Duplicate records are a common data quality issue, but how do duplicate records get into your database in the first place? Here are some of the common root causes:
- When loading a new list of contacts into the CRM, the loading process doesn’t check whether the contact already exists in the CRM.
- When sales reps create new accounts, they don’t have a clear operating procedure and tools to help them determine whether that data is accurate and the account already exists.
- Duplicate contacts exist across different accounts because the contact is a broker or agent that services multiple accounts. This is known as a legitimate, or intentional, duplicate.
In the above scenarios, the duplicate record problems exist because the processes are missing or are inadequate:
- The list loading process lacks steps to check for duplicate records within the list itself, as well as against the target CRM database.
- The account creation process lacks clear guidance on what the authoritative source for accounts is (like Dun & Bradstreet or LinkedIn) and on what the unique identifier for the account is (like DUNS Number, domain, or legal entity name). There’s either no process at all, or the process has no enforcement.
- The contact management process lacks a clear definition of a “legitimate duplicate” vs. an “illegitimate duplicate” and a way to identify and flag the legitimate duplicate records.
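As a minimal sketch of the missing list-loading check, the Python below splits an incoming list into new contacts and duplicates, comparing each contact both against the rest of the list and against the target CRM. The field name `email`, the choice of a normalized email as the matching key, and the function names are all assumptions made for illustration; a real process would pick whatever identifier the business has agreed on.

```python
def normalize_email(email: str) -> str:
    """Lowercase and trim an email so trivial variants compare equal."""
    return email.strip().lower()


def split_new_and_duplicate(incoming, existing_emails):
    """Split an incoming list into new contacts and duplicates.

    A contact counts as a duplicate if its normalized email already
    exists in the CRM or appeared earlier in the same list.
    (Matching on email alone is an assumption for this sketch.)
    """
    seen = {normalize_email(e) for e in existing_emails}
    new_contacts, duplicates = [], []
    for contact in incoming:
        key = normalize_email(contact["email"])
        if key in seen:
            duplicates.append(contact)
        else:
            seen.add(key)
            new_contacts.append(contact)
    return new_contacts, duplicates
```

Duplicates are returned rather than silently dropped, so they can be reviewed or flagged as legitimate duplicates where that applies.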
Non-Standardized and Non-Segmented Database
Does your database look like this?
- 500+ countries with values like this: “USA,” “U.S.,” “United States,” and “Puerto Rico.”
- 200+ states in the USA, with values like “MA,” “Massachusetts,” “Mass,” and “Puerto Rico.”
- Company employee sizes like this: “569,” “100-1000,” “250-500,” and “Mid-Size.”
- Job titles like this: “Director of Digital Campaigns,” “Dir, Demand Gen,” and “Growth Hacker.”
- Lead sources like this: “Event – Dreamforce 2018,” “Salesforce Dreamforce 2017,” and “2016 Dreamforce San Francisco / Tradeshow.”
The lack of database standardization and segmentation is once again caused by missing or inadequate processes. Ask yourself:
- Is there a data dictionary established for all geographical regions?
- If there is a data dictionary: when the data definition in it changes, is there a process to map the existing values to the new values to keep the database consistent over time?
- Is the data standard enforced via pick-lists and data templates on input? Or is it handled by a background process after data entry?
- Are there clear segmentations defined with mappings of source values to target segments?
- Has customer profiling been done with segmentations established for job levels, job functions, and buyer personas? Who’s responsible for defining and maintaining the segments?
- Is there a lead source and campaign name standard and approval process in place?
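To make the idea of a data dictionary concrete, here is a small Python sketch that maps raw country values to one standard value and flags anything it does not recognize. The dictionary contents and the function name are hypothetical; how to treat a value like “Puerto Rico” is a policy decision, so this sketch leaves it unmapped and flags it for review rather than guessing.

```python
# Hypothetical data dictionary: raw variants (lowercased) -> standard value.
COUNTRY_DICTIONARY = {
    "usa": "United States",
    "u.s.": "United States",
    "united states": "United States",
}


def standardize_country(raw):
    """Return (standard_value, needs_review) for a raw country string."""
    key = raw.strip().lower()
    if key in COUNTRY_DICTIONARY:
        return COUNTRY_DICTIONARY[key], False
    # Unmapped values are kept as-is and flagged instead of guessed.
    return raw.strip(), True
```

When a definition in the dictionary changes, the same mapping can be re-run over existing records, which is one way to keep the database consistent over time.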
Incorrect Lead Routing and Account Assignment
Nothing infuriates a sales team more than not being assigned the right leads and accounts within an acceptable time. Do these problems sound familiar?
- Leads assigned to the wrong rep due to bad data—like duplicate accounts.
- Leads stuck in a queue because of missing data—like postal code for routing.
- Accounts assigned to the wrong rep because the assignment rules are outdated.
- Leads that disappear into a black hole because accounts are still owned by sales reps who no longer work for the company.
- Highly scored Marketing Qualified Leads sent to sales turn out to be junk because the high engagement score was the result of link scanner clicks rather than actual humans clicking links in the emails.
These routing and assignment problems trace back to dirty data—and they’re all the result of missing or broken processes:
- There’s no process to identify, review, and merge duplicate accounts, and there are no business rules to pick the right account when duplicates exist.
- There’s no data cleansing and enrichment process before the routing process gets triggered.
- There’s no scalable and manageable process to translate organizational changes into system configurations quickly and accurately.
- There are no exception management processes to continuously—or at least periodically—identify lead funnel exceptions and correct them.
- There’s no process to identify bot clicks or other low-quality engagements and factor them into the scoring model.
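The last point can be illustrated with a short Python sketch that leaves probable link scanner clicks out of an engagement score. The heuristic used here (a click within a few seconds of the email being sent is treated as automated), the 10-second window, and the points per click are all assumptions chosen for illustration, not a recommended detection method; real scoring models typically combine several signals.

```python
from datetime import datetime, timedelta

# Hypothetical values: tune or replace these for a real scoring model.
SCANNER_WINDOW = timedelta(seconds=10)
POINTS_PER_CLICK = 5


def is_probable_scanner_click(sent_at, clicked_at):
    """Treat clicks that arrive almost immediately after send as automated."""
    return clicked_at - sent_at <= SCANNER_WINDOW


def engagement_score(sent_at, click_times):
    """Score only the clicks that do not look like link scanner activity."""
    human_clicks = [
        t for t in click_times if not is_probable_scanner_click(sent_at, t)
    ]
    return POINTS_PER_CLICK * len(human_clicks)
```

With these assumptions, an email that gets two clicks within seconds of delivery and one click two hours later is scored for a single click.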
Fix the Root Cause; Don’t Just Treat the Symptoms
Simply fixing your dirty data with a Data Management tool without addressing the root causes that created the data problems in the first place is like going to your doctor complaining about frequent chest pains and having the doctor tell you to take two Tylenol and go home.
Simply treating the dirty data problem symptoms without getting to the root causes wastes your company’s resources and constrains your ability to scale your business because:
- Bad or inadequate processes usually don’t create dirty data problems in just one place; they create issues in multiple places. Instead of fixing the root cause once, multiple teams throughout the company must clean up the same data problem in multiple places using different tools.
- When you fail to address the root cause of dirty data, the problem keeps reappearing and compounds as more data enters the database or more fields are created to handle exceptions. Whatever remedy you deploy to treat the symptoms ends up repeating the same work over and over, and can be overwhelmed when the problem compounds. This is often the case with manual remediation processes.
- In the short term, treating the symptom may be faster and cheaper compared to addressing the root cause, but it will certainly be more costly in the long run.
- Repeatedly treating the symptoms often adds delay to the process, so if speed and throughput are of value, treating the symptoms can only get you so far.
Data Orchestration Treats the Root Cause of Dirty Data Problems by Automating Data-Driven Processes
Data Management tools simply treat the symptoms of dirty data problems without addressing the root cause. For example, a data scientist may use a data preparation tool to clean up the data she needs for an analysis, but the data continues to be bad in the CRM and other systems of record. Not only does she have to do this work over and over again, but no one else benefits from her hard work either.
Data Orchestration focuses on automating data-driven processes—the processes that generate and transform the data in the first place, and processes that permanently remediate and govern the data problems at the source. Here are some examples:
- Automating the list loading process so third-party lead data is cleaned, standardized, segmented, deduplicated, enriched, and verified before it’s loaded into your CRM.
- Continuously and permanently standardizing and segmenting data in your systems of record—including Sales Automation, Marketing Automation, Finance, and Help Desk—so when other systems need to use the data, the data’s already of high quality.
- Continuously syncing different data silos and creating the missing links between data sets and systems, like linking leads to an account, tagging who’s a customer and what products they’ve purchased, and synchronizing a user’s communication preferences across different engagement platforms.
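As a rough sketch of what automating the list loading process can look like, the Python below runs an incoming list through an ordered series of steps before anything reaches the CRM. The step names, the `email` field, and the three steps shown are assumptions for illustration; a fuller process would add standardization, segmentation, enrichment, and verification steps in the same way.

```python
def clean(records):
    """Trim whitespace from every string field."""
    return [
        {k: v.strip() if isinstance(v, str) else v for k, v in r.items()}
        for r in records
    ]


def drop_incomplete(records):
    """Drop records with no email (an assumed required field)."""
    return [r for r in records if r.get("email")]


def dedupe(records):
    """Keep the first record for each case-insensitive email."""
    seen, unique = set(), []
    for r in records:
        key = r["email"].lower()
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


# Order matters: cleaning first lets later steps compare trimmed values.
PIPELINE = [clean, drop_incomplete, dedupe]


def prepare_for_load(records, steps=PIPELINE):
    """Run every step in order and return the records that are ready to load."""
    for step in steps:
        records = step(records)
    return records
```

Because the steps live in one ordered list, every list goes through the same process each time, which is where the consistency comes from.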
Business is increasingly data-driven. The companies that will lead in the next decade are the ones that leverage data to scale their operations. To scale any data-driven operation, you need to stop treating data symptoms and fix the processes that create these dirty data problems in the first place, automating those processes as much as possible to ensure consistency and reliability.