How to convert mixed-text data to a numerical range

How to convert mixed-text data to a numerical range

With the exponential expansion of MarTech and SaaS apps in general, there are now more than a hundred thousand SaaS apps on the market and hundreds of thousands more to come. Today, the average business uses 130 apps, up 18% from the previous year. The purpose of all these technologies spans the range of human possibilities, but they are all united by their ability to produce one thing: data, lots and lots of data. While this avalanche of data is great news for B2B marketers and sales teams that can make sense of it, for the majority that struggle with the mind-numbing variety of formats and standards, the news couldn’t be more stomach-churning.

If you’ve ever tried to wrangle your international or mixed-field data from its source into Marketo or Salesforce, or exceeded your API limits due to a field mismatch, take heart, because we have an automated solution that simplifies the whole process. In the following paragraphs, we’ll take you through the solution for converting mixed-text data to a numerical range available to Openprise RevOps Data Automation (RDA) Cloud users. While the process is specific to the mixed-field problem, the approach is universal to the RDA Cloud, and all in a day’s work for the Data Essentials solution used to clean and standardize our customers’ data.

How Openprise RDA Cloud converts mixed-text fields

In this example, we’ll start with a field with values that should contain numbers but has a mix of text and numbers instead, which is very common with fields that deal with revenue or the number of employees. The goal is to provide a numerical range, so the data must be cleansed and standardized, converted to numbers, and then placed in the ranges.

Let’s start with a Revenue field that has a text format. Data could appear in multiple formats as in the screenshot below, such as with or without `$,` with or without `,` and with words such as `under,` `over,` or `million.` There might also be values like `$10,000,000-50,000,000.`

Screenshot showing different ways to depict dollar amounts in a Revenue field with a text format, which can then be converted to a numerical field using the Openprise RevOps Data Automation Cloud.

Step one

Because the numbers we want to look at have characters such as `$` and `,` the first step is to remove these characters.

  • Using the Search and replace text task template, we replace the `$` and `,` with a blank and replace the word `million` with six zeroes to make it look like a number. This can be done in two actions in the same task.
    • Action 1 removes the `$` and the `,`.

Screenshot showing how the RDA Cloud Search and replace text task template replaces characters.

    • Action 2 replaces `million` with `000000`. One of these has a space before it, so we can replace the space as well as the word.

Screenshot showing how the RDA Cloud Search and replace text task template replaces characters.

  • Now we can compare the data we started with to the data from our outputs:

Screenshot showing how the RDA Cloud displays the output of a search and replace task.

Most of the data now looks like a whole number, without any symbols, but there are still some values that aren’t numbers:

Screenshot of an RDA Cloud table displaying the detailed output of a search and replace task.

Screenshot of an RDA Cloud table displaying the detailed output of a search and replace task.

Step two

To assign these, we can create a data source to use with an Infer value task template:

Training demo screenshot showing how to create a data source to use with the data quality Infer value task template in the RDA Cloud.

Screenshot of a table demonstrating how to infer values using the RDA Cloud.

For values like “$10 million-$50 million,” we chose in this example to take a value in the middle, 25000000, but selecting the value you want to use would be at your discretion. You could also choose something like 50000000 if you wanted the highest possible value, or 10000000 if you wanted the lowest. Similarly, for “Over $10 million” we chose to use just one dollar over that, but the user could select the value that they felt most accurately represented what they know from this field.

  • We can then use a filter to include only records that contain `over,` `,`, `under,` or `-` for this task and then infer the value we want from this data source:

Screenshot showing how to use the RDA Cloud Infer value task to filter records.

  • Now all the records are in the same text format that contains only numerals in the OP Revenue (mils removed) field:

Screenshot of a sample table demonstrating the output of the RDA Cloud Infer value task filters.

Step three

From here, we can use the Change attribute type task template to convert the text numerals to a number.

Remember to use `whole number` as the attribute type when creating the new attribute, because only numbers can be put in numerical ranges; real numbers are unnecessary, because you don’t need to worry about cents here.

Screenshot of the RDA Cloud Change attribute type task that is used to convert text numerals to a number.

Step four

Now that we have numbers, we can assign them to ranges.

  • First, we’ll need a data source that assigns ranges
  • As with the previous step, when setting up the data source, we’ll need to make sure our “Min” and “Max” columns are imported as whole numbers.

Training demo screenshot showing how to create a data source within the RDA Cloud to use with the data quality Assign value to range task template.

Screenshot of table demonstrating the output of the RDA Cloud data quality Assign value to range task template.

  • Now we can assign the ranges using the Assign value to range task template. In the above example, we have both Revenue Range and Revenue Segment in two actions. As you can see by the `123` in the field descriptions, the formats for the “Max” and “Min” values are now numerical. The system gives us the flexibility to create one or both to accommodate any business requirement.

Screenshot showing how to use the RDA Cloud Assign value to range task template to assign the Revenue Range.

Screenshot showing how to use the RDA Cloud Assign value to range task template to assign the Revenue Range and Revenue Segment.

  • Here, we’ll need to make sure to check the output to confirm we’re getting the results we expected.

Screenshot of sample table demonstrating the output of the RDA Cloud Assign value to range task.

Note that you could have a field like 0P Revenue Range to show a number, or like 0P Revenue Segment to show what kind of business that number indicates for your company. If these names or ranges changed in the future, you’d just need to update this table in the data source to update to the new ranges or values.

  • We’ll also need to check and validate to make sure none of the original revenue values are missing a Revenue Range or Revenue Segment.

Screenshot showing sample filter within the RDA Cloud used to verify the data’s completeness.

  • If there is an unmatched value at this point, we’ll need to either adjust the tasks or create a new task to manage any data not already mapping. Here’s what the final process looks like when we’re finished.

Screenshot showing the complete RDA Cloud process for converting a mixed-text data field to the numerical range format.

Those are the four steps to convert a mixed field to a numerical format. As you saw, the no-code solution is simple enough for any non-technical user to follow and complete. The automation of data quality jobs like this is just one capability provided by the RDA Cloud. To learn more about the cloud and how it can help you better orchestrate and operationalize your data across all your funnels, schedule a demo with one of our data consultants today.

Leave a comment