Generate Normalization Reference

 

Purpose

Generates a reference data source for use by a normalization task. The task analyzes any attribute and groups similar values together. The output is a reference data source with these attributes:

  • Normalized value
  • Alias values

 

Field Description

  • Generate normalization reference data for this attribute – Select the attribute for which to generate reference data.
  • Advanced configuration – Similar values are identified using a combination of algorithms, controlled by these parameters:
    • Degree of fuzziness – Controls how loosely fuzzy matching is applied. The valid range is 0.1 to 1, where 0.1 is maximum fuzziness.
    • Percentage of leading text that must match – The percentage of text at the start of two values that must match for them to be considered similar.
    • Ignore if characters less than – No similarity check is performed when the character length is less than the value specified in this field.

 

Tips

  • The cleaner the source data, the better the result. When generating company name reference data, use the Company Name Clean Up task to pre-clean the data first.
  • Identification of similar values is based on a combination of these algorithms:
    • Fuzzy matching based on your parameters
    • Values that begin with identical words
    • Over 90% similarity
  • The matching algorithm is not case sensitive.
  • The matching algorithm ignores short words. The length threshold is configurable and defaults to 3 characters.
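
The rules in this list can be pictured with a short Python sketch. This is not the product's implementation. The function names, the default parameter values, the use of Python's difflib for fuzzy scoring, and the way the rules are combined are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def is_similar(a, b, fuzziness=0.8, leading_pct=50, min_chars=3):
    """Hypothetical similarity test; parameter names loosely mirror the task's fields."""
    a, b = a.strip().lower(), b.strip().lower()  # matching is not case sensitive
    if len(a) < min_chars or len(b) < min_chars:
        return False  # too short: skip the similarity check
    ratio = SequenceMatcher(None, a, b).ratio()  # 1.0 means identical
    # Assumed reading of "percentage of leading text that must match".
    n = min(len(a), len(b)) * leading_pct // 100
    leading_match = n > 0 and a[:n] == b[:n]
    same_first_word = a.split()[:1] == b.split()[:1]
    # Assumption: any one rule is enough to call two values similar.
    return (leading_match and ratio >= fuzziness) or same_first_word or ratio > 0.9


def build_reference(values, **params):
    """Group values into {normalized value: [alias values]}."""
    groups = {}
    # Shortest values first, so the shortest member of a group names the group.
    for value in sorted(set(values), key=lambda v: (len(v), v)):
        for primary in groups:
            if is_similar(primary, value, **params):
                groups[primary].append(value)
                break
        else:
            groups[value] = []
    return groups


names = ["Toyota motor", "Toyota", "Toyota usa", "Honda", "Toyota motor sales"]
print(build_reference(names))
```

In this sketch a fuzziness of 1 only accepts identical values under the fuzzy rule, and lower settings accept looser matches, which is one plausible reading of the 0.1 to 1 range.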

 

Examples

  1. Generate a reference for normalizing company names.
    • Normalized value = Toyota
    • Aliases = Toyota motor, Toyota motor sales, Toyota usa, Toyota financial services
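
To show how a normalization step might consume a reference like this one, here is a minimal Python sketch. The dictionary layout and the helper names build_alias_lookup and normalize are hypothetical and are not part of the product. The sketch only illustrates mapping each alias back to its normalized value, ignoring case.

```python
# Reference data from the example above: normalized value -> alias values.
reference = {
    "Toyota": [
        "Toyota motor",
        "Toyota motor sales",
        "Toyota usa",
        "Toyota financial services",
    ],
}


def build_alias_lookup(reference):
    """Flatten the reference into a case-insensitive alias -> normalized value map."""
    lookup = {}
    for normalized, aliases in reference.items():
        lookup[normalized.lower()] = normalized
        for alias in aliases:
            lookup[alias.lower()] = normalized
    return lookup


def normalize(value, lookup):
    """Return the normalized value, or the input unchanged if no alias matches."""
    return lookup.get(value.strip().lower(), value)


lookup = build_alias_lookup(reference)
print(normalize("TOYOTA USA", lookup))
print(normalize("Honda", lookup))
```

Values with no matching alias pass through unchanged in this sketch, so running it over a column would leave unrecognized values as they are.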

 

Support Contacts

If you have any additional questions, please feel free to contact us at support@openprisetech.com.

 
