
Reference Data Wrangling



January 25, 2017


Reference Data Management

So what is reference data? Think of it as data that resides in your organization but whose standards and naming conventions you don’t own. For example, imagine that one day you decided to prefix all your postal codes with your store numbers. What would happen? All of a sudden, packages carrying the new zip codes wouldn’t reach their destinations. That’s because you don’t control the standards and naming conventions for postal codes; the Postal Service masters them, and you just reference them.

Reference data is critical for any organization because it is a special subset of master data used for classification across the organization’s applications and databases. It includes the lookup table and code table data found in virtually every enterprise application. Simple examples of reference data are postal codes, country codes, currency codes, gender codes, and industry codes. Complex reference data may originate from multiple applications, be derived from transactional data, or be supplied by external agencies. Reference data is typically defined with a code and a description and has a set of domain values, that is, a list of allowed values.
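To make the "code, description, and domain values" idea concrete, here is a minimal Python sketch. The class names (`CodeValue`, `ReferenceDataSet`) and the sample currency entries are illustrative assumptions, not part of any particular RDM product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeValue:
    """One reference data entry: a code plus its human-readable description."""
    code: str
    description: str


class ReferenceDataSet:
    """A named reference data set whose entries form the domain (allowed values)."""

    def __init__(self, name, values):
        self.name = name
        # Index the entries by code so lookups and validation are cheap.
        self._values = {value.code: value for value in values}

    def is_valid(self, code):
        """Return True only if the code is one of the allowed values."""
        return code in self._values

    def describe(self, code):
        """Return the description for a code, or raise if it is not allowed."""
        if code not in self._values:
            raise KeyError(f"{code!r} is not an allowed value in {self.name}")
        return self._values[code].description


# Hypothetical sample set; a real one would be loaded from the governed source.
currency_codes = ReferenceDataSet(
    "currency_code",
    [
        CodeValue("USD", "US Dollar"),
        CodeValue("EUR", "Euro"),
        CodeValue("JPY", "Japanese Yen"),
    ],
)
```

An application would call `currency_codes.is_valid(...)` before accepting a value, which is the same check a lookup table or code table performs in a database.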

Why does Reference Data need to be Managed?

Today, most enterprises manage their critical reference data in spreadsheets or through manual, ad hoc methods, with no centralized mechanism for managing it across the organization. Variations and inconsistencies in reference data can be a major source of data quality issues and can cause business losses through system downtime, incorrect transactions, and incorrect reports. Errors in reference data affect the quality of master data in each domain, which in turn affects quality in all dependent transactional systems. The same reference data or code may also have different values in different applications. For example, a gender code may be ‘M’ for Male and ‘F’ for Female in one application but 1 for Male and 2 for Female in another. Manual or custom reference data management (RDM) often lacks change management, audit controls, and granular security and permissions. Mismatches in reference data undermine the integrity of BI reports and cause application integration failures. In addition, several compliance requirements call for reference data to be monitored and governed.
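The gender code mismatch described above is usually handled with a crosswalk that maps each application's local codes to one canonical code. The sketch below assumes two hypothetical applications, `app_a` and `app_b`, and a hand-written mapping table; in practice the mapping would be maintained in the RDM tool.

```python
# Canonical (governed) gender codes and their descriptions.
CANONICAL_GENDER = {"M": "Male", "F": "Female"}

# Hypothetical crosswalk: (application, local code) -> canonical code.
GENDER_CROSSWALK = {
    ("app_a", "M"): "M",
    ("app_a", "F"): "F",
    ("app_b", "1"): "M",
    ("app_b", "2"): "F",
}


def to_canonical(application, local_code):
    """Translate an application-specific code into the canonical code."""
    key = (application, str(local_code))
    if key not in GENDER_CROSSWALK:
        raise ValueError(f"No mapping for code {local_code!r} from {application!r}")
    return GENDER_CROSSWALK[key]


def find_unmapped(application, local_codes):
    """Return the distinct local codes that have no canonical mapping yet."""
    return sorted(
        {str(code) for code in local_codes
         if (application, str(code)) not in GENDER_CROSSWALK}
    )
```

Running `find_unmapped` against the codes seen in each source system is a simple way to surface the inconsistencies before they reach BI reports or integrations.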

Tools to Use

Have you already invested in Informatica’s Master Data Management (MDM) product to maintain and govern your master data? Did you know that you can use it to manage your reference data as well? MDM customers can find Reference Data Management as a marketplace item, which can be downloaded for free. Intricity recently implemented RDM at a large insurance company so that it could take advantage of its existing MDM setup while maintaining and governing its reference data. Besides Informatica, many other reference data management solutions exist, for example from Collibra, IBM, Oracle, and Teradata. The main advantage of Informatica’s RDM accelerator is that it uses their powerful MDM platform, where you can define, manage, share, and monitor reference data just like master data.

Reference Data Management Implementation Steps

  1. The first step in implementing RDM is to identify the owners of the applications that contain reference data and understand what level of governance is needed. This also helps avoid or minimize local maintenance of reference data.
  2. The next step is to identify the data domains for reference data and validate whether they fit the data model supplied with the RDM Accelerator. Reference data models differ from typical party models in that they are more dynamic. A reference data model has two parts: the first defines the reference data set, and the second holds the actual code values.
  3. After the data model is validated, the next task is loading the source data. Source data can be loaded using the data integration platform, or it can be imported using the IDD import functionality.
  4. The next important step is to build the governance process or workflows, for example, who will initiate a change and who will certify the reference data.
  5. The final step is publishing the data to consumer applications through a defined communication channel.
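The governance workflow in step 4 and the publishing in step 5 can be sketched as a small state machine. This is only an illustration under stated assumptions: the `ChangeRequest` class, the three statuses, and the rule that an initiator cannot certify their own change are hypothetical, and a real RDM tool would provide its own workflow engine.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Status(Enum):
    DRAFT = "draft"
    CERTIFIED = "certified"
    PUBLISHED = "published"


@dataclass
class ChangeRequest:
    """A proposed new or changed code value moving through governance."""
    data_set: str
    code: str
    description: str
    initiated_by: str
    status: Status = Status.DRAFT
    certified_by: Optional[str] = None
    audit_log: list = field(default_factory=list)

    def certify(self, approver):
        """Certify a draft change; the approver must differ from the initiator."""
        if self.status is not Status.DRAFT:
            raise ValueError("only draft changes can be certified")
        if approver == self.initiated_by:
            raise PermissionError("the initiator cannot certify their own change")
        self.certified_by = approver
        self.status = Status.CERTIFIED
        self.audit_log.append(f"certified by {approver}")

    def publish(self, golden_copy):
        """Apply a certified change to the golden copy (a dict of data sets)."""
        if self.status is not Status.CERTIFIED:
            raise ValueError("only certified changes can be published")
        golden_copy.setdefault(self.data_set, {})[self.code] = self.description
        self.status = Status.PUBLISHED
        self.audit_log.append("published")
```

A typical flow is to create a `ChangeRequest`, call `certify("some_approver")`, and then `publish(golden_copy)`. The audit log records who certified the change, which is the kind of control manual spreadsheets tend to lack.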

Post Implementation 

Once you create a “golden copy” of reference data, it is critical that you maintain it and accommodate ongoing changes so that all downstream systems can leverage it. Like master data, reference data needs to be seamlessly integrated with the systems that consume it. An extensive service layer should be built using Java code or ESB tools, and there should be a flexible mechanism to export and transform reference data for consumption by subscribing applications.
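As a rough illustration of an export-and-transform mechanism, the sketch below takes a golden copy, translates codes for a subscriber that uses its own local values, and serializes the result as JSON or CSV. It is written in Python for brevity rather than Java, and every name in it (`GOLDEN_COPY`, `SUBSCRIBER_CODE_MAP`, `export_rows`, the subscribers `app_a` and `app_b`) is a hypothetical stand-in, not an Informatica API.

```python
import csv
import io
import json

# Hypothetical golden copy: data set name -> {canonical code: description}.
GOLDEN_COPY = {
    "gender_code": {"M": "Male", "F": "Female"},
}

# Hypothetical per-subscriber translations from canonical to local codes.
SUBSCRIBER_CODE_MAP = {
    "app_b": {"gender_code": {"M": "1", "F": "2"}},
}


def export_rows(data_set, subscriber):
    """Return the data set as rows, using the subscriber's local codes if any."""
    local_codes = SUBSCRIBER_CODE_MAP.get(subscriber, {}).get(data_set, {})
    return [
        {"code": local_codes.get(code, code), "description": description}
        for code, description in sorted(GOLDEN_COPY[data_set].items())
    ]


def to_json(rows):
    """Serialize rows as a JSON array."""
    return json.dumps(rows, sort_keys=True)


def to_csv(rows):
    """Serialize rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["code", "description"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Keeping the transformation (`export_rows`) separate from the serialization (`to_json`, `to_csv`) means a new subscriber or a new output format can be added without touching the golden copy itself.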


So while you might not own the mastering of reference data, that doesn’t mean it doesn’t need governance. Reference Data Management allows you to wrangle the inconsistencies between systems within your organization so that you receive consistent reference data that matches the industry standards.

Intricity Experts Article by Vandana Jain

