
Reference Data Wrangling

Intricity


January 25, 2017


Reference Data Management

So what is reference data? Think of it as data that resides in your organization, but whose standards and naming conventions you don’t own. For example, imagine that one day you decided to prefix all your postal codes with your store numbers. What would happen? All of a sudden, packages carrying the new zip codes wouldn’t reach their destinations. That is because you don’t control the standards and naming conventions for postal codes; the Postal Service masters them, and you just reference them.

Reference data is critical for any organization because it is a special subset of master data used for classification across the organization’s applications and databases. It includes the lookup table and code table data found in virtually every enterprise application. Simple examples of reference data are postal codes, country codes, currency codes, gender codes, and industry codes. Complex reference data may originate from multiple applications, be derived from transactional data, or be supplied by external agencies. Reference data is typically defined with a code and a description, and has a set of domain values, that is, a list of allowed values.
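To make the "code, description, and list of allowed values" idea concrete, here is a minimal Python sketch of a reference data set. The class names (`CodeValue`, `ReferenceDataSet`) and the sample currency codes are illustrative assumptions, not part of any particular RDM product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeValue:
    """One entry in a code table: a code plus its description."""
    code: str
    description: str


class ReferenceDataSet:
    """Hypothetical container for a named set of allowed code values."""

    def __init__(self, name, values):
        self.name = name
        # Index by code so lookups and validity checks are O(1).
        self._by_code = {v.code: v for v in values}

    def is_valid(self, code):
        """Return True if the code is in the list of allowed values."""
        return code in self._by_code

    def describe(self, code):
        """Return the description for a code, or raise if it is not allowed."""
        if code not in self._by_code:
            raise ValueError(f"{code!r} is not an allowed value in {self.name!r}")
        return self._by_code[code].description


# Sample data for illustration only.
currency_codes = ReferenceDataSet(
    "currency",
    [
        CodeValue("USD", "US Dollar"),
        CodeValue("EUR", "Euro"),
        CodeValue("JPY", "Japanese Yen"),
    ],
)
```

An application would call `currency_codes.is_valid(...)` before accepting a value, so that only codes in the domain reach downstream systems.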

Why does Reference Data need to be Managed?

Today, most enterprises manage their critical reference data in spreadsheets or through manual, ad-hoc methods, and there is often no centralized mechanism for managing it within the organization. Variations and inconsistencies in reference data can be a major source of data quality issues and can cause business losses through system downtime, incorrect transactions, and incorrect reports. Errors in reference data affect the quality of master data in each domain, which in turn affects quality in all dependent transactional systems. The same reference data or code may also have different values in different applications. For example, a gender code may be ‘M’ for Male and ‘F’ for Female in one application but 1 for Male and 2 for Female in another. Manual or custom reference data management (RDM) often lacks change management, audit controls, and granular security and permissions. Mismatches in reference data undermine the integrity of BI reports and cause application integration failures. Several compliance requirements also call for reference data to be monitored and governed.
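The gender code mismatch described above is usually handled with a crosswalk that maps each application's local codes to one canonical set. The sketch below assumes two hypothetical source systems, `app_a` and `app_b`; the names and the mapping table are made up for illustration.

```python
# Canonical code set that the organization agrees on (assumed for this example).
CANONICAL_GENDER = {"M": "Male", "F": "Female"}

# Crosswalk: local code in each source system -> canonical code.
CROSSWALK = {
    "app_a": {"M": "M", "F": "F"},
    "app_b": {"1": "M", "2": "F"},
}


def to_canonical(source_system, local_code):
    """Translate a system-specific gender code into the canonical code."""
    mapping = CROSSWALK.get(source_system)
    if mapping is None:
        raise KeyError(f"no crosswalk defined for system {source_system!r}")
    # Normalize so that the integer 1 and the string "1" are treated alike.
    key = str(local_code).strip()
    if key not in mapping:
        raise ValueError(f"{local_code!r} is not a known code in {source_system!r}")
    return mapping[key]
```

Raising on an unknown code, rather than passing it through, surfaces the mismatch at integration time instead of in a report.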

Tools to Use

Have you already invested in Informatica’s Master Data Management product to maintain and govern your master data? Beyond using Informatica for master data, did you know you can use it to manage your reference data as well? Customers of MDM can find Reference Data Management as a marketplace item, which can be downloaded for free. Intricity recently implemented RDM at a large insurance company so that it could take advantage of its existing MDM setup while maintaining and governing its reference data. Besides Informatica, many other reference data management solutions exist, for example from Collibra, IBM, Oracle, and Teradata. The main advantage of Informatica’s RDM accelerator is that it runs on their powerful MDM platform, where you can define, manage, share, and monitor reference data just like master data.

Reference Data Management Implementation Steps

  1. The first step in implementing RDM is to identify the owners of the applications that contain reference data and to understand what level of governance is needed. This also helps avoid or minimize local maintenance of reference data.
  2. The next step is to identify the data domains for reference data and validate whether they fit the data model supplied with the RDM Accelerator. Reference data models differ from typical party models; they are more dynamic in nature. A reference data model has two parts: the first defines the reference data set, and the second holds the actual code values.
  3. After data model validation, the next task is loading the source data. Source data can be loaded using the data integration platform, or it can be imported using the import functionality of Informatica Data Director (IDD).
  4. The next important step is to build the governance process or workflows, for example, who will initiate a change and who will certify the reference data.
  5. The final step is publishing the data to consumer applications, that is, establishing the communication channel.
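The governance step in the list above can be sketched as a small propose-and-certify workflow. This is not the RDM Accelerator's API; `ChangeRequest`, `ReferenceDataStore`, and the rule that an initiator cannot certify their own change are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChangeRequest:
    """A proposed addition or update to one code value in a code set."""
    code_set: str
    code: str
    description: str
    initiated_by: str
    status: str = "PROPOSED"
    certified_by: Optional[str] = None


class ReferenceDataStore:
    """Hypothetical store: changes reach the golden copy only once certified."""

    def __init__(self, certifiers):
        self.certifiers = set(certifiers)
        self.golden = {}      # code_set -> {code: description}
        self.audit_log = []   # (event, user, code_set, code)

    def propose(self, code_set, code, description, user):
        """Record a change request without touching the golden copy."""
        request = ChangeRequest(code_set, code, description, user)
        self.audit_log.append(("PROPOSED", user, code_set, code))
        return request

    def certify(self, request, user):
        """Apply a change to the golden copy after checking permissions."""
        if user not in self.certifiers:
            raise PermissionError(f"{user!r} is not allowed to certify changes")
        if user == request.initiated_by:
            # Assumed separation-of-duties rule for this sketch.
            raise PermissionError("initiator cannot certify their own change")
        request.status = "CERTIFIED"
        request.certified_by = user
        self.golden.setdefault(request.code_set, {})[request.code] = request.description
        self.audit_log.append(("CERTIFIED", user, request.code_set, request.code))
```

The `golden` dictionary mirrors the two-part model from step 2: the outer key names the reference data set, and the inner mapping holds its code values.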

Post Implementation 

Once you create a “golden copy” of reference data, it is critical to maintain it and accommodate ongoing changes so that all downstream systems can leverage it. Like any other governed data, reference data needs to be seamlessly integrated with the systems that consume it. An extensive service layer should be built using Java code or enterprise service bus (ESB) tools. There should also be a flexible mechanism to export and transform reference data for consumption by subscribing applications.
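As a rough illustration of the export mechanism, the sketch below renders one golden-copy code set as JSON and as CSV. It uses Python rather than Java or an ESB purely for brevity; the function names and the sample country data are assumptions.

```python
import csv
import io
import json

# Sample golden copy for one code set (illustrative data only).
GOLDEN_COUNTRY = {"US": "United States", "CA": "Canada"}


def export_json(code_set_name, values):
    """Serialize a code set as JSON for subscribers that consume a service payload."""
    payload = {
        "code_set": code_set_name,
        "values": [
            {"code": code, "description": desc}
            for code, desc in sorted(values.items())
        ],
    }
    return json.dumps(payload, indent=2)


def export_csv(values):
    """Serialize a code set as CSV for subscribers that load flat files."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["code", "description"])
    for code, desc in sorted(values.items()):
        writer.writerow([code, desc])
    return buffer.getvalue()
```

Sorting by code keeps each export deterministic, which makes it easier for a subscribing application to diff one publication against the next.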

Conclusion

So while you might not own the mastering of reference data, that doesn’t mean it doesn’t need governance. Reference Data Management allows you to wrangle the inconsistencies between systems within your organization and ensure that you’re working with consistent reference data that matches industry standards.

Intricity Experts Article by Vandana Jain
