Your data isn’t clean. Accepting that reality is one of the first steps in becoming a data-driven company. Even third-party providers of so-called “clean data” have data quality problems, so don’t be under the illusion that you’re somehow the exception to the rule. The question is not whether you have dirty data; the question is whether you’re doing something to address it. The answer is one of the first indicators of whether management holds its data assets in the same regard as the physical assets that the data represents.
So I’m going to contrast two tools that companies use, often in concert, to address dirty data. Those tools are Data Quality tools and Master Data Management tools, or MDM tools. We’ll start with Data Quality.
Data Quality Tools usually focus on batch data modifications during data movement. So if I’m moving data from database A to database B, data quality tools can make changes to that data while that data is in flight. This is very comparable to an ETL tool. The difference between Data Quality and ETL is that Data Quality comes prepackaged with common dirty data fixes, like:
- Fixing Zip Codes
- Fixing Phone Numbers to display in a standard format
- Fixing State and County information
- Email Format Parsing and Validation
Data Quality tools will often have hundreds and sometimes thousands of packaged data quality fixes that can be purchased with the tool.
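To make the idea of a packaged fix concrete, here is a minimal sketch in Python of three such rules: phone formatting, zip code formatting, and a basic email format check. The function names, the US-only formats, and the deliberately loose email pattern are my own assumptions for illustration; a commercial tool’s rules will be far more thorough.

```python
import re
from typing import Optional


def normalize_us_phone(raw: str) -> Optional[str]:
    """Format a US phone number as (XXX) XXX-XXXX, or return None if unusable."""
    digits = re.sub(r"\D", "", raw)  # keep digits only
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop a leading country code
    if len(digits) != 10:
        return None
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"


def normalize_us_zip(raw: str) -> Optional[str]:
    """Return a 5-digit or ZIP+4 code in standard form, or None if unusable."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 5:
        return digits
    if len(digits) == 9:
        return f"{digits[:5]}-{digits[5:]}"
    return None


def looks_like_email(raw: str) -> bool:
    """Loose format check only (assumption): something@something.something."""
    return re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", raw.strip()) is not None
```

Returning None instead of guessing keeps unfixable values visible, so they can be routed to a person rather than silently altered.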
Additionally, Data Quality tools allow you to conduct reference table matching and validation. This means the tool maintains a list of clean records along with the dirty forms in which those records often arrive. For example, the list might contain the company name GE, and pointing to GE are all the “non-standard” versions that record could take. So when the Data Quality tool catches a bad record like General Electricity, it knows to replace it with GE.
What’s nice about this reference table matching feature is that you can create your own reference table exceptions for your own company. So things like your common product names and variations of how people spell them can be maintained.
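Reference table matching can be sketched in a few lines of Python. The table contents, the function names, and the normalization rule (case and whitespace only) are assumptions for illustration; real tools typically add fuzzy matching on top of an exact lookup like this one.

```python
# Hypothetical reference table: clean value -> known dirty variants.
REFERENCE_TABLE = {
    "GE": ["General Electric", "General Electricity", "G.E.", "Gen. Electric"],
}


def _key(value: str) -> str:
    """Normalize case and whitespace so trivial differences still match."""
    return " ".join(value.lower().split())


def build_lookup(reference_table: dict) -> dict:
    """Invert the table so every variant (and the clean value) maps to the clean value."""
    lookup = {}
    for clean, variants in reference_table.items():
        lookup[_key(clean)] = clean
        for variant in variants:
            lookup[_key(variant)] = clean
    return lookup


def standardize(value: str, lookup: dict) -> str:
    """Replace a known variant with its clean value; leave unknown values untouched."""
    return lookup.get(_key(value), value)


lookup = build_lookup(REFERENCE_TABLE)
print(standardize("general  electricity", lookup))  # GE
```

Adding your own product names is just a matter of adding entries to the table, which is why this kind of rule can be maintained by the business rather than by developers.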
And this brings up a good point. The people in charge of maintaining data quality are the Business Users. This is because they are the originators of the data. So, central to any data quality tool is its ability to interface with a Business Analyst. These analyst interfaces should allow business users to see an automated snapshot of data patterns so they can drill into suspect data and suggest fixes. We call this kind of pattern analysis “data profiling,” and it’s critical for tracking down bad data.
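One simple form of data profiling is to reduce every value in a column to its “shape” and count how often each shape occurs. The sketch below assumes a convention of my own choosing (digits become 9, letters become A, everything else is kept as-is); the rare shapes are the suspect records an analyst would drill into.

```python
from collections import Counter


def value_pattern(value: str) -> str:
    """Reduce a value to its shape: digits -> 9, letters -> A, other characters kept."""
    shape = []
    for ch in value:
        if ch.isdigit():
            shape.append("9")
        elif ch.isalpha():
            shape.append("A")
        else:
            shape.append(ch)
    return "".join(shape)


def profile_column(values) -> Counter:
    """Count how many values share each shape."""
    return Counter(value_pattern(v) for v in values)


zips = ["12345", "90210", "1234A", "12345-6789"]
for pattern, count in profile_column(zips).most_common():
    print(pattern, count)
```

In this small sample the shape `9999A` appears once, which points straight at the one zip code containing a letter.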
Now Data Quality tools are very powerful for making changes to data while it’s being moved around. But there’s a different type of data quality problem which an MDM tool is very well suited for. And that is a consistency and synchronization problem.
You see, if I were to ask you about one of your customers named Slim Shady, you might find his profile in your support desk system, your warehouse management system, and your CRM. So which one has the right profile? Who’s the real Slim Shady? Dealing with this problem is exactly what Master Data Management tools do. They typically do this by maintaining a physical data set which is fed from all those systems. They don’t store the transactions, but they do keep track of the customer’s attributes, which we call Master Data. And if the Support Desk gets a change in the phone number from the customer, that information will get fed into the Master Data Management tool. It will then decide, based on a myriad of rules, whether that new phone number should be added to Slim Shady’s profile.
The rules engine of an MDM tool is a critical component. As you can imagine, there might be a lot of factors in deciding whether a record should get updated. Sometimes source systems aren’t dependable, so they shouldn’t have priority over other systems. Other times the record itself in the MDM repository is so old that an update from any system would be valuable. Most MDM tools have a scoring mechanism for managing this process, and the ability to configure that scoring matters a great deal. So the best MDM tools allow for very fine-grained configuration of this score and of what task comes next should a certain threshold be met.
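Here is a minimal sketch of how such a scoring rule might look, combining the two factors just mentioned: how much we trust the source, and how stale the master record is. Every number and name in it (the trust weights, the staleness bonus, the two thresholds, the three outcomes) is a hypothetical choice for illustration, not how any particular MDM product works.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical configuration that business stakeholders would own.
SOURCE_TRUST = {"crm": 0.9, "support_desk": 0.6, "warehouse": 0.4}
MAX_STALENESS_BONUS = 0.3      # reached once the master record is a year old
AUTO_APPLY_THRESHOLD = 0.8     # at or above: update the master record
REVIEW_THRESHOLD = 0.5         # at or above: send to an analyst


@dataclass
class ProposedUpdate:
    source: str
    attribute: str
    new_value: str


def score_update(update: ProposedUpdate, master_last_updated: date, today: date) -> float:
    """Score = source trust plus a bonus that grows as the master record ages."""
    trust = SOURCE_TRUST.get(update.source, 0.0)  # unknown sources get no trust
    age_days = max((today - master_last_updated).days, 0)
    bonus = min(age_days / 365, 1.0) * MAX_STALENESS_BONUS
    return min(trust + bonus, 1.0)


def decide(score: float) -> str:
    """Map a score to the next task: apply, review, or reject."""
    if score >= AUTO_APPLY_THRESHOLD:
        return "apply"
    if score >= REVIEW_THRESHOLD:
        return "review"
    return "reject"


update = ProposedUpdate("support_desk", "phone", "(555) 867-5309")
print(decide(score_update(update, date(2022, 1, 1), date(2024, 1, 1))))  # apply
print(decide(score_update(update, date(2023, 12, 31), date(2024, 1, 1))))  # review
```

The same phone number change from the support desk is applied automatically when the master record is two years old, but goes to an analyst when the record was refreshed yesterday.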
Another big difference is MDM’s ability to maintain and change hierarchies. This allows organizations to define relationships between master records, like GE’s relationship to GE Healthcare and other child companies. This is particularly useful when you have a frequently changing hierarchy which downstream processes rely on. By centralizing that logic in the MDM Hub, we avoid having to recreate it elsewhere.
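A hierarchy like this can be pictured as a simple child-to-parent map that downstream processes query instead of rebuilding. In the sketch below, the GE and GE Healthcare relationship comes from the example above; “Example Imaging Unit” and the function names are hypothetical and only there to show more than one level.

```python
# Child -> parent. "Example Imaging Unit" is a made-up entry for illustration.
PARENT_OF = {
    "GE Healthcare": "GE",
    "Example Imaging Unit": "GE Healthcare",
}


def ultimate_parent(entity: str, parent_of: dict) -> str:
    """Walk up the hierarchy until we reach an entity with no parent."""
    seen = set()
    while entity in parent_of:
        if entity in seen:
            raise ValueError(f"cycle detected at {entity!r}")
        seen.add(entity)
        entity = parent_of[entity]
    return entity


def descendants(entity: str, parent_of: dict) -> list:
    """List every entity below the given one, depth-first."""
    result = []
    for child, parent in parent_of.items():
        if parent == entity:
            result.append(child)
            result.extend(descendants(child, parent_of))
    return result


print(ultimate_parent("Example Imaging Unit", PARENT_OF))  # GE
print(descendants("GE", PARENT_OF))  # ['GE Healthcare', 'Example Imaging Unit']
```

Changing the hierarchy is then a single edit to the map, and every process that asks for the ultimate parent or the descendants picks up the change.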
Just as with Data Quality, the Business User is at the helm of a successful MDM tool deployment. When the MDM tool can’t decide which conflicting records should be kept, it needs a Business Analyst to make the decision. And there is no pixie dust in setting up those automated scoring rules. Those are all business decisions that must be made by the Business Stakeholders.
I’ve spent a lot of time talking about customer data, but the top MDM tools are configurable enough to cover a wide range of topics like Supplier, Product, Employee, or whatever data you feel needs consistency.
Organizations often use Data Quality Tools to pump the data into the Master Data Management environment. This allows the MDM tool to focus its efforts on the synchronization and consistency problem and not the low-level data formatting issues.
There is a lot to take in here. I recommend you visit our website and talk with one of our specialists. We can help you get on the path to tackling your data quality and consistency problems head-on.