
What is Data Virtualization?

Jared Hillam

October 4, 2012

Data Virtualization has changed the process of acquiring data by simplifying the data gathering steps. See how you can create an Agile BI development process within your current architecture.

Text from Video:

There are a lot of parts and components that go into accurately gathering data. However, at the heart of any well-crafted solution is the data integration and query logic. This is the logic that tells the database what data is being requested and how to process it. Where that logic lives turns out to be a very important topic, as you'll find in this video. To illustrate, let me share an example.

Many years ago I worked for a software company that set out to fix a common problem found in operational reporting. We developed a product that allowed you to open 1000s of operational reports and edit all of them at once. Why was this even necessary? Well, operational reporting generally connects directly to operational databases or application data, so that it has the most up-to-date information. However, operational reports allow you to nest the data gathering logic inside each report template or file. In other words, the logic is part of each individual report. Eventually, organizations would find themselves with literally 1000s of reports, each with its own nested logic. The drawbacks of having logic spread out like this quickly revealed themselves. Often a change to the underlying application would change how the data logic needed to work. This meant that 100s if not 1000s of reports had to be manually opened, edited, saved, and republished in order to account for the changes.

This issue still exists today. However, Business Intelligence vendors over time have devised clever methods of getting around it. Instead of nesting the logic in the report template file, they came up with a virtual modeling layer that allowed organizations to centralize all their logic in one place. This way, the reports generated from this layer would inherit all their logic from a single location. This layer is called a metadata layer. So if changes needed to be made, they could be done once in the metadata layer. However, there was a catch: these metadata layers would only work with the reporting tools provided by that BI vendor. In other words, if you wanted access to that data, you were at the mercy of whatever reporting tool the vendor gave you. So you had better hope you don't have more than one Business Intelligence tool.

To address this and other use cases, in the mid-2000s the Data Integration industry began introducing the concept of Data Federation. These technologies essentially decoupled the virtual logic layer from the Business Intelligence tools and exposed access to the data as a connection. Companies could connect any reporting tool to it, and the data would be gathered for them by issuing a standard SQL query, which could be federated against multiple sources of data. At the time, Data Federation was lauded as the upcoming replacement for the Data Warehouse. But this misconception was quickly debunked, as the speed of data access is ultimately a function of how fast each data source can produce results. Since a Data Warehouse stores all the history in an optimized model, it still proved to be the most efficient method of delivering Business Intelligence data. As the industry has matured, however, the use cases for Data Federation have matured as well. Rather than being considered a replacement for the Data Warehouse, it has become a complementary solution to it. Modern Data Federation tools are now called Data Virtualization tools, because they are no longer simply data connectivity products; rather, they generally represent a suite of virtual data acquisition tools with functions such as virtual data profiling, data quality, advanced caching, ETL tool integration, and business-analyst-friendly interfaces. This means that Data Virtualization can now play the role of agile BI development.
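To make the federation idea concrete, here is a minimal Python sketch of what a federation layer does conceptually: it pushes a sub-query down to each source system and stitches the partial results together itself, without physically consolidating the data. The two in-memory SQLite databases, table names, and sample rows below are purely illustrative stand-ins for real source systems (say, a CRM and an ERP); real Data Virtualization products do this transparently behind a single SQL connection.

```python
import sqlite3

# Two in-memory SQLite databases stand in for two separate source systems.
# All names and values here are illustrative, not from any real product.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)",
                [(1, "Acme"), (2, "Globex")])

erp = sqlite3.connect(":memory:")
erp.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
erp.executemany("INSERT INTO orders VALUES (?, ?)",
                [(1, 250.0), (1, 100.0), (2, 75.0)])

# The "federation" step: push a sub-query to each source, then join the
# partial results locally -- no data is physically moved into a warehouse.
customers = dict(crm.execute("SELECT id, name FROM customers"))
totals = dict(erp.execute(
    "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"))

report = {customers[cid]: totals.get(cid, 0.0) for cid in customers}
print(report)  # {'Acme': 350.0, 'Globex': 75.0}
```

Note how the aggregation (`SUM ... GROUP BY`) is delegated to the source that owns the data; only the small aggregated result crosses the wire. This is also why federation speed is bounded by the slowest source, as the paragraph above points out.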

One of the most common complaints in organizations is how long it takes to turn around a request for information. Often this is because the request requires changes to the underlying data warehouse, which in turn requires changes all the way up the data gathering chain. Edits will need to be made at the ETL layer, possibly in the data warehouse model, and in the BI metadata layer. This might not sound like much, until you take into account that multiple people manage each layer. Then, to add insult to injury, often the end result is not what the business wanted to begin with. Because of this slow and inconsistent process, business users often establish "shadow IT" departments and build their own data acquisition logic, in the hope of circumventing this lethargic process. But then you have to ask yourself: didn't you invest in data integration so you could eliminate multiple versions of the truth?

With Data Virtualization the process changes completely. It provides Business Analysts rapid access to the raw data, just as they would normally see in their "shadow IT," but in an organized interface. This doesn't mean that IT is out of the picture; rather, it means the two can collaborate on the same data integration logic needed to satisfy the business. All this is done without any physical data movement, so the number of parties involved is kept to a minimum. When Business Analysts are satisfied with their data acquisition, they can use that very connection with their BI tools, or, if they have an advanced Data Virtualization tool, they can push that data acquisition logic down to the ETL tool and enrich the data warehouse.

To sum up, Data Virtualization can provide agility to an organization's data integration and Business Intelligence process by:

1. Enabling operational reports to be based on centralized logic

2. Decoupling the Business Intelligence tool from the data acquisition logic, allowing multiple BI tools to acquire from the same metadata layer

3. Speeding up the information delivery process by engaging the Business Analysts and decreasing the number of steps required

I invite you to reach out to Intricity and talk with one of our specialists about your current data integration needs. We can set you up for future success without sacrificing your most urgent current needs.

