The BI Shake Up

Jared Hillam

April 20, 2016

Over the last 15 years there have been three titans in the BI space: Business Objects, Cognos, and MicroStrategy (and maybe four if you count OBIEE). These vendors have all booked multi-million dollar deployments that support thousands of users on a regular basis. (I know because I've had the honor of building consulting engagements for all of them.) But why have the Enterprise BI investments made by big companies funneled into these three vendors? Why do they garner the lion's share of users inside a corporation, while flashier BI vendors like Tableau, Spotfire, and Qlik seem to attract smaller audiences? All three of those flashier tools have done multi-million dollar deals and supported large clients, but such big deals are definitely not the norm for them, whereas the titan BI vendors regularly support those big deployments.

Some of this could certainly be attributed to legacy: they got in early, when the getting was good. Additionally, being acquired by titan ERP vendors didn't hurt. Both Business Objects and OBIEE have essentially become the default BI stack for customers with those back ends. But there is a technical reason those vendors were targeted for acquisition and others weren't. In my humble opinion, that reason was an effective BI metadata layer. That innovation, originally pioneered by Business Objects and later adopted by MicroStrategy, Cognos, and eventually OBIEE, is at the heart of what makes those companies successful.

So what is a BI metadata layer? Think of it as a virtual layer that puts business-friendly titles on the data, which users can click on. Let's call those clickable objects "data objects." When users pick those data objects, the system magically assembles the SQL code necessary to fetch the data on demand. (I'll decompose that magic a little later.) The user will likely never even see the SQL code when they run it. All they know is that they got their data.
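
To make that concrete, here is a minimal sketch of the kind of translation a metadata layer performs. The object names and the underlying tables (sales_fact, customer_dim) are hypothetical, not taken from any particular vendor:

```sql
-- A user drags two data objects into a report: "Customer Region" and "Total Revenue".
-- The metadata layer maps those friendly names to physical columns and joins,
-- then generates a query along these lines on the user's behalf:
SELECT
    c.region            AS customer_region,   -- the "Customer Region" data object
    SUM(f.sales_amount) AS total_revenue      -- the "Total Revenue" data object
FROM sales_fact f
JOIN customer_dim c
    ON f.customer_key = c.customer_key
GROUP BY c.region;
```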

In Its Infancy

In the early days of this innovation, most competing BI tools like Crystal Reports had the query logic and the report design all part of a single workflow. The user would build their SQL query structure, then work on formatting their report. At a small scale this worked great. But after a few years, corporations noticed that they had thousands of these reports, each containing nested logic on how the data came together. If the corporation happened to change a calculation or business rule, they faced the prospect of hand-editing thousands of reports. Most of the time, nobody even knew what was in those 1,000+ reports. So the prospect of centralizing that query logic into a single metadata layer, and having all the formatted reports adopt that logic, was a far more appealing idea. All the company had to do was edit the metadata layer and SHAZAM! All the reports had the corrected logic.
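
Continuing the hypothetical example above, suppose the business decides that revenue should exclude returns. Only the central definition changes, and every report that references the "Total Revenue" object picks up the new rule the next time it runs:

```sql
-- Central metadata definition of "Total Revenue", before the rule change:
--   SUM(f.sales_amount)
-- After the rule change, the single central definition becomes:
--   SUM(f.sales_amount) - SUM(f.returns_amount)
-- From then on, every report that uses the object generates queries like this:
SELECT
    c.region AS customer_region,
    SUM(f.sales_amount) - SUM(f.returns_amount) AS total_revenue
FROM sales_fact f
JOIN customer_dim c
    ON f.customer_key = c.customer_key
GROUP BY c.region;
```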

When this innovation came out, people imagined this virtual layer fetching data directly out of their ERP systems. But as the BI industry matured, the prospect of designing that much complex logic in a virtual layer (which could only really be tested by running the queries) proved too burdensome. Additionally, ERP systems and other applications were not designed to cough up records in bulk the way BI tools required. This need to surface the data and get it into a queryable format drove the rise of the Data Warehouse through the late 1990s and early 2000s. Just like "Big Data" and "Prescriptive Analytics," it was the buzzword of its day. And just like today, everybody was an expert in how to build a "Data Warehouse" overnight. 🙂 So as you can imagine, it took time for real expertise to marinate in the market, but eventually the industry began to see that a well-designed Kimball Data Warehouse was a perfect match for a BI metadata layer. Queries could simply be pushed down to the data warehouse, which could quickly return a result set because it was designed for exactly that purpose.
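
For readers who haven't lived in the Kimball world, here is a hedged sketch of what such a design looks like: one central fact table surrounded by dimension tables, shaped so the joins and aggregations a metadata layer generates come back fast. The table and column names are illustrative only:

```sql
-- Dimension tables: small, descriptive, business-friendly attributes
CREATE TABLE customer_dim (
    customer_key   INTEGER PRIMARY KEY,
    customer_name  VARCHAR(100),
    region         VARCHAR(50)
);

CREATE TABLE date_dim (
    date_key       INTEGER PRIMARY KEY,
    calendar_date  DATE,
    fiscal_quarter VARCHAR(6)
);

-- Fact table: one row per measurable event, keyed to the dimensions
CREATE TABLE sales_fact (
    customer_key   INTEGER REFERENCES customer_dim (customer_key),
    date_key       INTEGER REFERENCES date_dim (date_key),
    sales_amount   DECIMAL(12,2),
    returns_amount DECIMAL(12,2)
);
```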

In Its Maturity

The incredibly broad range of queries that you could run against a well-designed Data Warehouse made it the go-to practice for large corporations. Companies could focus their energy on "getting it right" so data could be distributed accurately to the masses. And as much as the world of data has changed, the Data Warehouse is STILL the go-to practice for "getting it right" today. This is why you see huge populations of users on the titan BI platforms while other BI vendors tinker with departmental deployments. The metadata layer enables the titan BI vendors to interact directly with the Data Warehouses that most organizations have so religiously assembled and governed.

There's another glaring advantage that metadata layers provide in today's ecosystem of big data. Since metadata layers don't actually house data, they hand the heavy lifting to the database (the Data Warehouse) through a SQL query. This allows the data to be processed where it lives, while the BI platform just takes back the processed results, which are usually a much smaller record set. This capability is a MUST in environments where mass quantities of data reside in Data Warehouse architectures coupled with Data Lakes.

So hopefully you can appreciate why the concept of a BI metadata layer is so fundamental to the success of the titan BI companies.

So Where's the Shake Up?

It's in the cloud. The cloud is messing up the whole model for BI. Since the dawn of databases, BI companies have had the luxury of working with local data sets, because the gravity of the data was behind the firewall. However, today the gravity of the data sits outside of the firewall, in the cloud. While it is possible to rig an integration system to pull all the data down inside the firewall, organizations are asking, "Why deal with all the overhead, hardware, and maintenance when most of the data already resides in the cloud?"

So the BI Titans have been making a painful transition. If you want to see what that transition looks like, I suggest you look at my article: Why Born in the Cloud Matters?

The BI companies that were Born in the Cloud, like Looker, Domo, and Birst, are starting to shake up the industry. However, only Looker has a BI metadata layer that customers can control. This is probably why it has gotten so much attention lately from what I call the "laboratory IT" companies (Google, Uber, Snapchat, and Yahoo, for example). It will be interesting to see how the competition shakes out in this new chapter of BI, and whether the titan BI companies can make the messy transition gracefully.

Looker has opted to deliver its metadata layer to developers as a markup language called LookML. This opens some interesting opportunities to make queries more programmatic in nature. For example, Looker has a Slack integration that will respond to people asking data questions. That kind of spontaneity in the world of BI metadata is something rather new.
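
To give a flavor of the idea, here is a minimal LookML-style sketch; the view, table, and field names are hypothetical, and the exact syntax details vary by Looker version:

```lookml
# A view declares where the data lives and which business-friendly
# dimensions and measures users can pick from.
view: orders {
  sql_table_name: analytics.orders ;;

  dimension: region {
    type: string
    sql: ${TABLE}.region ;;
  }

  measure: total_revenue {
    type: sum
    sql: ${TABLE}.sales_amount ;;
  }
}
```

Because the model is just text, it can live in version control and be manipulated programmatically, which is part of what makes integrations like the Slack bot possible.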

What About the Data Warehouse?

Because the BI metadata concept is so critically connected to the Data Warehousing story, I couldn't conclude this article without saying something about where the Data Warehouse fits in this cloud architecture. The true Born in the Cloud option here is Snowflake. Yes, I know about Redshift and MS SQL DW in the cloud, but remember, we're talking about Born in the Cloud. (Amazon Redshift descends from a Postgres code base, though I hear there are things brewing there.) Snowflake, by contrast, was built from the ground up for multi-tenancy, with separate storage and compute layers. Coupling Snowflake with Looker seems to be a match made in heaven, and it's completely Born in the Cloud. People who are starting from scratch are flocking to that solution.
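
As a rough illustration of what separate storage and compute means in practice, here is a hedged Snowflake-style sketch (the warehouse names are hypothetical, and the available options vary): the data is stored once, while independent virtual warehouses supply the compute and can be created, resized, or suspended on their own.

```sql
-- BI queries (e.g., the ones Looker generates) run on their own compute cluster,
-- which suspends itself when idle and wakes up when a query arrives.
CREATE WAREHOUSE bi_wh
  WITH WAREHOUSE_SIZE = 'SMALL'
       AUTO_SUSPEND   = 300
       AUTO_RESUME    = TRUE;

-- Heavy load jobs get a separate, larger cluster that reads the same stored data
-- without competing with BI users for compute.
CREATE WAREHOUSE etl_wh
  WITH WAREHOUSE_SIZE = 'LARGE';
```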

Leave Me Your Feedback:

I know I've left some honorable mentions out of this article. I'd love to hear the ones you've thought of.

 
