Data Lake vs. Data Warehouse

I was recently listening to a clinical psychologist talk about the different kinds of people who run organizations. He described two traits: open-mindedness and orderliness. People who run startups tend to be high in open-mindedness, while people who run established organizations tend to be high in orderliness. As I was putting together my next video, I realized that this concept also applies to how we set up modern data architectures. To make sense of this, let's first talk about what the traits of open-mindedness and orderliness mean.

Open-mindedness is the trait of being open to new information and able to entertain new perspectives. It’s no surprise that CEOs of startups tend to be these types of people. After all, they must learn to break the mold and create new value in the marketplace.

Orderliness, on the other hand, is the trait of promoting the patterns on which success has been proven. This often means shutting out ideas or information that might distract from that success. Again, it's no surprise that CEOs of large corporations tend to be high in orderliness. They are there to ensure that the organization stays focused on what produces success.

So what does this have to do with modern data architectures? Well, the trait of open-mindedness applies directly to the concept of a Data Lake. A Data Lake is a place where new data can enter without any barriers, and where any kind of data can reside. That makes it a great source for discovering new ideas and experimenting with data. But because of this openness to any data, it suffers from a lack of meaningful structure. To the larger business audience, the Data Lake can be a bit of a mess. This is where the scalable traits of orderliness become desirable. The trait of orderliness is directly related to a Data Warehouse. In Data Warehousing, we seek to conform dimensions and measures into queryable components that are consistent, governed, and easier for an ever-growing audience to consume.

As it turns out, both a Data Lake and a Data Warehouse are core components of a modern data architecture. The Data Lake is usually the starting point for onboarding data from around the organization, and the stage from which the Data Warehouse structures its data. Having both enables organizations to drive the entrepreneurial traits of open-mindedness and the scalable traits of orderliness.
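To make the handoff from lake to warehouse a little more concrete, here is a minimal Python sketch. Everything in it is hypothetical and for illustration only: the sample records, the field names (customer, cust_name, amount, total_usd), and the conform and sales_by_customer helpers are assumptions, not part of any particular product. It shows raw records landing in a "lake" with no agreed shape, then being conformed into one governed shape that everyone can query the same way.

```python
from collections import defaultdict

# Hypothetical raw records as they might land in a data lake:
# each source uses its own field names and value formats.
lake = [
    {"source": "crm", "customer": "ACME Corp", "amount": "1200.50", "date": "2023-01-15"},
    {"source": "web", "cust_name": "acme corp", "total_usd": 300, "ts": "2023-01-16T10:22:00"},
    {"source": "erp", "customer": "Globex", "amount": "75", "date": "2023-01-16"},
]


def conform(record):
    """Map one raw lake record onto a single governed shape (assumed schema)."""
    name = record.get("customer") or record.get("cust_name")
    amount = record.get("amount", record.get("total_usd"))
    day = record.get("date") or record["ts"][:10]  # keep only YYYY-MM-DD
    return {
        "customer": name.strip().upper(),  # conformed dimension value
        "sale_date": day,
        "amount_usd": float(amount),       # conformed measure
    }


# The "warehouse" table: every row now has the same columns and types.
warehouse_sales = [conform(r) for r in lake]


def sales_by_customer(rows):
    """A governed query: the same logic gives every user the same answer."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["customer"]] += row["amount_usd"]
    return dict(totals)


print(sales_by_customer(warehouse_sales))
```

Note that the two ACME records only roll up together after conforming. Against the raw lake records, "ACME Corp" and "acme corp" would look like two different customers, which is the kind of mess the orderly side of the architecture is there to prevent.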

Not surprisingly, the BI industry follows the same cadence. There are tools that are purpose-built for enabling open-minded discovery against highly unstructured data lakes, and there are tools that are purpose-built to scale as an orderly information delivery platform, hand in hand with your data warehouse. While these tools DO compete with one another, they have little in common outside of some charts and graphs. This is because they are purpose-built to address either open-mindedness or orderliness. So to determine which tool is right for your needs, you need to answer a few questions:

  1. Am I using this tool for open-minded discovery or orderly information delivery?
  2. Is my BI tool for just a few people or is it for the masses?
  3. Do I need to control the query logic so that users get consistent results?
  4. Am I querying billions of rows of data or just some CSV files?
  5. Will my BI deployment have just a few analytics, dashboards, and reports or thousands?

Many organizations go through this process like Goldilocks, experimenting with different tooling until they find what’s just right. I recently wrote a white paper titled “Goldilocks Guide to Enterprise Analytics” which describes the journey many organizations embark on as they grow from being a startup to an established organization. You can read it by clicking here. And of course you can reach out to Intricity to talk with a specialist about your data-to-information needs.
