Video: What is a Columnar Database?

A couple of years ago I was presented to something  called a Columnar database.  It was accompanied by a demo where the presenter queried several billions of rows of data in less than 3 seconds.  I waited patiently for magic ferries to fly out of the server.   But soon after I realized that this solution was simply taking a far more logical approach to acquiring large sums of data.

Today I’m going to share a simple exercise which will help you see the power of using a Columnar database for certain tasks.

To illustrate this exercise  I’m going to use a good old excel spreadsheet, populated with some baseball Statistics.

Let’s imagine that you’re going to play the role of the Traditional Database.  Before I have you play this role though, you need to know that there’s a rule.  You have to read the data from left to right starting from the beginning of each row, as you go from row to row, kind of like reading a book.

So, let’s begin.  Traditional Database, please retrieve for me all of the American League Teams...

OK I’ve asked for two columns of data,  Leagues and Teams.  Here goes,  as you read the data notice that you’re retrieving the League and Team on each pass of every row, and eventually you’ll acquire all the American Leagues and their associated Teams.  Now servers do this very quickly, however what if my request  has several BILLIONS of rows, you can start to see that reading each row just to grab the league and team columns could take a while.

And this is where a columnar database comes in handy.  We’re going to redo the exercise one more time, but instead of giving you Traditional Database rules,  I’m going to give you columnar database rules.  This means you’re going to read the data from top to bottom, and you’re only going to read the  columns I ask you to.

OK, Columnar Database, please retrieve for me all of the American League Teams...

Lets begin, Notice that this method skips all that data that isn’t related to what you’re looking for.  Once we’re done getting that information we start with the  next column, skipping the columns that you don’t need.

But wait a second, how does the computer know which League to assign each Team it’s retrieving.   The way it does this is that the Columnar Databases assigns a number to each row of data, allowing it to quickly pair up the many columns that it retrieves.

You can see how this really comes in handy when you start reading the Teams column.  All it needs to know is which number the American League values ended at, in this case row 258.  Meaning I just need Teams from 1 to 258

Now you probably noticed how the columnar database was repeating American over and over.  This is actually another advantage of a columnar database.  By using the numbering system in columnar databases, algorithms can be used to simplify the retrieval of data.  And you’ll find that each Columnar Database system has highly sophisticated methods of gaining further performance measures.

Knowing what I know now, I realize that the presentation I sat through a few years ago wasn’t reading billions of rows of data.  It was just reading a few long columns of data.  

Columnar Databases can be a very straight forward way of addressing the mounds of data corporations have to wade through. They’re not a replacement for Traditional Databases, but certainly a powerful way of doing highly aggregated analysis.

Intricity specializes in providing leading edge solutions for today’s complex data problems.  I’d encourage you to talk with one of our Specialists.  And I commit to you that we’ll live up to our slogan of simplifying complexity.

 ---Jared Hillam, EIM Practice Director

Thursday, December 8, 2011 - 00:15

Related Items

Articles
Jared Hillam joined David Lyle, Informatica VP of product Strategy, and Mark Smith, Ventana Research CEO, in a roundtable about how: Data Virtualization Must Support Lean Principles & Ensure Reuse for BI Agility.
Events
Intricity is a Gold Sponsor at Informatica World 2012. Come and meet part of the INTRICITY team during the event on May 15-18. ATTEND OUR BREAKOUT SESSION: Tools and Methods to Enable Agile BI. VISIT OUR BOOTH: Qualify to win a pair of Bose Noise Canceling Headphones!
Events
The traditional brick and mortar store has undergone a digital evolution over the last 10 years. This transformation that has created mountains of data. The window of opportunity to act and adapt based on this mountain of data is becoming a shorter and shorter period of time.
News
Intricity was awarded last week with the 2011 US Channel Partner of the Year. The award was given out by Harry Gould, Informatica EVP of Worldwide Alliances, and Paul Hoffman, President of Worldwide Field Operations, at a special partner appreciation lunch hosted by Informatica.
Events
Joe Caserta will be conducting a 5 day training course in Austin Texas covering Master Data Management (MDM) and Agile Data Warehouse & ETL Design. Register for the event at http://www.casertaconcepts.com/training.html