This short video provides a simple explanation of what a Columnar Database is. It also walks through (more…) a simple exercise to outline it's advantages.
Text from Video:
A couple of years ago I was presented to something called a Columnar database. It was accompanied by a demo where the presenter queried several billions of rows of data in less than 3 seconds. I waited patiently for magic ferries to fly out of the server. But soon after I realized that this solution was simply taking a far more logical approach to acquiring large sums of data.
Today I’m going to share a simple exercise which will help you see the power of using a Columnar database for certain tasks.
To illustrate this exercise I’m going to use a good old excel spreadsheet, populated with some baseball Statistics.
Let’s imagine that you’re going to play the role of the Traditional Database. Before I have you play this role though, you need to know that there’s a rule. You have to read the data from left to right starting from the beginning of each row, as you go from row to row, kind of like reading a book.
So, let’s begin. Traditional Database, please retrieve for me all of the American League Teams...
OK I’ve asked for two columns of data, Leagues and Teams. Here goes, as you read the data notice that you’re retrieving the League and Team on each pass of every row, and eventually you’ll acquire all the American Leagues and their associated Teams. Now servers do this very quickly, however what if my request has several BILLIONS of rows, you can start to see that reading each row just to grab the league and team columns could take a while.
And this is where a columnar database comes in handy. We’re going to redo the exercise one more time, but instead of giving you Traditional Database rules, I’m going to give you columnar database rules. This means you’re going to read the data from top to bottom, and you’re only going to read the columns I ask you to.
OK, Columnar Database, please retrieve for me all of the American League Teams...
Lets begin, Notice that this method skips all that data that isn’t related to what you’re looking for. Once we’re done getting that information we start with the next column, skipping the columns that you don’t need.
But wait a second, how does the computer know which League to assign each Team it’s retrieving. The way it does this is that the Columnar Databases assigns a number to each row of data, allowing it to quickly pair up the many columns that it retrieves.
You can see how this really comes in handy when you start reading the Teams column. All it needs to know is which number the American League values ended at, in this case row 258. Meaning I just need Teams from 1 to 258
Now you probably noticed how the columnar database was repeating American over and over. This is actually another advantage of a columnar database. By using the numbering system in columnar databases, algorithms can be used to simplify the retrieval of data. And you’ll find that each Columnar Database system has highly sophisticated methods of gaining further performance measures.
Knowing what I know now, I realize that the presentation I sat through a few years ago wasn’t reading billions of rows of data. It was just reading a few long columns of data.
Columnar Databases can be a very straight forward way of addressing the mounds of data corporations have to wade through. They’re not a replacement for Traditional Databases, but certainly a powerful way of doing highly aggregated analysis.
Intricity specializes in providing leading edge solutions for today’s complex data problems. I’d encourage you to talk with one of our Specialists. And I commit to you that we’ll live up to our slogan of simplifying complexity.
---Jared Hillam, EIM Practice Director