-For a deeper dive, check out our video comparing Hadoop to SQL: http://www.youtube.com/watch?v=3Wmdy80QOvw&feature=c4-overview&list=UUrR22MmDd5-cKP2jTVKpBcQ
-Or see our video outlining critical Hadoop scalability fundamentals
Text from the Video:
Have you ever wondered how Google runs queries against its mountains of data? Or how Facebook is able to quickly deal with such large quantities of information? Well, today we’re going into the wild west of data management called Big Data.
Now, while you may or may not have heard of Big Data and related terms like Hadoop or MapReduce, you can be sure that they will be a regular part of your conversations in the coming months and years. This is because 90% of the world’s data was generated in just the last 2 years. Yes, you heard that right: most of the data in the world was generated in the last 2 years, and this accelerating trend is going to continue. All this new data is coming from smartphones, social networks, trading platforms, machines, and other sources. Since most of this data is already available, the question is whether we are going to take advantage of it.
In the past, when larger and larger quantities of data needed to be interrogated, businesses would simply write larger and larger checks to their database vendor of choice. However, in the early 2000s, companies like Google were running into a wall. Their vast quantities of data were simply too large to pump through a single database bottleneck, and they couldn’t write a large enough check to process the data. To address this, Google’s engineers developed an algorithm that allowed large data calculations to be chopped up into smaller chunks and mapped to many computers; when the calculations were done, the partial results were brought back together to produce the resulting data set. They called this algorithm MapReduce. This algorithm was later used to develop an open-source project called Hadoop, which allows applications to run using the MapReduce algorithm.
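To make the "chop up, map, bring back together" idea concrete, here is a minimal sketch in Python. It is not Google's or Hadoop's implementation: the function names (split_into_chunks, map_chunk, reduce_counts, word_count) are made up for illustration, and a local thread pool stands in for the many computers a real cluster would use. The example counts words, a common way to illustrate the pattern.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_chunks(records, num_chunks):
    """Chop the input into roughly equal chunks, one per worker."""
    size = max(1, -(-len(records) // num_chunks))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def map_chunk(chunk):
    """Map step: count the words in one chunk, independently of the others."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial results into one."""
    return left + right


def word_count(records, num_workers=4):
    """Run the map step on every chunk, then reduce the partial results."""
    chunks = split_into_chunks(records, num_workers)
    # Assumption: threads stand in for the separate machines of a cluster.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(reduce_counts, partials, Counter())


if __name__ == "__main__":
    lines = ["big data big", "data hadoop", "big hadoop mapreduce"]
    print(word_count(lines, num_workers=2))
```

Because each chunk is counted without reference to any other chunk, the map step can be spread across as many workers as are available; only the final merge needs to see all of the partial results.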
Now, with all these new terms, it’s easy to get lost about what is going on. Simply put, we are processing data in parallel rather than in serial. So why do I call it the wild west of data management? Well, even though the MapReduce algorithm was released 8 years ago, it still relies heavily on Java coding to be implemented successfully. However, the market is rapidly evolving, and tools are becoming available to help businesses adopt this powerful architecture without the major learning curve of Java code.
So should your business be getting into Hadoop? There are really two ingredients that are driving organizations to investigate Hadoop. One is a lot of data, generally more than 10 terabytes. The other is high calculation complexity, like statistical simulations. Any combination of those two ingredients, together with the need to get results faster and cheaper, will drive your return on investment.
Over the long run, Hadoop will become part of our day-to-day information architecture. We will start to see Hadoop playing a central role in statistical analysis, ETL processing, and business intelligence. Intricity can help ensure your organization isn’t missing out on critical opportunities to leverage this architecture today. Intricity’s early partnerships in the Hadoop space have molded our capacity to help our customers navigate this new frontier. I recommend that you reach out to Intricity and talk with one of our specialists. We can help you evaluate the opportunities and architect a solution for your Big Data requirements.
- Jared Hillam, EIM Practice Director, Intricity LLC