How ETL in the Cloud Works

Jared Hillam

May 18, 2016


If you’ve seen my videos about ETL, then you’re aware of how critical this tool is for managing data. One of the big trends over the last few years, however, is to have ETL delivered in the cloud. The question is: how can ETL work in a cloud-based architecture when the data is so often on-premise? This video shares how this type of architecture works and how organizations are leveraging it to lighten their ETL footprint.

Admittedly, when I heard about data integration in the cloud several years back, the first word that came to mind was “lightweight.” But that impression quickly changed when I learned how these products were designed. It’s no secret that the location of data is an important aspect of architecting an ETL solution. If the data is on-premise, then the data processing should be on-premise. Likewise, if the data is at an offsite data center, the processing should be at the offsite data center. In other words, it doesn’t make a lot of sense to ship raw data just to say you’re in the cloud. And when we do ship data, we should ship as little of it as possible. So data integration in the cloud isn’t some magic place to toss all your data into. Rather, it’s a change in application architecture. To explain this, let’s first talk about how traditional ETL products were designed.

Traditionally, ETL tools followed a 3-tier architecture, meaning the product was split into 3 parts: the design interface for the user; the metadata repository, where all the settings and content the user created live; and the processing layer, where the data gets crunched according to definitions drawn from the metadata repository. All three of these layers were part of a single package and were designed to work within the 4 walls of your organization. To cloud-enable these platforms in an on-premise scenario, the user interface and the metadata repository were moved to the cloud, while the processing engine stayed on-premise. So when the processing engine needs to run, it receives the appropriate commands and information from the cloud metadata repository and runs the data movement routine on-premise. This allows the data to live where it natively is rather than requiring all the data to move to the cloud.
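To make that split concrete, here is a minimal sketch in Python. It is not any vendor's actual API: the class names (CloudMetadataRepository, OnPremEngine, JobDefinition), the sample "orders" rows, and the filter rule are all made up for illustration. The point it tries to show is that only the job definition comes down from the cloud and only a small status record goes back, while the rows themselves stay on-premise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobDefinition:
    """Metadata only: describes what to do, never carries the rows."""
    name: str
    source: str        # logical name of a source the engine can reach locally
    min_amount: float  # hypothetical filter rule applied by the engine


class CloudMetadataRepository:
    """Stand-in for the vendor-hosted repository (hypothetical interface)."""

    def __init__(self):
        self._jobs = {}

    def save(self, job):
        self._jobs[job.name] = job

    def fetch(self, name):
        return self._jobs[name]


class OnPremEngine:
    """Stand-in for the processing engine that runs next to the data."""

    def __init__(self, repository, local_sources):
        self.repository = repository
        self.local_sources = local_sources  # raw data stays on this side
        self.warehouse = []                 # local load target

    def run(self, job_name):
        job = self.repository.fetch(job_name)   # metadata comes down
        rows = self.local_sources[job.source]   # extract locally
        kept = [r for r in rows if r["amount"] >= job.min_amount]  # transform
        self.warehouse.extend(kept)             # load locally
        # Only a small status record is reported back to the cloud side.
        return {"job": job.name, "rows_read": len(rows), "rows_loaded": len(kept)}


repo = CloudMetadataRepository()
repo.save(JobDefinition(name="large_orders", source="orders", min_amount=100.0))

engine = OnPremEngine(repo, {"orders": [
    {"id": 1, "amount": 50.0},
    {"id": 2, "amount": 150.0},
    {"id": 3, "amount": 300.0},
]})
status = engine.run("large_orders")
print(status)
```

In a real product the fetch would be a network call out to the vendor's service, but the shape of the exchange is the same: definitions and status cross the boundary, the data does not.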

When something needs to be run in the cloud, a separate engine hosted in the cloud can process that data there.

So in other words, the design of the data movement and the storage of its definitions are hosted by the cloud ETL vendor, but the engine that processes the commands can sit in multiple locations. Some vendors have perfected these architecture options to be completely software-driven.
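The idea of one hosted design layer with engines in several locations can be sketched as a simple dispatcher. This is an illustrative assumption, not how any particular vendor implements it: ControlPlane, Engine, and the location labels "on_premise" and "cloud" are invented names. Each job is sent to the engine that sits where the data already lives.

```python
class Engine:
    """A processing engine installed at one location (hypothetical)."""

    def __init__(self, location):
        self.location = location
        self.executed = []

    def execute(self, job_name):
        self.executed.append(job_name)
        return f"{job_name} ran at {self.location}"


class ControlPlane:
    """Vendor-hosted side: knows the engines, holds no data itself."""

    def __init__(self):
        self._engines = {}

    def register(self, engine):
        self._engines[engine.location] = engine

    def dispatch(self, job_name, data_location):
        # Route the job to the engine co-located with the data.
        engine = self._engines.get(data_location)
        if engine is None:
            raise LookupError(f"no engine registered at {data_location!r}")
        return engine.execute(job_name)


control = ControlPlane()
on_prem = Engine("on_premise")
in_cloud = Engine("cloud")
control.register(on_prem)
control.register(in_cloud)

first = control.dispatch("load_erp_orders", data_location="on_premise")
second = control.dispatch("load_web_events", data_location="cloud")
print(first)
print(second)
```

Routing by data location is what keeps raw data from being shipped just to be processed somewhere else.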

Intricity assists organizations in architecting for these mixed-premise deployments. I recommend you reach out to Intricity and talk with a specialist. We can help you architect an environment that makes the most of an Integration Platform as a Service while balancing your data movement footprint.

