How ETL in the Cloud Works

If you’ve seen my videos about ETL, then you’re aware of how critical this tool is for managing data. One of the big trends over the last few years has been to deliver ETL from the cloud. The question is: how can ETL work in a cloud-based architecture when the data is so often on premises? This video shares how this type of architecture works and how organizations are leveraging it to lighten their ETL footprint.

Admittedly, when I heard about data integration in the cloud several years back, the first word that came to mind was “lightweight.” But that impression quickly changed when I learned how these products were designed. It’s no secret that the location of the data is an important factor in architecting an ETL solution. If the data is on premises, then the data processing should be on premises. Likewise, if the data is at an offsite datacenter, the processing should be at the offsite datacenter. In other words, it doesn’t make a lot of sense to ship raw data just to say you’re in the cloud. And when we do ship data, we should ship as little of it as possible. So data integration in the cloud isn’t some magic place to toss all your data into. Rather, it’s a change in application architecture. To explain this, let’s first talk about how traditional ETL products were designed.

Traditionally, ETL tools followed a 3-tier architecture. This means the product was split into three parts: the design interface for the user; the metadata repository, where all the settings and content the user created live; and the processing layer, where the data gets crunched and whose definitions are taken from the metadata repository. All three of these layers were part of a single package and were designed to work within the four walls of your organization. To cloud-enable these platforms in an on-premises scenario, two of those functions, the user interface and the metadata repository, were moved to the cloud. The processing engine, however, stayed on premises. So when the processing engine needs to run, it receives the appropriate commands and information from the cloud metadata repository and runs the data movement routine on premises. This allows the data to stay where it natively lives rather than requiring all of it to move to the cloud.
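To make that split concrete, here is a minimal sketch in Python. None of the names come from a real product: `JobDefinition`, `CloudMetadataRepository`, and `OnPremEngine` are hypothetical stand-ins, and the "tables" are plain in-memory lists. The point is only to show which pieces talk to each other: the engine pulls a job definition from the cloud-hosted repository, processes the rows locally, and sends back nothing but run statistics.

```python
from dataclasses import dataclass, field


@dataclass
class JobDefinition:
    """Settings a user designed in the cloud UI (hypothetical shape)."""
    name: str
    source_table: str
    target_table: str
    min_amount: float


@dataclass
class CloudMetadataRepository:
    """Stands in for the vendor-hosted repository; it never sees row data."""
    jobs: dict = field(default_factory=dict)
    run_log: list = field(default_factory=list)

    def get_job(self, name):
        return self.jobs[name]

    def report(self, name, rows_read, rows_written):
        # Only run statistics travel back to the cloud.
        self.run_log.append((name, rows_read, rows_written))


class OnPremEngine:
    """Processing engine that lives next to the data."""

    def __init__(self, repository, local_tables):
        self.repository = repository
        self.local_tables = local_tables

    def run(self, job_name):
        # Fetch the definition from the cloud, then do all the work locally.
        job = self.repository.get_job(job_name)
        rows = self.local_tables[job.source_table]
        kept = [r for r in rows if r["amount"] >= job.min_amount]
        self.local_tables[job.target_table] = kept
        self.repository.report(job.name, len(rows), len(kept))
        return len(kept)


repo = CloudMetadataRepository()
repo.jobs["big_orders"] = JobDefinition("big_orders", "orders", "orders_big", 100.0)
tables = {"orders": [{"id": 1, "amount": 50.0}, {"id": 2, "amount": 250.0}]}
engine = OnPremEngine(repo, tables)
engine.run("big_orders")
```

After the run, the filtered rows sit in the local `orders_big` table, while the repository's `run_log` holds only the job name and two row counts.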

When a job needs to run against data that already lives in the cloud, a second engine hosted in the cloud can process that data there.

So in other words, the design and metadata for the data movement are hosted by the cloud ETL vendor, but the engine that executes the commands can sit in multiple locations. Some vendors have refined these deployment options to the point that they are entirely software driven.
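As a rough illustration of engines sitting in multiple locations, the sketch below routes each job to whichever engine is registered where that job's data lives. The `Engine` and `Job` classes, the `pick_engine` helper, and the location labels are all assumptions made up for this example, not any vendor's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Engine:
    name: str
    location: str  # hypothetical labels: "on_prem" or "cloud"


@dataclass(frozen=True)
class Job:
    name: str
    data_location: str  # where the source data natively lives


def pick_engine(job, engines):
    """Return the first registered engine co-located with the job's data."""
    for engine in engines:
        if engine.location == job.data_location:
            return engine
    raise LookupError(f"no engine registered at {job.data_location!r}")


engines = [Engine("hq-engine", "on_prem"), Engine("vendor-engine", "cloud")]
jobs = [Job("load_erp_orders", "on_prem"), Job("load_saas_leads", "cloud")]

# Map each job to the engine that will run it, keyed by job name.
assignments = {job.name: pick_engine(job, engines).name for job in jobs}
```

Here the on-premises job is assigned to `hq-engine` and the cloud job to `vendor-engine`, so neither job has to ship its raw data to the other side.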

Intricity assists organizations in architecting for these mixed-premises deployments. I recommend you reach out to Intricity and talk with a specialist. We can help you architect an environment that makes the most of your Integration Platform as a Service while keeping your data movement footprint in balance.