In this video I’m going to assume that you’re familiar with ETL or ELT, and if you’re not, you can catch my video on that topic which I’ll link in the description.
There’s been a massive change over the last four years in the data integration market. The forces behind that change are more than I can cover in this video, but I’ll surface two that I’ve observed.
The first force came from the application development world, ushered in by GitHub. GitHub’s ability to centralize and manage code branches made Continuous Integration and Continuous Deployment (CI/CD) available to the masses. Startups that used this approach for their apps found themselves frustrated when it came to traditional ETL tools, because those tools were severely lacking in support for CI/CD-centric development cycles. So they circumvented the ETL tools altogether, keeping their transformation logic in SQL and wrapping it in code like Python, with an orchestrator like Airflow to run the pipeline. This let coding teams stay in the comfort of CI/CD for deploying a solution. So let’s circle this and put it in the corner for a moment while we talk about the second force.
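To make that concrete, here’s a minimal sketch of the SQL-in-code pattern: each transformation step is plain SQL kept under version control, and a thin Python wrapper runs the steps in order. This is a simplified stand-in for what an Airflow DAG would do (with scheduling and retries on top); the step names and tables are hypothetical, and Python’s built-in sqlite3 stands in for the warehouse.

```python
import sqlite3

# Each step is plain SQL checked into the repo, so branches, code
# review, and CI/CD apply to it like any other application code.
# Step and table names are made-up examples.
PIPELINE_STEPS = [
    ("create_raw_orders",
     "CREATE TABLE raw_orders (id INTEGER, amount REAL, status TEXT)"),
    ("seed_sample_rows",
     "INSERT INTO raw_orders VALUES (1, 10.0, 'open'), (2, 25.0, 'closed')"),
    ("build_closed_orders",
     "CREATE TABLE closed_orders AS "
     "SELECT id, amount FROM raw_orders WHERE status = 'closed'"),
]

def run_pipeline(conn):
    """Execute each SQL step in order -- the orchestration a tool
    like Airflow would add retries and scheduling around."""
    for name, sql in PIPELINE_STEPS:
        conn.execute(sql)
    conn.commit()

conn = sqlite3.connect(":memory:")
run_pipeline(conn)
print(conn.execute("SELECT COUNT(*) FROM closed_orders").fetchone()[0])
```

The point isn’t the SQL itself; it’s that the whole pipeline lives in a repo where CI/CD can test and deploy it.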
The second force came from cloud data warehousing platforms like Snowflake. Traditionally, the data warehousing space processed data as it was being moved to the target database. That made sense at the time because databases were slow. But bit by bit, databases became so optimized that it made more sense to push the data transformation steps into the database for execution. Now, many ETL vendors didn’t push ALL their transformation steps into the database, just the ones they deemed “better suited.” But the drive to push computation down to the database grew as platforms like Snowflake came to market, because their compute resources were so efficiently allocated and so quickly executed.
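In its simplest form, pushdown just means sending the transformation to the database as SQL so the engine does the work, instead of pulling rows out and transforming them inside the ETL tool. A minimal sketch, again using sqlite3 as a stand-in for a warehouse (the table and column names are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging_sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO staging_sales VALUES (?, ?)",
                 [("east", 100.0), ("east", 50.0), ("west", 75.0)])

# Pushdown: the aggregation runs inside the database engine as one
# SQL statement, rather than row by row in the integration tool.
conn.execute("""
    CREATE TABLE sales_by_region AS
    SELECT region, SUM(amount) AS total
    FROM staging_sales
    GROUP BY region
""")

for row in conn.execute(
        "SELECT region, total FROM sales_by_region ORDER BY region"):
    print(row)
```

On a platform like Snowflake the same idea applies, only the engine allocating the compute is the warehouse itself.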
Now, these two forces are having a major impact on the ETL market today. Vendors that enjoyed a dominant position in the data integration market are in a precarious position. Customers are asking for the data transformation process to be pushed predominantly into the database engine. Additionally, CI/CD development cycles are demanding a very code-centric approach to managing branches of integration logic. What this means is that the value statement of ETL is changing dramatically.
For example, a few weeks ago I was introduced to someone at a company that had recently evaluated several integration methods. He observed, “it seems like what we’re deciding now is whether we want a workflow-like interface over our code, because ultimately everything is being run by the database anyway.”
This is literally playing out right now, so we don’t know where the dust will settle. Some organizations seem to be hard-wired for a CI/CD approach, keeping their integration logic in code. Others are better suited to an interface that their less technical users can collaborate through.
These two forces are creating another interesting trend: the use of code generators. These are template-driven engines that take integration patterns and automatically generate native SQL and/or ETL jobs from those patterns. What’s compelling about these code generators is that they don’t sit in the runtime environment, meaning the generated code is 100% owned by the organization. So even if the code generator were no longer being used, the generated code would continue to work. It also means the data integration code becomes highly consistent, because it’s all produced by the same pattern engine.
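A code generator in this sense is essentially a pattern engine: given metadata about a source and target, it emits plain SQL that the organization then owns outright. Here’s a toy sketch using Python’s built-in string templating; the “insert new rows” pattern and all the table and column names are made up for illustration:

```python
from string import Template

# A reusable integration pattern: insert rows from a staging table
# that don't yet exist in the target. The template is the pattern
# engine; everything it emits is plain, runnable SQL.
INSERT_NEW_ROWS = Template("""\
INSERT INTO $target ($columns)
SELECT $columns FROM $source s
WHERE NOT EXISTS (
    SELECT 1 FROM $target t WHERE t.$key = s.$key
);""")

def generate_sql(source, target, key, columns):
    """Generate native SQL from the pattern. The generator never sits
    in the runtime path -- only the emitted SQL gets deployed."""
    return INSERT_NEW_ROWS.substitute(
        source=source, target=target, key=key, columns=", ".join(columns))

sql = generate_sql("staging_customers", "dim_customers",
                   "customer_id", ["customer_id", "name", "region"])
print(sql)
```

Because every table load comes out of the same template, the generated code is uniform across the warehouse, and it keeps running even if the generator is retired.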
I’ve written a brief whitepaper about code generators, which you’ll find in the video description. Also, if you’re evaluating which data integration approach works best for your organization, I recommend you reach out to Intricity and Talk with a Specialist.
Here is the whitepaper referenced: Intricity Code Generators
Here is the link to the ETL vs ELT video we mention:
Here's a link to Talk With an Intricity Specialist: https://www.intricity.com/intricity101/