Videos, Snowflake, Blog

Widgets vs Code

Jared Hillam

Jared Hillam

December 4, 2023

New Intricity101 video: Widgets vs Code


Explore the advantages and disadvantages of using graphical widgets in traditional ETL tools versus coding in ELT. This examination sheds light on aspects such as cost efficiency, performance tuning, and the innovative use of referenceable folders for code organization. Discover the most suitable approach for your organization's data strategy.

Related whitepaper: Is PySpark becoming the new ETL standard? 

Talk with a Specialist

RELATED VIDEO MENTIONED:

 

 
TRANSCRIPT 
 

Hi, I’m Jared Hillam.

One of the common trends that is occurring in the data management space is that ETL is happening in the database. The correct name for this is ELT because the extraction and the loading happens before the transformation.

If you need a refresher about this I recommend reviewing one of our videos titled ETL vs ELT. But I specifically want to zero in on how organizations are addressing data transformation at the database level.

In traditional ETL tools, there is usually a graphical user interface that expresses the transformation of the data in the form of widgets that the developer can drag, drop, and connect to each other. So everything is very visual, and in a way, self-documenting. But with data processing happening in the cloud database anyway, everything ultimately becomes a SQL query. Which then begs the question, do you even need the widgets?

Now I’m not going to solve that question in this video but I do want to speak to the advantages and disadvantages of building your ELT directly in code. 

And let’s start with the disadvantages. As I mentioned before, having widgets for each transformation step provides visual documentation of the transformation flow. Some would argue that this is easier for developers to use because the transformation elements are more accessible technically. These visuals are simply not available in code. So the audience of people that could interpret code vs an ETL flow is smaller. So for example, if you’re onboarding a new employee, the transition of that employee will typically be faster if they’re not having to adapt to coded ELT.

Another disadvantage of coded ELT is that code can be written a million different ways to get the same result. And this can create challenges when inconsistent code is used for the same purpose, and developers are managing the code base for the long term. This variation also means that you could have a lot of poorly tuned code mixed in, and you may not even know it.

So let's speak to the advantages. The obvious advantage is cost. If you’re coding all your ELT then you’re the creator and owner of that code, and nobody can send you a bill to use it.

Creating your own code also allows for endless amounts of tuning. There are code segments that are so well-tuned that no generic tool in the world could match its performance, and performance in the cloud is directly connected to the cost of compute, especially over the long run.

Code also allows for more exotic data transformations to be performed. Even the most mature ETL tools provide some kind of coding widget because they can’t possibly package every scenario into their products.

Now there is a slight twist that I want to add to this picture. For many years now there have been tools on the market that allow for organizations to structure their code into a referenceable folder model. This model allows for reusable coding elements to be leveraged across the organization’s data efforts. So imagine you have a standard way for coding a market segment. This standard method may get called 50 times in your entire code base. Rather than copy and pasting the same logic 50 times, you would reference the code in the model and if you needed to change that code you could edit it there once. This ability removes much of the sting from the downsides of coding. Code can be standardized, and variations can be scrutinized. Resources still have to understand how to code, but the chain of logic is more containable. The one thing that for sure doesn’t go away is the cost of a tool.

So how do you choose the right approach for your organization? Well if you’re incorporating a large body of ETL developers you may need the widgets. However, if your team is very comfortable with code, you can save on the costs of logic management. and if you want a mix of both the flexibility of code and the management of logic this can be done with referenceable folders. If you’d like to learn more about these methods, I’ve included a whitepaper in the video description. Additionally, if you’re working on deploying for your organization, I recommend you reach out to Intricity to talk with a specialist.

Related Post

What is a Partition?

Understanding the concept of database partitioning can be significantly illuminated by the historical context of hard drive defragmentation.

Learn More

CI/CD for Data Projects

Despite its prevalence in software development, CI/CD is less common in data projects due to differences in pace and cultural perception. Discover the importance of implementing CI/CD in...

Learn More

Snowflake ProTalk | Next-Level Features: Cortex LLM, ASOF Join, & Hybrid Tables

April 24 | Explore some of Snowflake's newest innovations with Cortex LLM, ASOF Join, & Hybrid Tables for data innovation!

Watch Now