Videos, Snowflake, Blog, Databricks

Intricity101 Video: Python vs SQL

Jared Hillam

Jared Hillam

October 16, 2023

New Intricity101 video: Python vs SQL


Discover the strengths and ideal use cases for both Python and SQL, while gaining insights into the mindsets of purists in each camp. Ultimately, it's essential to understand when to use each language based on your specific data needs.

Dive deeper into the topic in the Python vs SQL whitepaper

Talk with a Specialist

RELATED VIDEO MENTIONED:

 

 
TRANSCRIPT 
 

Hi, I’m Jared Hillam.

Over the last 8 years or so data manipulation lived in 2 camps. One was the camp of data scientists and the other was the camp of data analysts. These parties leveraged very different tools for their respective roles. The Data Scientists were focused on iterative experiments with flexible structures, and the data analysts were focused on aggregated analytics with predictable frameworks. These requirements saw Python become the central tooling for Data Science and SQL remained as the Data Analyst tool of choice. 

But the last 8 years of cloud data stores have completely blurred the lines between these two roles, and with it, the use of SQL and Python. So in this video, I want to identify the use cases for each and the mindset purists of each may have when starting a project.

First, let’s start with SQL. SQL has been the language of relational data for over 50 years. It has outlasted many attempts to oust it as the language of choice because its syntax is almost too simple. I got a chuckle out of this meme on Linkedin a few years back. Indeed SQL can be very powerful, but it might not seem that way at first glance. SQL has very simple declarations, and this accessibility was what made Facebook develop HIVE to help their staff query Hadoop without having to piece apart the complex Java programs. And by doing this their audience of developers could increase dramatically.

The math in SQL is algebraic in nature which for transactions and accounting turns out to be perfect. So a purist in SQL will often be annoyed with having to step through a process for every little motion when working with data the way a Python developer would. They’ll want to capture the entire query event in one go because that is what is typically required for transaction analytics.

Indeed databases often prefer this bulk motion for query events because it allows the query optimizers to work with the end game as a whole. And this makes relational databases very fast.

Now let's talk about Python. Python is a programming language that can be used for anything from file systems to video games. But in the world of data, you really can’t talk about Python without talking about Pandas. Pandas made it possible to gracefully step through data sets. These data sets are called Data Frames. We made a video about Data Frames a couple of months back. By making it possible to take snapshots, Data Scientists were able to express programmatic statistics as a frame of data before going to the next step in the Python program. You can see that the need to express data like this for a statistician would be very high, as the complexity of each step could be vetted before moving on. Additionally, the starting point of the next program could “stand on the shoulders of giants.” Or in this case the previous logic. If all that logic had to be expressed in one statement, it may feel impossible.

So a purist in Python might find it odd that SQL programmers are attempting to do everything at once. They may want to break it up into small segments as they go along.

Now I talked a little about this a few years ago. If you’re doing deterministic queries that use algebra and roll-up transactions for a business, you should be living in SQL. Not only will the query be faster, it will also likely be cheaper. But as you begin traversing into probabilistic iterative programming, Python with data frames is likely a better fit. 

Now I don’t want to make it sound like you couldn’t stretch SQL to do probabilistic queries or Python to do deterministic queries. You certainly can. But the outcome is not ideal. Either the code will be too complex or the compute footprint will be too oversized when they are mismatched.

Cloud data platforms are increasingly being able to blur the boundaries by making these coding languages native to their solutions. So developers are having to get comfortable with either approach. 

We’ve included in the video description a whitepaper that gives some good comparisons between Python and SQL. Also, Intricity’s teams have been using both Python and SQL for more than 2 decades. So if you’re looking at planning your future state architecture, I would recommend talking with an Intricity Specialist on how to map out your teams which connects the skillsets with your organization's data needs.

Related Post

What is a Partition?

Understanding the concept of database partitioning can be significantly illuminated by the historical context of hard drive defragmentation.

Learn More

CI/CD for Data Projects

Despite its prevalence in software development, CI/CD is less common in data projects due to differences in pace and cultural perception. Discover the importance of implementing CI/CD in...

Learn More

Snowflake ProTalk | Next-Level Features: Cortex LLM, ASOF Join, & Hybrid Tables

April 24 | Explore some of Snowflake's newest innovations with Cortex LLM, ASOF Join, & Hybrid Tables for data innovation!

Watch Now