
What is touchpoint stitching? - New Video

Jared Hillam


November 13, 2023

New Intricity101 video: What is touchpoint stitching?


From $12,000 to $12! Learn how organizations grapple with the complexities of analyzing volatile data with billions of records and the challenges of tying events to individuals or things.

Read the whitepaper on Touchpoint Stitching HERE.


 
TRANSCRIPT 
 

Hi, I’m Jared Hillam.

Companies spend a lot of money bringing data together. This is especially true for volatile data: data whose attributes change frequently and which often comes in very high record volumes. Good examples are clickstream data and transaction data. It’s not uncommon for these sources to hold billions or even trillions of records. So when organizations try to tell a story with these records, they are often attempting to tie the events to a person, a place, or a thing, and this is really challenging and expensive.

First, you have the issue of the person, place, or thing not being very consistent between sessions or transactions. For example, a user may browse as a guest, later log into their account, later still use an app on their phone, and then browse from yet another device. So if you’re trying to capture the touchpoints, it can be quite difficult to do. That doesn’t mean organizations don’t try. One of our customers was doing exactly this with their clickstream data, and the query was costing them $12,000 in compute to run just once a day. Over the course of a year, this one query was projected to cost them $4,380,000.
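The transcript doesn’t describe Intricity’s engine, but the identity-fragmentation problem it outlines, where the same person shows up under a guest cookie, a login email, and a phone device ID, is often handled by transitively linking shared identifiers rather than comparing every record pair. Here is a minimal, hypothetical sketch in Python using union-find; the event layout and identifier names are invented for illustration:

```python
from collections import defaultdict

def stitch_touchpoints(events):
    """Group touchpoint events into identities by linking any events
    that share an identifier (cookie, email, device ID, etc.).

    Union-find keeps the cost proportional to the number of identifier
    links, not to the number of pairwise record comparisons."""
    parent = {}

    def find(x):
        # Path-compressing find: follow parents up to the root.
        while parent.setdefault(x, x) != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for ids in events:
        find(ids[0])              # register even single-identifier events
        for other in ids[1:]:
            union(ids[0], other)  # link co-occurring identifiers

    # Collect identifiers into stitched identity groups.
    groups = defaultdict(set)
    for x in parent:
        groups[find(x)].add(x)
    return list(groups.values())

# Hypothetical events: a guest cookie, a login, and a phone app all
# tie back to one person through shared identifiers.
events = [
    ["cookie:abc"],                      # anonymous browsing
    ["cookie:abc", "email:j@x.com"],     # same cookie, user logs in
    ["device:phone1", "email:j@x.com"],  # later, app on the phone
    ["cookie:zzz"],                      # unrelated visitor
]
groups = stitch_touchpoints(events)
# One group of three linked identifiers, one singleton.
```

In this sketch the first three events collapse into a single identity because each shares at least one identifier with another, while the unrelated cookie stays its own group.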

Speeding up this type of query is not something a standard query optimizer can deliver. It requires approaching the query in a completely different way. One of the challenging parts is the sheer math involved. A brute-force record-matching calculation compares every record against every other record:

number of records * (number of records -1)/2

So let's just look at 100 million records.

That computes to just a smidge under 5 quadrillion comparisons!

And 100 million isn’t that many records. What happens when you get to billions of records? So brute force isn’t a viable option.
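The arithmetic above can be checked directly. A short Python sketch of the pair-count formula:

```python
def pair_count(n):
    """Unique record-to-record comparisons in a brute-force match:
    n * (n - 1) / 2, using integer division since counts are whole."""
    return n * (n - 1) // 2

# 100 million records: just a smidge under 5 quadrillion comparisons.
print(f"{pair_count(100_000_000):,}")    # 4,999,999,950,000,000

# 1 billion records: roughly 100x worse, about 500 quadrillion.
print(f"{pair_count(1_000_000_000):,}")  # 499,999,999,500,000,000
```

Because the formula is quadratic in the record count, a 10x increase in records means a roughly 100x increase in comparisons, which is why brute force falls over at billions of records.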

Most product companies that stitch records together don’t see record sets in the billions. But there are reasons for that:

First, their algorithms aren’t designed with such a high scale of comparisons as a base presumption. Because this is commonly treated as a complex data science problem, they approach it with a data science methodology. If you’ve seen my video titled Python vs SQL, this is the development mentality of a Python-type coder. At billions or trillions of records, however, cloud databases are far more scalable using set theory, which is the territory of the SQL-type developer. That is, of course, not the typical approach in SQL, but it certainly leverages the strengths of these cloud databases.

Second, they bill their customers based on the number of records, so no customer would dream of writing a blank check by pointing a high-scale system at their service. Any fix to this would require a solution that doesn’t tie its pricing to the number of records scanned.

Third, they typically require some kind of data sharing to occur in order for their engine to run. This has raised eyebrows for multiple reasons, as contracts are notoriously intricate, reflecting the complexity of the services and the sensitivity of the data involved. Given the sensitive nature of the data being handled and the potential ramifications of data breaches or mismanagement, vendors often arm themselves with extensive liability insurance. This not only indicates the perceived risk in the industry, but also adds an overhead cost, which inevitably trickles down to the client in some form.

Now let's go back to our case study. Intricity applied its touchpoint stitching engine to the query that was costing $12,000 per run. At the completion of the deployment, the compute on Snowflake for the complex matching of clickstream data cost all of… drumroll … $12.

Yes, that’s 12 dollars. This meant that the client could run it many times intra-day. If you’d like to learn more about the engine we’re using, I’ll attach a whitepaper to this video description. And I recommend you talk with an Intricity Specialist about some of the stitching queries that you’re doing in your organization to see if they could be optimized as well.
