
When data gets creepy

Jared Hillam


February 13, 2024



Ever wonder how companies manage your data behind the scenes? Uncover how companies collect, enrich, and utilize your data, and the implications for privacy and marketing strategies.

Read the whitepaper on Touchpoint Stitching.


 
TRANSCRIPT 
 

Hi, I’m Jared Hillam.

There’s a world that does not get discussed very much outside the four walls of Corporate America, mostly because it raises eyebrows and suspicions, and for good reason.


This is the world of data brokerage. Most discussions about data brokerage are news reports running gotcha campaigns about how much money is being made selling customer data. And… those are all true. But it’s pretty rare for anyone to discuss the guts of how data brokerage companies actually do what they do. And that’s what I would like to cover, at a high level, in this video.


Now, if we rewind the clock about 8-10 years, a store that asked for your contact information would pretty much only get what you gave them, and that worked pretty well… for you. But for the corporation, there were really, really good reasons to know more about you. Not the least of these reasons was that you might use different aliases when contacting that corporation: multiple email addresses, phone numbers, nicknames, short names, even multiple physical addresses. When companies brought all this data together, they couldn’t tell whether you were one person or multiple people with the same name. That created confusion within organizations, which would cause companies to send five mailers to the same house… it was just an expensive mess. Corporations ended up spending millions of dollars implementing internal systems called Master Data Management (MDM) tools to keep track of these duplicate records, which was itself a massive undertaking and usually wasn’t very successful.
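To make the duplicate-record problem a little more concrete, here’s a minimal Python sketch of one common way to cluster alias records: normalize each contact detail, then link any records that share a normalized email or phone using a union-find structure. This is not how any particular MDM tool works; the records, field names, and matching rules (exact match on lowercased email or digits-only phone) are all hypothetical assumptions, and real systems layer fuzzy name and address matching on top.

```python
from collections import defaultdict

# Hypothetical customer records captured by one company under different aliases.
RECORDS = [
    {"id": 1, "name": "Robert Smith", "email": "Bob.Smith@Example.com", "phone": "(555) 010-2000"},
    {"id": 2, "name": "Bob Smith", "email": "bob.smith@example.com ", "phone": None},
    {"id": 3, "name": "R. Smith", "email": "rs_deals@example.net", "phone": "555-010-2000"},
    {"id": 4, "name": "Robert Smith", "email": "robert.smith@other.org", "phone": "555-999-1234"},
]


def normalize_email(email):
    """Lowercase and trim so trivially different spellings compare equal."""
    return email.strip().lower() if email else None


def normalize_phone(phone):
    """Keep digits only, so '(555) 010-2000' and '555-010-2000' match."""
    if not phone:
        return None
    digits = "".join(ch for ch in phone if ch.isdigit())
    return digits or None


def find(parent, x):
    # Walk up to the cluster root, halving the path as we go.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent, a, b):
    # Merge the clusters that contain records a and b.
    root_a, root_b = find(parent, a), find(parent, b)
    if root_a != root_b:
        parent[root_b] = root_a


def cluster_records(records):
    """Group record ids that share a normalized email or phone (transitively)."""
    parent = {r["id"]: r["id"] for r in records}
    first_seen = {}  # (kind, normalized value) -> first record id carrying it
    for r in records:
        keys = [
            ("email", normalize_email(r.get("email"))),
            ("phone", normalize_phone(r.get("phone"))),
        ]
        for key in keys:
            if key[1] is None:
                continue
            if key in first_seen:
                union(parent, first_seen[key], r["id"])
            else:
                first_seen[key] = r["id"]
    clusters = defaultdict(list)
    for r in records:
        clusters[find(parent, r["id"])].append(r["id"])
    return sorted(sorted(ids) for ids in clusters.values())


print(cluster_records(RECORDS))  # [[1, 2, 3], [4]]
```

Records 2 and 3 share nothing directly, but they land in the same cluster because each one overlaps with record 1. Records 1 and 4 have the same name yet stay separate, since a name alone isn’t treated as a match key here, which is exactly the “one person or two?” ambiguity described above.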


Now, this is only the tip of the iceberg as to why corporations wanted more of your data. These companies wanted to know your preferences, where you shopped, how many kids you had, what your income bracket was, what opinions you shared on social networks, and on and on. To satisfy this thirst for data, companies emerged that focused on housing people’s profiles, filled with pretty much any data they could get their hands on. So if a person had four email addresses, they would house all of those variations. And you might be asking yourself, “How did they get all that data?” Well, the very first origin isn’t all that important; it’s the continued origin that really matters. See, in pretty much every privacy agreement you accept, corporations gate access to their content behind terms that name “partners” as parties that can access the data you’re giving them. This gives corporations a sanctioned way of sharing your identity with the data brokerage companies they choose to contract with.


Now, the data brokerage company has a unique offering. It can offer the corporation a service that not only identifies all the duplicated contacts in the corporation’s data, but also enriches that customer data with information you never originally gave them. So you might not have given your favorite store your Facebook profile, but they can purchase it anyway by using the “throwaway” email you gave them. You see, your throwaway email is stored along with your real email, your real name, your real home address, your real cell phone number, and so on, so there’s no hiding here. And yeah, it’s a little creepy.
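Here’s a rough Python sketch of why the throwaway email doesn’t protect you: if a broker indexes every alias it has collected back to a single profile, any one alias is enough to pull up the rest. The profile, its attributes, and the function names are hypothetical, and this is only an illustration of an alias lookup, not how any specific broker’s service is built.

```python
# Hypothetical broker-side profile: one person, every alias the broker has collected.
BROKER_PROFILES = [
    {
        "profile_id": "P-001",
        "emails": {"jane.doe@example.com", "jd.coupons@example.net"},
        "attributes": {"home_city": "Springfield", "household_size": 4},
    },
]


def build_email_index(profiles):
    """Map every known email alias to the profile that owns it."""
    index = {}
    for profile in profiles:
        for email in profile["emails"]:
            index[email] = profile
    return index


def enrich(customer, index):
    """Return a copy of the customer record, enriched if any alias matches."""
    email = customer["email"].strip().lower()
    enriched = dict(customer)
    profile = index.get(email)
    if profile is None:
        return enriched  # no match: the record stays as the customer gave it
    enriched["broker_profile_id"] = profile["profile_id"]
    # The other aliases the customer never handed to this company.
    enriched["other_emails"] = sorted(profile["emails"] - {email})
    enriched.update(profile["attributes"])
    return enriched


index = build_email_index(BROKER_PROFILES)
print(enrich({"customer_id": 42, "email": "JD.Coupons@example.net"}, index))
```

The store only ever saw the coupon address, but the lookup hands back the “real” email and household details attached to the same profile.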


The architecture behind this used to be pretty manual; it involved FTPing files back and forth. But today, in the cloud, the architecture is far more sophisticated. The data brokerage company’s data and the corporation’s data can be processed and shared in something called a data clean room, which essentially automates the access controls between the data brokerage company’s matching dataset and the corporation’s existing records.
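As a loose illustration of the kind of control a clean room automates, here’s a minimal Python sketch in which both sides replace emails with keyed SHA-256 tokens (HMAC), match only on those tokens, and release nothing but an overlap count that’s suppressed below a minimum size. This is plain Python, not any clean room vendor’s API; the shared key, the `min_group_size` threshold, and the function names are hypothetical assumptions, and real platforms enforce these rules with their own policies rather than application code.

```python
import hashlib
import hmac

# Hypothetical secret both parties agree on for this one matching job.
SHARED_KEY = b"example-shared-key"


def tokenize(email):
    """Normalize an email, then replace it with a keyed SHA-256 token (HMAC)."""
    normalized = email.strip().lower()
    return hmac.new(SHARED_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def overlap_report(corporation_emails, broker_emails, min_group_size=2):
    """Match on tokens only and release an aggregate, never the raw emails."""
    corporation_tokens = {tokenize(e) for e in corporation_emails}
    broker_tokens = {tokenize(e) for e in broker_emails}
    matched = corporation_tokens & broker_tokens
    count = len(matched)
    # Suppress results small enough to point at specific individuals.
    return {"matched_count": count if count >= min_group_size else None}


corporation = ["Ann@example.com", "bob@example.com", "cara@example.com"]
broker = ["ann@example.com ", "BOB@example.com", "dan@example.com"]
print(overlap_report(corporation, broker))  # {'matched_count': 2}
```

Normalizing before hashing matters: without the lowercase-and-trim step, “Ann@example.com” and “ann@example.com ” would produce different tokens and never match.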


Now, if you asked me whether this is being used in more backhanded or nefarious ways… the answer would be yes. Having said that, the main reason most corporations use data brokerage companies is simply to deduplicate customer records. They don’t want to send you multiple mailers, and there are laws on the books that say if you unsubscribe, the company has to unsubscribe you from every place you appear in their data. Companies can’t possibly do this without first identifying all the versions of you in their data.


Now, there are less creepy ways of doing this kind of deduplication internally that don’t involve giving your customers’ data to a third party. If you’re interested in learning more about how to do that, I’ve linked a whitepaper in the video description. Also, if you’re trying to put together an architecture that defines how to deal with duplicate data, I recommend you reach out to Intricity to talk with a specialist.
