Data compliance refers to adherence to the regulatory requirements and guidelines that govern the lifecycle of personal data, from collection through processing, storage, and sharing. With the introduction of laws such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA), it is crucial that businesses meet the requirements for properly handling their customers' personal data. Companies that do not comply run the risk of severe fines and legal action; GDPR, CCPA, and data compliance are not optional, a mere suggestion, or a simple to-do.

The European Union (EU) GDPR came into effect on May 25, 2018. It aims to protect the privacy of individuals in the EU by regulating the collection, processing and storage of personal data. Regardless of where an organization is located, the regulation applies if it processes or stores the personal data of individuals in the EU. The GDPR requires organizations to have a lawful basis, such as the individual's consent, before collecting personal data, to give individuals the right to access, rectify and delete their data, and to notify them of data breaches that pose a high risk to them.

The California state law, CCPA, came into effect on January 1, 2020. It aims to give California residents greater control over how businesses operating in the state collect and use their personal information. It applies to certain businesses, such as those with annual gross revenue of over $25 million or those that derive at least 50% of their annual revenue from selling California residents' personal information. Under CCPA, California residents have the right to access and delete their personal data and to opt out of its sale.

The responsibility to properly handle customer data raises questions: how do companies know whether they are GDPR and CCPA compliant? And if they are not, how do they reach that standard to safeguard not only their customers but also themselves?

More simply, how can companies be data compliant and remove customers who decide to be "forgotten" permanently, instantly and completely? 

This task may seem daunting, but thankfully there's a range of measures that organizations can enforce. Here are some key steps that organizations can take to comply with GDPR and CCPA: 

  1. Appoint a Data Protection Officer: Companies subject to GDPR should appoint a Data Protection Officer (DPO) who is responsible for ensuring that the company complies and stays aware of the company's data processing activities. 
  2. Obtain Consent: Where processing relies on consent, companies must obtain it from individuals before collecting and processing their personal data. Consent must be specific, informed and freely given, with the option to withdraw it at any time.
  3. Implement Data Protection Policies: Companies should implement data protection policies that define how personal data is collected, processed, stored and shared. The policies should also define how data breaches are identified, reported and resolved. Additionally, companies should have a procedure in place to handle customer requests about their data in a timely and effective manner.
  4. Provide Privacy Notices: Companies should provide privacy notices to individuals that explain how their personal data is being collected and used. The notices should be written in clear and simple language and should describe an individual's rights under GDPR and CCPA.
  5. Conduct Data Protection Impact Assessments (DPIAs): DPIAs are assessments that evaluate the impact of data processing activities on the privacy of individuals. Organizations should conduct DPIAs to identify and mitigate risks associated with their data processing activities. 
  6. Employee Training: Company employees should be trained on data protection and privacy regulations to ensure that they are aware of their obligations and responsibilities to the customer. 
  7. Review and Update Data Protection Measures: Companies should regularly review and update their data protection measures to ensure they are effective and comply with GDPR and CCPA. 


In Practice

Now imagine someone unsubscribes from an email generated by a marketing automation tool in the Services division. That tool is one of 100 applications the business uses. Most of them also hold a record for the contact, and some send emails as well. A few days go by, the person gets another email, this time from the business's Digital Products division, and yet again unsubscribes. Then the individual receives yet another email from another automated system, and another, and another. This is partially what the GDPR and CCPA laws sought to address. However, solving this problem isn't a casual button press. Each of these systems holds its own version of the customer, and they all act independently of each other.


Doing this by Hand

Intricity has seen entire teams dedicated to trying to fix this problem with little success. Teams would centralize all the data into one place and run a series of SQL queries attempting to match records. They might match on email addresses, which for a single person might surface three contact records like the following:

[Image: Truelty Record Example]

This person might be the same person or she might be 3 separate people. From the email address alone, this is all a business would know. So another comparison has to be done on the phone number to tie additional attributes together. As more and more identifying attributes are joined together, businesses start shaking out the inconsistencies. But these additional attributes can be just as unreliable. For example, there might be 10,000 contacts with the placeholder phone number 999-999-9999, which breaks the matching rules. The exceptions to the rules abound.
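As a rough illustration of this attribute-by-attribute matching, here is a minimal Python sketch. The field names, the sample records and the list of placeholder phone numbers are all assumptions made for the example; they do not come from any particular system, and this is not how any specific product implements matching.

```python
import re

# Hypothetical placeholder values that should never be used as match keys.
PLACEHOLDER_PHONES = {"9999999999", "0000000000"}

def normalize_email(email):
    """Lowercase and trim an email; return None when it is empty."""
    email = (email or "").strip().lower()
    return email or None

def normalize_phone(phone):
    """Keep digits only; return None for empty or placeholder numbers."""
    digits = re.sub(r"\D", "", phone or "")
    if not digits or digits in PLACEHOLDER_PHONES:
        return None
    return digits

def shared_keys(a, b):
    """Return the identifying attributes on which two records agree."""
    keys = []
    ea, eb = normalize_email(a.get("email")), normalize_email(b.get("email"))
    if ea is not None and ea == eb:
        keys.append("email")
    pa, pb = normalize_phone(a.get("phone")), normalize_phone(b.get("phone"))
    if pa is not None and pa == pb:
        keys.append("phone")
    return keys

# Made-up sample records.
r1 = {"email": "Jane.Doe@example.com", "phone": "(555) 010-1234"}
r2 = {"email": "jdoe@example.org", "phone": "555-010-1234"}
r3 = {"email": "someone@example.net", "phone": "999-999-9999"}
r4 = {"email": "other@example.net", "phone": "999-999-9999"}

print(shared_keys(r1, r2))  # ['phone']
print(shared_keys(r3, r4))  # []
```

The first pair is tied together by phone number even though the email addresses differ. The second pair shares only the placeholder number, so it is deliberately not treated as a match.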

Then there's a computation problem. Matching would be manageable if a business only had to check a single record against the rest. But deduplicating 500 million records means comparing every record to every other record, so the number of comparisons grows quadratically with the record count.
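To put a number on the all-pairs comparison: among n records there are n(n-1)/2 unordered pairs. The Python sketch below computes that count and also shows blocking, a common record-linkage technique that compares only records sharing some key. Blocking is brought in here purely as an illustration; the "zip" key and the sample records are assumptions.

```python
from collections import defaultdict

def pair_count(n):
    """Number of unordered pairs among n records: n * (n - 1) / 2."""
    return n * (n - 1) // 2

def blocked_pair_count(records, key):
    """Count the pairs left when only records sharing a blocking key are compared."""
    blocks = defaultdict(int)
    for rec in records:
        value = rec.get(key)
        if value is not None:
            blocks[value] += 1
    return sum(pair_count(size) for size in blocks.values())

# All-pairs comparison across 500 million records.
print(pair_count(500_000_000))  # 124999999750000000

# Made-up sample: six records, blocked on a hypothetical "zip" attribute.
records = [
    {"zip": "84101"}, {"zip": "84101"}, {"zip": "84101"},
    {"zip": "10001"}, {"zip": "10001"}, {"zip": None},
]
print(pair_count(len(records)))            # 15
print(blocked_pair_count(records, "zip"))  # 4
```

At 500 million records the all-pairs count is roughly 1.25 x 10^17 comparisons, which is why a brute-force approach does not scale.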


Typical Solutions

The most common solution in the market is to have an organization send its customer data over to a third party. The third party has a giant graph database that the records are compared against to identify duplicates. Then the third party ships the records back with a consolidated identity attached to each duplicate. This round trip typically carries a latency of roughly 24 hours. Setting up the legal contract between the two parties often takes over six months (corporations aren't going to casually cough up their customer data to a third party).

There's also the cost, which can often be the most prohibitive part. It's not uncommon for these third parties to charge a per-record fee, meaning a typical deal can easily be around $500,000 annually.


Snowflake-Centric Solution

While they may not know it, Snowflake customers are sitting on a giant processing beast. Snowflake is capable of scaling up and processing a near limitless amount of data. All it needs is the commensurate code to do so. 

Truelty is a code generator that resides within Snowflake deployments and auto-generates the code for processing duplicate identities using both equality and fuzzy matches. Because it runs locally on the processing power of their Snowflake instance, organizations never have to ship their data outside the "4 walls" of their data warehouse.

As Truelty's generated queries execute, a new ID is generated that clusters the disparate versions of a customer under a single ID.


The end result is a table that maps each unique record ID from the source to a Truelty ID, which provides a consistent ID across duplicate records. What is powerful about this is that the table can be used across any number of disciplines within the Snowflake instance: data warehousing, analytics, data science, reverse ETL, messaging platforms, CDP tools, etc.
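The idea of collapsing matched records into one shared ID can be sketched with a small union-find routine in Python. This illustrates the general clustering technique only; it is not Truelty's generated code, and the record IDs and matched pairs are made up for the example.

```python
def cluster_ids(record_ids, matches):
    """Assign each record a cluster ID using union-find over matched pairs."""
    parent = {rid: rid for rid in record_ids}

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in matches:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller ID as the root so the result is deterministic.
            if rb < ra:
                ra, rb = rb, ra
            parent[rb] = ra

    return {rid: find(rid) for rid in record_ids}

# Hypothetical source record IDs and the pairs a matching step judged to be duplicates.
record_ids = ["crm:17", "email:903", "billing:42", "crm:88"]
matches = [("crm:17", "email:903"), ("email:903", "billing:42")]

clusters = cluster_ids(record_ids, matches)
for rid in record_ids:
    print(rid, "->", clusters[rid])
```

Here the three records linked by the two matches end up with the same cluster ID, while "crm:88" keeps its own. Note that the records "crm:17" and "billing:42" were never compared directly; they are tied together through "email:903".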

The other added benefit is that the processing is so highly tuned that the clustering can be run intra-day. 


Using the Output

With the unique records identified, when someone unsubscribes from one system, the user's ID can quickly be cross-referenced with the other IDs of that individual. The unsubscribe flags can be applied manually to start, then scripted into each application system.
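The cross-referencing step might look something like the following Python sketch. The table layout (source system, source record ID, cluster ID), the system names and the IDs are all hypothetical stand-ins for whatever the real cross-reference table contains.

```python
from collections import defaultdict

# Hypothetical cross-reference rows: (source system, source record ID, cluster ID).
XREF = [
    ("marketing", "m-101", "T-1"),
    ("digital_products", "d-555", "T-1"),
    ("billing", "b-042", "T-1"),
    ("marketing", "m-202", "T-2"),
]

def records_to_flag(xref, system, record_id):
    """Return every (system, record ID) that shares a cluster with the given record."""
    cluster_of = {(s, r): c for s, r, c in xref}
    members = defaultdict(list)
    for s, r, c in xref:
        members[c].append((s, r))
    cluster = cluster_of.get((system, record_id))
    if cluster is None:
        # Unknown record: flag only the record that was unsubscribed.
        return [(system, record_id)]
    return sorted(members[cluster])

# Someone unsubscribes through the marketing tool.
targets = records_to_flag(XREF, "marketing", "m-101")
print(targets)
# [('billing', 'b-042'), ('digital_products', 'd-555'), ('marketing', 'm-101')]
```

The returned list is the set of records that would need an unsubscribe flag, whether that flag is set by hand at first or by a script per application system later.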

While GDPR and CCPA create a headache for organizations to manage, most people understand the purpose behind data compliance and the right to be forgotten. Having tools that can narrow down the impact of an unsubscribe and automate the identification of duplicates makes the process easy to adopt. Intricity can help organizations set up this process by automating duplicate identification and flagging application systems automatically.