
Where should logic live?

Jared Hillam

October 3, 2018


Raw data to actionable information: that's ultimately a need in every organization. It’s why we have data governance programs, and it's why we spend hours on databases, ETL tools, reports, and analytics. But how you set those processes up really matters, just like the setup of the assembly line mattered for Ford. Before Ford’s assembly line, every model was a custom product, which made cars incredibly expensive. The same is true with your data. If you don’t have a process in place for orchestrating newly onboarded data, then making accurate decisions will be expensive and will only feed a few individuals. To scale, your organization will need to set up an automated assembly line that turns that data into actionable information.

At each step in your data-to-information assembly line, there is logic that has to be executed to cleanse and conform the data so it can be used by the organization. There is no getting around this: that logic MUST live somewhere, whether in people's heads and Excel spreadsheets or in an automated data integration process. When and where that logic gets executed is critical. Let me give you an example. If you embed your logic in your analytics layer, you will have a very tightly coupled data visualization solution, and you will be locked into that vendor. There are all sorts of gotchas like this in setting up an architecture, so the question is: where do we put that logic?
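To make the coupling problem concrete, here is a minimal sketch; the metric and field names are invented for illustration. If a business rule like a margin calculation lives only inside one BI vendor's proprietary expression language, every other consumer has to reimplement it. Defining it once, upstream of the visualization layer, keeps it vendor-neutral:

```python
# A hypothetical margin rule. If this exists only as a calculated field
# inside one BI vendor's dashboard, every other tool must reinvent it.
# Defining it once in the integration layer keeps it portable.

def gross_margin(revenue: float, cost_of_goods: float) -> float:
    """Business rule defined once, upstream of any visualization tool."""
    if revenue == 0:
        return 0.0
    return (revenue - cost_of_goods) / revenue

# Any downstream consumer (warehouse view, report, export) reuses the
# same rule, so swapping the analytics vendor doesn't strand the logic.
rows = [{"revenue": 1200.0, "cogs": 800.0}, {"revenue": 0.0, "cogs": 50.0}]
enriched = [dict(r, margin=gross_margin(r["revenue"], r["cogs"])) for r in rows]
print(enriched)
```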

There are a few principles that I think need to be followed when deciding where that logic lives:

First, minimize reliance on a single vendor. Vendors will often bundle functionality and sell it as a single solution. Sometimes this is a good thing, but often they do it to make their solution stickier by tightly coupling the logic in one place. There are conveniences in having an all-in-one solution, but such tightly coupled solutions begin to lag the innovations in the market. We’ve seen this with the big ERP vendors. Years ago they either acquired the top Business Intelligence vendors or built their own solutions. Today their BI and data platforms lag the market, and their customers are trapped in a mound of technical debt.

Second, minimize reliance on a single employee. This problem is fairly self-explanatory and is usually a result of either tribal knowledge or growing pains. To fix it, logic needs to be captured in integration tools or data catalogs so the business isn’t put at risk if an employee leaves.

Third, we have to respect the latency requirements of the people receiving the information. Real time tends to be the knee-jerk answer here, but it opens a can of worms, which I’ve brought up in a few previous videos. There are times, however, when real time is absolutely necessary. In such cases, tight coupling of logic becomes essential because the data has to flow without any persistence steps. So when the architecture is being set up, real-time processing requirements must be accounted for.
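As a rough illustration of what "no persistence steps" implies, here is a minimal sketch; the event shape and the cleansing rule are assumptions made for the example. In a real-time path the logic runs inline on each event as it flows, rather than as a separate batch stage over a landed table:

```python
from typing import Iterator

def cleanse(event: dict) -> dict:
    """Inline cleansing rule; in a real-time path this runs per event."""
    event["customer_id"] = str(event.get("customer_id", "")).strip().upper()
    return event

def realtime_pipeline(events: Iterator[dict]) -> Iterator[dict]:
    # Logic is tightly coupled to the stream: each event is transformed
    # in flight, with no intermediate landing table between steps.
    for event in events:
        yield cleanse(event)

incoming = iter([{"customer_id": " ab123 "}, {"customer_id": "cd456"}])
for clean_event in realtime_pipeline(incoming):
    print(clean_event)
```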

Fourth, barring any real-time requirements, you want to loosely couple the critical junctures of logic. For example, if I’m going to deal with data quality, I want to address that step before I deal with mastering the data; otherwise I’ll force the master data management tool to try to correlate records that have dirty data to begin with, wasting a lot of computing resources. Additionally, by keeping the critical junctures of logic separated, I can upgrade one juncture without breaking its counterparts, keeping the architecture more future-proof.
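To see why the ordering matters, consider this toy sketch; the records and the matching rule are hypothetical. Without the cleansing step, "Acme Corp" and " ACME corp. " look like different entities to the matcher, so the mastering stage burns cycles on noise:

```python
def cleanse_name(name: str) -> str:
    """Data quality step: standardize before any mastering is attempted."""
    return name.strip().lower().rstrip(".")

def master_records(records: list[dict]) -> dict:
    """Toy MDM step: group records that refer to the same entity."""
    golden: dict[str, list[dict]] = {}
    for rec in records:
        golden.setdefault(rec["name"], []).append(rec)
    return golden

records = [{"name": "Acme Corp"}, {"name": " ACME corp. "}]

# Quality first, then mastering: both records collapse to one entity.
cleansed = [dict(r, name=cleanse_name(r["name"])) for r in records]
print(master_records(cleansed))   # one key: 'acme corp'

# Skip the quality step and the matcher sees two distinct entities.
print(master_records(records))    # two keys
```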

Fifth, minimize complexity as data advances to users. Complex integration logic is not something you want to assign to your broad user communities. The closer data comes to turning into information, the less heavy lifting your broad user community should be doing with it. Think of it from an efficiency perspective: you don’t want individuals collectively conforming dates 1,000 times every week. The more of that logic you solve upstream, the larger your end community can become.
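As a small illustration of solving that logic once, here is a minimal sketch; the source formats are assumptions for the example. A single date-conforming step in the integration layer spares every downstream user from repeating the same parsing in their own spreadsheets:

```python
from datetime import date, datetime

# Conform dates once in the pipeline instead of asking every analyst
# to re-parse vendor-specific formats on their own.
KNOWN_FORMATS = ("%m/%d/%Y", "%Y-%m-%d", "%d-%b-%Y")

def conform_date(raw: str) -> date:
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {raw!r}")

# Three source-system spellings, one conformed output for all users.
for raw in ["10/03/2018", "2018-10-03", "03-Oct-2018"]:
    print(conform_date(raw))
```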

So I’ve covered five principles to follow when deciding where to put your logic. I’d like to hear some principles you have followed with success; share your ideas in the comment section. And if you’re working out your data-to-information landscape, I recommend you reach out to Intricity about our Strategic Roadmap engagement. I’ve included a link in the video description that outlines this engagement. And of course, you can always reach out to Intricity to talk with a specialist.
