Remember way back in 2008 when Apple launched the App Store? Hard to believe that was 14 years ago. Another era of “apps” is just around the corner and is still finding its identity: Enterprise Data Apps. Getting to this point, like anything else, is the result of layers and layers of innovation.

So what is an Enterprise Data App? Data landscapes have thousands of use cases. Some people need security analytics; some need identity resolution; some need log analysis; some need data quality standards; some need geospatial functions; some need pricing functions; some need data ops tracking. The list is as long as the roster of users in an enterprise. Imagine being able to plug in an app for the very function you need and have it run within your existing data landscape. That is an Enterprise Data App.

 

 

The Layers

The layers that made Enterprise Data Apps possible have been building up over the last two decades.

  • March 2006: Amazon released AWS S3, an expandable storage solution
  • April 2006: Hadoop brought map/reduce computation to unstructured data
  • October 2008: Oracle Exadata released with columnar compression
  • February 2011: AMPLab produced compute studies that later led to Apache Spark
  • July 2012: Snowflake introduced ACID-compliant compute and storage separation with unlimited compute and storage
  • November 2012: AWS Redshift released a cloud data warehouse service
  • 2013-2015: The great Hadoop/Big Data distraction
  • August 2016: Snowflake introduced Zero Copy Cloning
  • June 2017: Snowflake introduced data sharing

Now, these are just the genesis moments. Each has gone through a tremendous amount of maturation to get to where we are today. These foundational moments, along with many others, have made Enterprise Data Apps an inevitability.

In many of these cases, competitors shipped comparable innovations literally days later. So don’t get caught up on “who’s better” with this list; the point is that the foundational functionality exists.

The features on this list took the data consumption market from one with very low interactivity to one that is highly mobile, shareable, queryable, and manageable. This change is what makes Enterprise Data Apps possible at all.

 

Consulting Gigs

The dream of any consultant is to transition out of hourly project work and into selling a product that is disconnected from their clock. In the data management space, this was simply not possible, except for the exceptionally few software executives who have made a name for themselves. The arena of data belonged to well-established names like Informatica, SAS, Microsoft, IBM, Collibra, Oracle, and others. So consultants spent their careers beholden to these platforms, and the solutions they built were hard-wired to a single client.

The problem was that none of these platforms could meet the enormous array of needs that organizations had for their data. Organizations themselves would grow irritated with data experts who couldn’t execute fast enough to deliver all the perspectives the organization needed.

Each organization’s requirements and landscapes were too bespoke for real data problems to be solved broadly. Leading data platforms like Informatica cover a tremendous number of functional scenarios and can’t be faulted for not going “deep” on any single one. This was the case for pretty much every data management gorilla in the market.

The big change that has left fertile ground for Enterprise Data Apps is data mobility. Sharing data, sharing code, allocating compute, allocating storage, and “limitless” resources are at the heart of why I believe Enterprise Data Apps will become common. Suddenly, consultants who have put their blood, sweat, and tears into a project can consider whether its deliverables might be of value to other clients, and thus turn them into a Data App.

 

The Apps

Where are these apps? Let’s take a look.

Hightouch

People consider the data warehouse to be a one-way road, but it turns out that the complex logic that produces its insights is valuable to application users too. Hightouch reverses the typical information flow and sends data back to the enterprise applications so they can take advantage of the work that went into bringing that data together. www.hightouch.com 

Matillion

Traditionally, ETL was its own application, and data would flow through that application as a transit point between the extraction and the load. However, many years back it was shown that databases could execute such complex transformations much faster. Matillion capitalized on the new cloud database technologies by generating all the transformation logic so that it runs locally inside the cloud vendor’s database. The power of this approach is that it is ridiculously fast and far more efficient. www.matillion.com 

MessageGears

People often think that a customer data platform (CDP) has to work the old way, by centralizing all the assets in some third-party platform. MessageGears is a customer marketing platform that connects directly to a brand’s Customer 360 data environment as a Snowflake-connected data application. With MessageGears, marketers can access all of the data that lives in their brand’s Snowflake, securely and in real time, without copying, syncing, or mapping. www.messagegears.com 

Panther

The capacity to share cloud data has made localized analytics and alerting of security logs possible. Panther has developed analytics that operate on cloud data warehouses for analyzing log events to bubble up any concerning security holes. By standing on the shoulders of these tech giants, Panther can analyze massive numbers of log sources to determine the analytical patterns that would trigger an alert to their customers. Out of nowhere, they are competing with the likes of Splunk. www.panther.com 

Truelty

When you bring data together from several sources, you are bound to have duplicate customer records. Truelty identifies all these duplicates in your customer data, ensuring you can determine who your unique customers actually are. This is all done without ever sending your data outside your organization’s Snowflake instance, as all the code is delivered locally to your Snowflake instance as a service. www.truelty.com

Many More

Most of the cloud vendors showcase these Enterprise Data App vendors, and you can find them on the vendors’ respective pages.

What makes these apps different is that they are built on top of the new flexibility offered by the shareability, agility, and compute freedom that the cloud brings. So one potential outcome is that instead of single gorilla apps that centralize all the functionality into one service, you could see the data take center stage, with the surrounding functional apps serving logic, enrichment, and collaboration on that data. While this future doesn’t bode well for behemoth legacy companies, it does bode well for clients that want their data at the center of their universe.

 

The Legal Hurdle

One of the last hurdles to making this explosion really happen is a legal one. The sheer quantity of legalese that goes into a cloud data service is over the top, and that is after answering more than 400 questions about data security policies and procedures. Not exactly the first thing a startup wants to spend its time mulling over.

If there is one thing the industry could work on, it is a set of universal agreement templates that parties could recognize regardless of which cloud vendor they want to use.

 

Code as a Service

One interesting approach that many of these organizations take is to circumvent this legal hurdle by using a Code as a Service model. Most technology startups consider their code part of “their estate,” so they require an agreement to transfer client data to the startup’s “compute instance”. However, client data landscapes now have all the horsepower they need to run complex queries locally, so running the code within the client’s own data environment is perfectly above board. 

Matillion, for example, does this by expressing all the mappings and workflows as a set of SQL queries that gets pushed into the client’s Snowflake instance. Matillion never has to “see” the data; it just operates a control plane. 
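To make the push-down idea concrete, here is a minimal sketch in Python. It is not Matillion’s actual implementation: the function `build_pushdown_sql`, the table names, and the column mapping are all hypothetical, and an in-memory SQLite database stands in for the client’s Snowflake instance. The point is only that the vendor side produces SQL text, while the data is read and written entirely inside the client’s database.

```python
import sqlite3

def build_pushdown_sql(target, source, column_map, where=None):
    """Render a declarative column mapping as one INSERT ... SELECT statement.

    Hypothetical helper: only SQL text is produced here, no data is touched.
    """
    select_list = ", ".join(f"{expr} AS {col}" for col, expr in column_map.items())
    target_cols = ", ".join(column_map)
    sql = f"INSERT INTO {target} ({target_cols}) SELECT {select_list} FROM {source}"
    if where:
        sql += f" WHERE {where}"
    return sql

# Hypothetical mapping: target column -> SQL expression over the source table.
mapping = {
    "customer_id": "id",
    "full_name": "first_name || ' ' || last_name",
    "email": "LOWER(email)",
}
sql = build_pushdown_sql(
    "dim_customer", "raw_customer", mapping, where="email IS NOT NULL"
)

# Assumption: SQLite stands in for the client's warehouse in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_customer (id INTEGER, first_name TEXT, last_name TEXT, email TEXT)"
)
conn.execute(
    "CREATE TABLE dim_customer (customer_id INTEGER, full_name TEXT, email TEXT)"
)
conn.executemany(
    "INSERT INTO raw_customer VALUES (?, ?, ?, ?)",
    [(1, "Ada", "Lovelace", "ADA@Example.com"), (2, "Alan", "Turing", None)],
)

# The generated statement runs where the data lives.
conn.execute(sql)
rows_loaded = conn.execute(
    "SELECT customer_id, full_name, email FROM dim_customer"
).fetchall()
print(sql)
print(rows_loaded)
```

In this sketch the generator only ever handles metadata (table names, column names, expressions), which is the property that lets the vendor act as a control plane.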

In a similar fashion, Truelty has an application that generates code to resolve customer identity without ever having to see the data. This is in stark contrast to alternative identity resolution solutions that require possession of the customer’s records to do such processing. Impressively, the code generator dynamically sizes each stage of the comparison process in a flurry of Snowflake compute and storage, in a process that looks much like a machine learning network being traversed in batch.
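The same generate-then-run-locally pattern can be sketched for duplicate detection. This is an illustration only and not Truelty’s method: real identity resolution uses far richer, staged comparisons, whereas this sketch simply pairs records whose normalized keys match exactly. The function `build_match_sql`, the `customers` table, and its columns are hypothetical, and SQLite again stands in for the client’s warehouse.

```python
import sqlite3

def build_match_sql(table, id_col, match_exprs):
    """Generate a self-join pairing records that agree on every normalized key.

    Hypothetical helper; each expression uses {alias} as a table placeholder.
    """
    conditions = " AND ".join(
        f"{expr.format(alias='a')} = {expr.format(alias='b')}"
        for expr in match_exprs
    )
    return (
        f"SELECT a.{id_col}, b.{id_col} FROM {table} a JOIN {table} b "
        f"ON a.{id_col} < b.{id_col} AND {conditions} ORDER BY 1, 2"
    )

# Assumed matching rule: same email and last name after basic normalization.
match_sql = build_match_sql(
    "customers",
    "id",
    ["LOWER(TRIM({alias}.email))", "LOWER({alias}.last_name)"],
)

# Assumption: SQLite stands in for the client's warehouse in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, last_name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        (1, "Smith", "jo@example.com"),
        (2, "SMITH", " Jo@Example.com "),
        (3, "Smyth", "jo@example.com"),
        (4, "Jones", "kim@example.com"),
    ],
)

# Candidate duplicate pairs are computed inside the database itself.
duplicate_pairs = conn.execute(match_sql).fetchall()
print(duplicate_pairs)
```

Only the generated query crosses the boundary into the client’s environment; the customer records themselves never leave it.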