The other day I was cleaning out some old CDs and equipment. I found a CD titled Napster. This brought back many memories of the early 2000s. I remember back then that when you had Napster running it would make everything else on the internet super slow. I even remember networks banning its use because it would cause bandwidth problems.
This got me thinking about the quirks of the internet and how much it has changed the way we work with data, particularly within an IT landscape. Just a few years ago, the ideal architecture sat entirely behind a firewall, with all the IT solutions in close proximity. This made bulk data transfers a common practice. But as the cloud has overtaken the traditional on-premises architecture in both cost efficiency and capability, we now have to rethink how we move data, because our servers are no longer sitting right next to each other. If we want to move data, we have to consider the impact of that data's footprint on our available bandwidth. Just like the old Napster days, transferring mass quantities of data over the wire can be disruptive.
One way we deal with this challenge is to continuously stream data rather than send it out in big batches. The difference with streaming is that each record is propagated to the cloud the moment it gets created, instead of being collected into large batches first.
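To make the contrast concrete, here is a minimal sketch in Python. It is purely illustrative: the `cloud` list stands in for the remote destination, and each append to it represents one transfer over the wire.

```python
def batch_load(records, cloud, batch_size=3):
    """Collect records locally, then ship them in large chunks."""
    buffer = []
    for record in records:
        buffer.append(record)
        if len(buffer) == batch_size:
            cloud.append(list(buffer))  # one big transfer
            buffer.clear()
    if buffer:
        cloud.append(list(buffer))      # ship the remainder

def stream_load(records, cloud):
    """Propagate each record the moment it is created."""
    for record in records:
        cloud.append([record])          # many small transfers

batch_target, stream_target = [], []
events = ["order-1", "order-2", "order-3", "order-4"]
batch_load(events, batch_target)    # 2 large transfers
stream_load(events, stream_target)  # 4 small transfers
```

The total volume moved is the same; what changes is that streaming spreads it into a steady trickle instead of periodic bursts that compete with everything else on the network.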
However, streaming data like this is not as simple as it sounds. First, pulling data off a database in real time can be disruptive to the database's primary function. Most databases are architected to serve the applications people are actually using, so keeping those databases busy coughing up data for extraction is a non-starter for most companies.
Additionally, streaming data means everything is literally happening in real time, so how do you gracefully deal with roadblocks? For example, what if the receiving end isn't responsive, a sudden schema change appears, or some data gets lost en route?
To address these issues, the top vendors in streaming data have devised some creative workarounds. First, instead of connecting directly to the transactional database, they connect to the database log. This is the place where the database records everything that happens to it. This is tricky to do because the log is designed for the database's own recovery, not for outside consumption. However, by reading the log we can completely avoid interrupting the production database, and stream the data landing in it.
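The idea can be sketched as follows. This is a toy model, not a real connector: actual logs (MySQL's binlog, PostgreSQL's WAL) are binary formats read through vendor tooling, and the `lsn` position field and entry layout here are assumptions for illustration.

```python
# Hypothetical change-log entries, one per committed operation.
change_log = [
    {"lsn": 101, "op": "INSERT", "table": "orders", "row": {"id": 1, "total": 40}},
    {"lsn": 102, "op": "UPDATE", "table": "orders", "row": {"id": 1, "total": 45}},
    {"lsn": 103, "op": "DELETE", "table": "orders", "row": {"id": 1}},
]

def tail_log(log, last_position):
    """Yield only the changes we haven't streamed yet. The tables
    themselves are never queried, so production is undisturbed."""
    for entry in log:
        if entry["lsn"] > last_position:
            yield entry

# Resume from position 101: only the UPDATE and DELETE are new.
streamed = list(tail_log(change_log, last_position=101))
```

Because the reader only remembers its last position in the log, it can pick up exactly where it left off after a restart, without ever issuing a query against the production tables.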
Dealing with hiccups while the data is in flight requires the ability to gracefully buffer it. Buffers are an important part of streaming. The video you're watching right now is buffering so that you have a smooth viewing experience. If the buffer didn't exist, you'd constantly get fits and starts, because connectivity to the web has regular interruptions. The buffering provided by data streaming vendors is designed to ensure you aren't losing streaming data while it's in transit. So when an outage occurs, or when there is a schema change, the streaming data can still be held until the conflict is resolved.
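A bare-bones version of that hold-and-release behavior might look like this. It's a simplified sketch (real streaming platforms persist the buffer durably and handle acknowledgements); the `StreamBuffer` class and `online` flag are invented for illustration.

```python
from collections import deque

class StreamBuffer:
    """Hold records until the destination can accept them, so an
    outage or schema conflict doesn't lose in-flight data."""

    def __init__(self, destination):
        self.destination = destination  # stand-in for the remote target
        self.pending = deque()          # records not yet delivered
        self.online = True              # is the destination reachable?

    def send(self, record):
        self.pending.append(record)
        if self.online:
            self.flush()

    def flush(self):
        # Drain the backlog in arrival order.
        while self.pending:
            self.destination.append(self.pending.popleft())

target = []
buf = StreamBuffer(target)
buf.send("a")        # delivered immediately
buf.online = False   # simulate an outage
buf.send("b")
buf.send("c")        # held in the buffer, not lost
buf.online = True
buf.flush()          # conflict resolved; backlog drains in order
```

The key property is that nothing is dropped while the destination is unavailable; once the roadblock clears, the backlog drains in the original order.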
Beyond being a better fit for the cloud, streaming also provides many business benefits. For example, you can now get real-time analytics from things such as web traffic and other high-volume events.
Intricity often helps organizations adopt data streaming as part of a larger integration landscape. We have a short strategy engagement, which we often recommend to customers, that helps plan out the architecture and put it into a deployment roadmap. If you would like to take a look at this strategy engagement, click here. And of course, you can reach out to Intricity to talk with a specialist at any time.