Data gets stale. It doesn’t ‘go bad’ as such; it’s usually still consumable, much like a stale slice of meatloaf, but its nutritional value may have long since left the building. Part of the problem with data freshness, timeliness and usability stems from the fact that modern software architectures have to deal with an increasingly diverse set of data types.
This diversity means that software application developers and data scientists have to create more intelligent applications while also supporting cloud-native deployment architectures. That combination is driving demand for simpler, converged data platforms that support a wider range of capabilities, unifying database features for data-at-rest with streaming analytics for data-in-motion.
The rise of data perishability
This notion of data perishability is key for John DesJardins, chief technology officer at Hazelcast, a firm known for its memory-first application platform for real-time, data-intensive workloads. Discussing the trends shaping data lifecycles right now, he points to the disruptive effects of data decentralization and to changes in embedded analytics capabilities driven by new advances in machine learning (ML) applications.
DesJardins argues that software engineers in IT departments face a difficult balancing act: they must meet an organization’s requirements for converged data platforms and unified databases without introducing deployment architectures that are ill-suited to the enterprise’s cloud environments.
Embedded analytics and ML are disrupting most industries and areas of modern life. Real-time integrated intelligence is at the forefront of this trend, driving a need for easy access to real-time data, enriched with the contextual and situational data that rich, ML-driven AI algorithms demand.
“The wider truth, upshot and corollary of all these developments is that data insights are becoming more perishable, so that action must be taken where data is born. Data has a shelf-life. The rise of mobile ubiquity along with the arrival of 5G and the proliferation of the Internet of Things (IoT) are all accelerating this trend. Processing must now be more scalable while also supporting complex global architectures, as well as processing that interaction at low-latency,” said Hazelcast’s DesJardins.
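To make the shelf-life idea concrete, here is a minimal Python sketch, assuming data freshness can be modelled as a fixed time-to-live per record. The `PerishableStore` class and its `shelf_life` parameter are hypothetical names chosen for illustration, not part of any Hazelcast API. A read returns nothing once a record’s shelf-life has passed, so stale data is never acted upon.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class PerishableStore:
    """Hypothetical key-value store where every entry carries a shelf-life (TTL, seconds)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        # The clock is injectable so expiry can be demonstrated deterministically.
        self._clock = clock
        self._data: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expiry time)

    def put(self, key: str, value: Any, shelf_life: float) -> None:
        # Record when this piece of data "goes stale", relative to when it was born.
        self._data[key] = (value, self._clock() + shelf_life)

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Past its shelf-life: evict it rather than act on stale data.
            del self._data[key]
            return None
        return value


if __name__ == "__main__":
    now = [0.0]  # fake clock for the demo
    store = PerishableStore(clock=lambda: now[0])
    store.put("sensor-42", 21.5, shelf_life=5.0)
    print(store.get("sensor-42"))  # fresh reading
    now[0] = 6.0                   # six "seconds" later
    print(store.get("sensor-42"))  # expired
```

A fixed TTL is the simplest possible model. Real systems might instead decay a record’s value gradually or tie expiry to business events.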
Data staleness & shelf-life
In the fight against data staleness, 5G solves many of the bandwidth and latency challenges and enables greater agility, combined with improved data security through end-to-end software-defined networks. Meanwhile, global cloud architectures allow applications to be deployed closer to the data source.
“However, traditional databases were not designed for these architectures, so we are now seeing newer data platforms emerging to address these challenges. Simplified distributed architectures, multi-datacenter support and unified processing are emerging capabilities among this new class of data platforms. Alongside this we can see that data is becoming decentralized and more diverse. The solution here is to look at data-as-a-product i.e. the decentralized management of data by the parts of the business that own each source application or data domain,” said DesJardins.
In the engineering development space, architectures such as Data Mesh and Delta Lake are now emerging to address the current data proliferation. Delta Lake is not to be confused with a data lake, the repository for raw and unstructured data; it is the open source storage layer that brings reliability to data lakes. The problem with these solutions is that they were not designed for real-time workloads and thus cannot enable intelligent applications with integrated machine learning.
Responding to data ‘as it is born’
DesJardins points out that in modern computing environments, delivering real-time applications demands responding dynamically to data ‘as it is born’. This requires real-time connectivity on a global scale, combined with a unified way to describe and access data. That, again, is driving unification, along with the adoption of SQL as a lingua franca.
“We see this real-time capability being achieved by the adoption of SQL across major streaming analytics platforms, as well as supported across newer databases and event streaming and messaging platforms. Specialists in real-time, intelligent applications are actively engaged with vendors to drive the standardization of this unified syntax and actively partnering with other data and cloud vendors to drive interoperability,” said the Hazelcast tech guru.
For developers looking to solve these complex challenges, new platforms are taking unification and simplification to the next level. They support these needs within a single runtime that handles both data-in-motion and data-at-rest through a unified SQL engine.
Furthermore, Java-based tools that combine in-memory data grids with streaming engines can support partitioned, distributed, low-latency compute with co-located data at giga-scale.
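The idea of one SQL dialect spanning data-at-rest and data-in-motion can be illustrated with a toy sketch. It uses Python’s built-in `sqlite3` module, not Hazelcast’s SQL engine. The `readings` table, the `QUERY` text and the tumbling-window helper are all hypothetical. The same query text runs once over a stored historical table and again over each fixed-size window of a simulated event stream.

```python
import sqlite3
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[str, float]

# One query text, reused unchanged for both data-at-rest and data-in-motion.
QUERY = "SELECT device, AVG(temp) FROM readings GROUP BY device ORDER BY device"


def query_at_rest(rows: List[Row]) -> List[Row]:
    """Run QUERY against a stored (historical) table of readings."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE readings (device TEXT, temp REAL)")
        conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


def query_in_motion(events: Iterable[Row], window: int) -> Iterator[List[Row]]:
    """Run the same QUERY over tumbling windows of `window` events from a stream."""
    batch: List[Row] = []
    for event in events:
        batch.append(event)
        if len(batch) == window:
            yield query_at_rest(batch)
            batch = []
    if batch:  # flush the final, partial window
        yield query_at_rest(batch)


if __name__ == "__main__":
    history = [("a", 20.0), ("a", 22.0), ("b", 30.0)]
    print(query_at_rest(history))
    stream = iter([("a", 1.0), ("b", 3.0), ("a", 5.0), ("b", 7.0), ("a", 9.0)])
    for result in query_in_motion(stream, window=2):
        print(result)
```

Re-running a query per batch is only an approximation chosen for brevity. Dedicated streaming SQL engines typically evaluate continuous queries incrementally as each event arrives.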
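Partitioning and compute co-location can be sketched in a few lines. The sketch below is a toy, single-process Python model written under stated assumptions, not the Java API of Hazelcast or any other data grid. `PartitionedGrid`, `partition_for` and `PARTITION_COUNT` are hypothetical names. Each key is hashed to a fixed partition, and functions are sent to the partition that owns the data instead of pulling the data back to the caller.

```python
import zlib
from typing import Callable, Dict, List

PARTITION_COUNT = 4  # hypothetical; production grids typically use far more partitions


def partition_for(key: str) -> int:
    # Stable hash, so a given key always maps to the same partition.
    return zlib.crc32(key.encode("utf-8")) % PARTITION_COUNT


class PartitionedGrid:
    """Toy in-memory grid: each partition holds its own slice of the data."""

    def __init__(self) -> None:
        self.partitions: List[Dict[str, float]] = [{} for _ in range(PARTITION_COUNT)]

    def put(self, key: str, value: float) -> None:
        self.partitions[partition_for(key)][key] = value

    def execute_on_key(self, key: str, fn: Callable[[float], float]) -> float:
        # Ship the function to the partition that owns the key (co-location)
        # rather than moving the value to the caller. Raises KeyError if absent.
        part = self.partitions[partition_for(key)]
        part[key] = fn(part[key])
        return part[key]

    def aggregate(self, fn: Callable[[Dict[str, float]], float]) -> float:
        # Each partition computes a partial result locally; only partials are combined.
        return sum(fn(part) for part in self.partitions)


if __name__ == "__main__":
    grid = PartitionedGrid()
    for k, v in [("acct-1", 100.0), ("acct-2", 50.0), ("acct-3", 25.0)]:
        grid.put(k, v)
    print(grid.execute_on_key("acct-1", lambda balance: balance + 10.0))
    print(grid.aggregate(lambda part: sum(part.values())))
```

In a real grid the partitions live on different cluster members. Sending a small function to the data avoids moving large values across the network, which is where the low-latency benefit comes from. In this sketch the “network” is just a list index.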
A farm-to-plate data lifecycle
On the road ahead, software application developers and data scientists have the opportunity to adopt more modern tools that simplify their development processes and deployment topologies.
“Despite all the expectations around real-time applications, unified platforms need not over-complicate deployment architectures and enterprises’ cloud environments,” concluded DesJardins.
Low-code platforms will inevitably help this effort, but the fight against data staleness is real, and no amount of carbon dioxide meat-packaging technology will solve the problem unless we take a farm-to-plate view of the entire data lifecycle.