Data engines: What's under your hood?
DevOps teams know the drill: Create an environment, provision the infrastructure, and tune the components for performance. Account for growth, knowing that as the application nears production, usage and resource allocation will scale. Perhaps someone even sharded the databases so there's wiggle room for expansion. It all works, and everyone's happy.
Yet all too often, capacity grows larger than anticipated, and so does the number of users to support. Metadata, those tiny pieces of data about data, also grows faster than expected, but because metadata consists mostly of small objects that consume little storage, projected performance and reliability still look good, on the PowerPoint slides presented to management, anyway.
But data and metadata keep on growing, and the team realizes that without intervention, there won't be sufficient system resources for the data itself.
Teams need to examine the data engine, the part of the software stack that sorts and indexes data, especially when they need improved application performance and easier scalability. There’s a lot more under the hood than most realize—and the type of data structure used for metadata may impose limits that are hard to overcome. Here are some key criteria to consider.
What is a data engine?
Even seasoned IT veterans are often surprised to learn which data engine is in use for the layer beneath the application and above the storage—and are largely unaware of tuning opportunities and requirements.
A data engine is an embedded key-value store (KVS) that sorts and indexes data. Some may know it as a storage engine—software that handles basic operations of storage management, most notably to create, read, update, and delete (CRUD) data. Because the layer is going beyond its traditional role as a storage engine, the term data engine is used to describe this wider scope of use cases.
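To make that role concrete, here is a minimal sketch of the CRUD operations an embedded KVS exposes, using the python-rocksdb bindings for RocksDB as one example. The database path and keys are illustrative; any embedded KVS with put/get/delete semantics would look much the same.

```python
# Minimal CRUD against an embedded key-value store (RocksDB via
# python-rocksdb). The store runs inside the application process;
# there is no separate database server.
import rocksdb

# Open (or create) a database embedded in the application.
db = rocksdb.DB("app_metadata.db", rocksdb.Options(create_if_missing=True))

# Create / update: keys and values are arbitrary byte strings.
db.put(b"user:42:avatar", b"s3://bucket/avatars/42.png")

# Read
value = db.get(b"user:42:avatar")

# Delete
db.delete(b"user:42:avatar")
```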
Beyond CRUD: Reads vs. writes
More organizations are using data engines to perform a variety of on-the-fly operations on live data while it is in transit. In such implementations, data engines such as RocksDB handle in-application operations beyond CRUD: for example, managing metadata-intensive workloads and preventing metadata access bottlenecks that can degrade performance.
While metadata volumes seemingly consume a small portion of storage relative to the data itself, the need for sub-millisecond performance with metadata means that systems cannot tolerate much of a bottleneck without the impact on the end-user experience becoming uncomfortably evident. This challenge is particularly salient when dealing with modern, metadata-intensive workloads such as IoT and advanced analytics.
The data structures within a KVS generally fall into one of two categories: those optimized for fast writes and those optimized for fast reads. To store metadata in memory, data engines typically use a log-structured merge (LSM) tree-based KVS. LSM tree structures are tuned for write-intensive applications because they can store data very quickly without needing to modify the existing data structure, thanks to their use of immutable Sorted String Table (SST) files. In contrast, B-tree data structures hold a decided advantage in read-heavy workloads.
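To illustrate the write-side advantage, here is a toy sketch of the LSM pattern: writes land in an in-memory buffer (a memtable) and are periodically flushed as immutable sorted runs, while reads may have to search the memtable plus several runs, newest first. This is a simplified model, not any particular engine's implementation; real LSM engines add write-ahead logs, compaction, and Bloom filters.

```python
# Toy LSM tree: fast writes (append to memtable, flush immutable runs),
# costlier reads (search multiple sorted runs). Illustrative only.
import bisect

class ToyLSM:
    def __init__(self, memtable_limit=4):
        self.memtable = {}        # mutable, in-memory, fast to write
        self.sstables = []        # immutable sorted runs, newest last
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        # Writes never touch existing files; they only hit the memtable.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Freeze the memtable as an immutable sorted run (a toy "SST").
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        # Reads check the memtable, then each sorted run, newest first.
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.sstables):
            keys = [k for k, _ in run]
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return run[i][1]
        return None
```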
Unfortunately, DevOps cannot have its cake and eat it too. While existing KVS data structures can be tuned to be good enough for application write and read speeds, they cannot deliver high performance for both operations at the same time.
The issue can become critical when datasets get large or when metadata volumes grow so large they dwarf the size of the data itself, as is increasingly common. It doesn't take long before organizations reach a point where they must trade off among performance, capacity, and cost.
The slippery slope of performance tuning
When performance issues arise, teams usually start by re-sharding the data. Before long, the number of datasets multiplies, and developers devote more time to partitioning data and distributing it among shards, and less to tasks that deliver business value.
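As a hypothetical illustration of where that developer time goes, the sketch below shows the kind of key-to-shard mapping that application code has to maintain once data is manually sharded. The shard names and hashing scheme are placeholders, not a recommended design.

```python
# Manual sharding logic the application must own and maintain.
import hashlib

SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]

def shard_for(key: str) -> str:
    # Stable hash so the same key always lands on the same shard.
    digest = hashlib.sha1(key.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# Adding a fifth shard changes the modulus and remaps most keys,
# which is why every re-shard costs developer time and a data migration.
```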
Next, teams may attempt database performance tuning, usually by adjusting default settings that may not suit their applications. Refinement is an iterative and time-consuming process, and even skilled developers may struggle to achieve the right balance.
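For instance, with RocksDB the tuning surface looks something like the following, shown here via the python-rocksdb bindings. The values are illustrative starting points rather than recommendations; each knob trades write throughput against read and compaction cost, which is what makes the iteration so time-consuming.

```python
# A sketch of iterative RocksDB tuning; values are illustrative only.
import rocksdb

opts = rocksdb.Options()
opts.create_if_missing = True

# Write path: larger memtables absorb more writes before flushing...
opts.write_buffer_size = 64 * 1024 * 1024       # 64 MB memtable
opts.max_write_buffer_number = 3                # memtables kept in memory

# ...but shift cost to compaction and reads, which have their own knobs.
opts.level0_file_num_compaction_trigger = 4     # when compaction starts
opts.target_file_size_base = 64 * 1024 * 1024   # SST file size target

db = rocksdb.DB("tuned_metadata.db", opts)
```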
Once tweaked, these instances can experience additional, more serious performance issues if workloads or underlying systems are changed—particularly if applications deviate from the system norm. This can set up an endless loop of further retuning, consuming more developer time.
Just for good measure, teams may throw additional storage resources at the problem. Yet the resulting benefit often proves to be short-lived, and the costs of continually adding more resources can't be sustained longer term.
Architecting for change
New data engines have emerged to meet the demands of modern applications and stay ahead of data growth. Next-generation data engines can be a key enabler for low-latency, data-intensive workloads that require significant scalability and performance, as is common with metadata.
For example, Redis, provider of the in-memory data structure store of the same name and a strategic partner of my company, sells a Redis on Flash option for intelligently tiering large datasets. In this way, DevOps teams can take advantage of newer NVMe (nonvolatile memory express) SSDs instead of more costly DRAM resources.
With these newer data engines, performance and scalability can be significantly improved and made predictable across different applications, or within applications whose access patterns vary over time. Since DevOps teams often don't know future usage patterns, the flexibility to deliver steady performance in a variety of settings is a real advantage.
As DevOps teams push the envelope with low-latency microservices architectures, new options have opened the data engine to innovation—minimizing developer intervention and boosting performance at scale while reducing complexity and costs.