Forget what you think you know: With the move to the cloud, much of what you learned with on-premises databases over the last 20 to 30 years no longer applies. Most things must be relearned, and that's not necessarily a bad thing.
Hybrid clouds are a popular option with one big problem: How do you federate the data across the platforms that make up your cloud? This is just one of the database challenges IT operations and database administrators face as more applications move to hybrid cloud architectures.
Hybrid clouds were once defined simply as a private cloud paired with a public cloud. But these days, hybrid usually means legacy systems paired with one or several public cloud providers. These are distributed systems, with native services, applications, and data hosted on one cloud or another.
While you might think that centralizing your data would be the best approach, in practice developers and database professionals host data on whichever cloud and legacy platforms offer their databases of choice.
While some of those offerings are classic brands, such as Oracle and SQL Server, a multitude of other, purpose-built databases are available that perform such advanced functions as in-memory processing, binary object storage, MapReduce, and analytics. And while some of these databases run everywhere, most run only in the cloud.
So, as cloud computing and purpose-built databases become more popular, what’s an IT professional to do? Here are the best practices you need to kill it with data management in hybrid IT environments.
Data federation: Use it to keep track of your data
A federated database system maps multiple database systems, which are often geographically dispersed, into a single, logical database. As a rule, the underlying databases are geographically decentralized, residing on a growing number of public clouds as well as in on-premises systems.
But the bigger challenge with data federation lies in keeping track of your data, including the metadata, and the physical and logical locations of that data. This means understanding the owners, users, types of data, data governance, data security, and so on.
Here are the five things to keep in mind as you address distributed database management problems across hybrid or multi-clouds:
Use an MDM (master data management) approach and tool set
Choose an MDM tool that connects to the distributed databases, as well as to any federated databases that abstract the physical databases.
Leverage federated databases
This includes using virtual databases that abstract many distributed back-end databases to provide more logical views of the data, such as a business view, an analytics view, or a raw view. Doing so extends the life of the physical data stores, because you can use the data in different ways without changing the structure of the underlying databases, and thus without disrupting the applications that depend on them.
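To make the idea concrete, here is a minimal Python sketch, assuming two in-memory SQLite databases stand in for physical back ends on different platforms. The `FederatedView` class, its view methods, and the sample data are hypothetical; commercial data virtualization products also handle query pushdown, caching, and schema mapping. The point is that each logical view is computed over the existing tables without altering their structure.

```python
import sqlite3
from collections import Counter


def make_backend(rows):
    """Create an in-memory SQLite database standing in for one physical back end."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, region TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    return conn


class FederatedView:
    """Hypothetical virtual database: fans a query out to several back ends and merges results."""

    def __init__(self, backends):
        self.backends = backends  # source name -> sqlite3 connection

    def query(self, sql):
        # Run the same query on every back end and tag each row with its source.
        rows = []
        for source, conn in self.backends.items():
            rows.extend((source, *row) for row in conn.execute(sql))
        return rows

    def raw_view(self):
        # Raw view: every record as stored, plus where it physically lives.
        return self.query("SELECT id, name, region FROM customers ORDER BY id")

    def business_view(self):
        # Business view: just the customer names, regardless of location.
        return sorted(name for _, name in self.query("SELECT name FROM customers"))

    def analytics_view(self):
        # Analytics view: customer counts per region across all back ends.
        counts = Counter(region for _, region in self.query("SELECT region FROM customers"))
        return dict(sorted(counts.items()))


# Hypothetical back ends: one on premises, one in a public cloud.
on_prem = make_backend([(1, "Acme Corp", "EU"), (2, "Globex", "US")])
cloud = make_backend([(3, "Initech", "US")])
fed = FederatedView({"on_prem": on_prem, "cloud": cloud})

print(fed.raw_view())
print(fed.business_view())   # ['Acme Corp', 'Globex', 'Initech']
print(fed.analytics_view())  # {'EU': 1, 'US': 2}
```

Adding another view here means adding a method over the same tables, not migrating a schema, which is why this approach leaves existing applications untouched.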
Consider performance
Network latency, along with platform differences that complicate integration, can cause performance problems severe enough to render your databases unusable when they're accessed from a different platform or cloud, or when they're used as part of a federated database.
Security is critical
Because the data is presented in many places, access to each of those locations must be authenticated. You'll need data-level security, including security services such as encryption that your databases can provide. You also need fine- and coarse-grained security to protect data at the record, grouping/table, or database level. Finally, consider compliance issues for the data, such as when your databases contain personally identifiable information (PII).
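The following Python sketch illustrates the difference between coarse- and fine-grained checks, assuming the user has already been authenticated. The `Grant` class, the `can_read` function, and the role names are hypothetical; in a real deployment these rules would live in each database's own permission system (for example, table grants and row-level security policies) and in your cloud's IAM.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Record = dict  # a single database record, e.g. {"id": 1, "region": "EU"}


@dataclass(frozen=True)
class Grant:
    database: str
    table: Optional[str] = None  # None means every table in the database (coarse-grained)
    row_filter: Optional[Callable[[Record], bool]] = None  # None means every record in scope


def can_read(grants, database, table, record):
    """Return True if any grant covers this record; deny by default otherwise."""
    for g in grants:
        if g.database != database:
            continue
        if g.table is not None and g.table != table:
            continue
        if g.row_filter is not None and not g.row_filter(record):
            continue
        return True
    return False


# Hypothetical roles, one per level of granularity.
grants = {
    "dba": [Grant("sales_db")],                      # database level
    "analyst": [Grant("sales_db", table="orders")],  # table level
    "eu_rep": [Grant("sales_db", table="customers",
                     row_filter=lambda r: r["region"] == "EU")],  # record level
}

eu_customer = {"id": 1, "region": "EU"}
us_customer = {"id": 2, "region": "US"}

print(can_read(grants["dba"], "sales_db", "customers", us_customer))      # True
print(can_read(grants["analyst"], "sales_db", "customers", eu_customer))  # False
print(can_read(grants["eu_rep"], "sales_db", "customers", eu_customer))   # True
print(can_read(grants["eu_rep"], "sales_db", "customers", us_customer))   # False
```

Denying by default matters in a distributed setting: a location that nobody remembered to configure stays closed rather than open.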
Data redundancy is a huge issue
Applications, developers, and users are looking for a single version of the truth from the databases, no matter which cloud or on-premises system a database happens to run on, so reducing redundancy is a must. Enterprises are notorious for having many copies of client, sales, and product data, all in different formats and much of it inconsistent. Data federation is a good tool here, but a sound MDM program and tool set helps too.
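MDM tools differ in how they match and merge records, but the core idea of consolidating redundant copies into one "golden record" can be sketched in a few lines of Python. The matching rule (a normalized email address) and the survivorship rule (the newest non-empty value wins) are assumptions chosen for illustration, as are the source system names; production MDM typically adds fuzzy matching and human data stewardship.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CustomerRecord:
    source: str
    email: str
    name: str
    phone: str
    updated: date


def match_key(rec):
    # Hypothetical matching rule: a normalized email address identifies one customer.
    return rec.email.strip().lower()


def build_golden_records(records):
    """Merge duplicate records from many systems into one golden record per customer."""
    golden = {}
    # Process oldest first, so newer non-empty values overwrite older ones
    # (a simple "most recent wins" survivorship rule).
    for rec in sorted(records, key=lambda r: r.updated):
        key = match_key(rec)
        prev = golden.get(key)
        golden[key] = CustomerRecord(
            source="golden",
            email=key,
            name=rec.name or (prev.name if prev else ""),
            phone=rec.phone or (prev.phone if prev else ""),
            updated=rec.updated,
        )
    return golden


records = [
    CustomerRecord("crm_on_prem", "Jane.Doe@example.com", "Jane Doe", "555-0100", date(2019, 3, 1)),
    CustomerRecord("sales_cloud", " jane.doe@example.com", "Jane A. Doe", "", date(2020, 6, 15)),
    CustomerRecord("billing_cloud", "sam@example.com", "Sam Lee", "555-0199", date(2020, 1, 10)),
]

golden = build_golden_records(records)
print(len(golden))  # 2
print(golden["jane.doe@example.com"])
```

The two Jane Doe records collapse into one: the newer name from the cloud sales system survives, while the phone number is kept from the older on-premises CRM because the newer record left it blank.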
Consider the data security and compliance complications
Address data security in two places: at the database itself, and in the cloud.
At the database level, cloud-based and non-cloud databases vary a great deal in the native security services they offer. But most provide record- or object-level security, which lets you allow or disallow access based on who's using the database. Moreover, most databases offer encryption services and integrate with native identity and access management (IAM) systems.
Data-level security is more problematic with distributed data because some databases rely on the native security systems of the clouds where they're hosted. You'll need to go through those cloud security systems to reach the database's own security, and you may end up using two or more native security systems just to get data access, which quickly gets complicated.
In the cloud, the picture depends on how your database is hosted. A cloud-native database is often deeply integrated with the host cloud's security system. But if the database isn't native to the cloud platform, such as a bring-your-own-license deployment from a major enterprise database vendor, its security systems will be more autonomous.
Data governance and performance: It's a tradeoff
You need data governance to manage data that's distributed across different databases on different cloud platforms. The core purposes of data governance on distributed data platforms are to:
Use decentralized policies that govern data in centralized ways
These policies typically govern changes to data or data structures, how data is used (for example, at what time of day), and how much data can be used at any given time. The systems they govern can be homogeneous and distributed, meaning the same database runs on different cloud platforms, or heterogeneous and distributed, meaning you're dealing with governance across different databases that run on different cloud platforms.
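As a rough sketch of how such policies might be evaluated, the Python below checks requests against the three kinds of rules just described: structure changes, time-of-day use, and volume limits. The `UsagePolicy` fields, the database names, and the thresholds are hypothetical; real governance tools express policies in their own formats.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class UsagePolicy:
    # Hypothetical policy fields mirroring the kinds of rules described above.
    allowed_start: time           # earliest time of day the data may be used
    allowed_end: time             # latest time of day the data may be used
    max_rows_per_request: int     # cap on how much data one request may pull
    schema_changes_allowed: bool  # whether structure changes (DDL) are permitted


@dataclass(frozen=True)
class Request:
    database: str
    at: datetime
    rows_requested: int
    is_schema_change: bool = False


def evaluate(policy, request):
    """Return a list of violations; an empty list means the request is allowed."""
    violations = []
    if not (policy.allowed_start <= request.at.time() <= policy.allowed_end):
        violations.append("outside allowed hours")
    if request.rows_requested > policy.max_rows_per_request:
        violations.append("row limit exceeded")
    if request.is_schema_change and not policy.schema_changes_allowed:
        violations.append("schema change not permitted")
    return violations


# One policy definition, applied to requests against databases on different platforms.
policy = UsagePolicy(time(6, 0), time(22, 0), 10_000, schema_changes_allowed=False)
requests = [
    Request("orders@cloud_a", datetime(2020, 5, 4, 14, 30), 500),
    Request("orders@on_prem", datetime(2020, 5, 4, 23, 15), 50_000),
    Request("customers@cloud_b", datetime(2020, 5, 4, 9, 0), 10, is_schema_change=True),
]
for r in requests:
    print(r.database, evaluate(policy, r) or "allowed")
```

Each check is cheap on its own, but every request pays for all of them, on every platform the data touches.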
Interact with the security systems
Ensure that policies dealing with compliance, such as which data must be encrypted, are enforced. The use of PII is also governed, and you need to handle recovery operations when a database fails or becomes corrupted.
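Compliance enforcement often comes down to comparing a catalog of your data against a rule. This Python sketch assumes a hypothetical catalog in which each column is tagged as containing PII or not and as encrypted or not, and it flags PII columns that violate an "encrypt all PII" policy across databases on different platforms. In practice, the catalog would come from your data catalog or governance tool, and the database names here are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    database: str   # hypothetical naming: "<database>@<platform>"
    table: str
    name: str
    contains_pii: bool
    encrypted: bool


def unencrypted_pii(catalog):
    """Return the fully qualified names of PII columns that are not encrypted."""
    return [
        f"{c.database}.{c.table}.{c.name}"
        for c in catalog
        if c.contains_pii and not c.encrypted
    ]


catalog = [
    Column("crm@on_prem", "customers", "email", contains_pii=True, encrypted=True),
    Column("crm@on_prem", "customers", "segment", contains_pii=False, encrypted=False),
    Column("sales@cloud_a", "leads", "phone", contains_pii=True, encrypted=False),
    Column("billing@cloud_b", "invoices", "tax_id", contains_pii=True, encrypted=False),
]

for violation in unencrypted_pii(catalog):
    print("Encryption policy violation:", violation)
```

A check like this only works if the catalog spans every platform; a PII column in a database the catalog doesn't know about is invisible to it.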
While data governance is essential, it comes at a cost. The more policies you run, either within the distributed databases on each cloud or within a third-party data governance tool that deals with distributed cloud-based databases, the slower your databases will run.
While this is also an issue with databases that aren't distributed, the distributed nature of hybrid and multi-cloud databases makes the performance impact of data governance an even bigger problem. Network latency is usually the culprit. But multitenant systems can cause issues as well, because many tenants compete for I/O resources at the same time, and cloud-based databases are no exception.
Three best practices to avoid database management headaches
So how can you avoid the three hybrid headaches around federation, security and compliance, and governance? Follow these emerging best practices.
Plan ahead
While many IT organizations like to build databases by adding them as needed, that leads to federation and redundancy issues, as well as the other issues discussed above.
When planning databases for hybrid clouds, avoid problems by figuring out which databases you'll use for what purpose, and with which datasets. Your focus should be on avoiding redundancy and complexity, which are the real problems here. Once you have addressed those issues, security, governance, and performance are much easier to handle.
Consider cloud-native databases
Such systems are less problematic. The tradeoff is that you can’t find the same technology on other clouds, which means you're locked in. But cloud providers are much better at dealing with the issues described here, even when you're dealing with databases that are native to other clouds.
Make sure your infrastructure is sound and correctly configured
There are times when enterprises complain about database performance—and blame the database. But most of the time the root cause lies with the underlying network or a poorly configured storage system on the cloud. Databases depend on cloud I/O systems to function correctly, and those are often misconfigured.
Hybrid IT requires a database rethink
Going forward, you can expect to see more issues with databases that operate across hybrid or multi-clouds. These architectures are new, and enterprises are just learning how to make them work, and how to make them play well together.
But databases are evolving, and no single platform will host them all, cloud or not. Security, governance, and operations must also evolve. It will be a few more years before things move in a better direction.
Keep in mind that your hybrid cloud data issues will be unique to your problem domain and database collection. No doubt you'll encounter challenges specific to your configuration. But the problems described above are the overall issues you should expect to face. Follow the best practices above and you'll be well on your way to solving them.