Hybrid cloud computing in general has a great deal in common with hybrid computing that leverages edge computing. But you need different approaches to partitioning data and processing when you've decided to keep the processing of data as close to the source of that data as possible.
In the case of edge computing, you typically keep the computing platform or device near an IoT device that produces data. In the case of hybrid cloud, you want to keep the data on private or public clouds as close to the source (e.g., application) as possible. In both cases, you gain the advantage of lower latency and better logical partitioning of the data.
Hybrid systems have the objective to partition the data, and the processing, between your public and private cloud instances. (In "How to take on data management in hybrid clouds," I discussed the tools and approaches to use for data management in the hybrid cloud.)
But with the new concept of edge computing—even edge computing paired with IoT-based systems or with public clouds—the use of a hybrid cloud architecture takes on new dynamics. Here's what your team needs to understand about edge computing in hybrid environments, and how to best approach it.
Three approaches to consider
These are the three basic approaches, or architectures, for edge computing that work for hybrid cloud:
- The edge device and the public cloud provider become a new hybrid. This means that no private cloud exists and that the edge device serves as an analog to a private cloud.
- The edge device is the master to the public cloud, meaning that the public cloud is subordinate to the edge device. While this just turns the tables a bit, the best use of this approach is when your applications need to run on the edge as a primary, with the public cloud in a supporting role. Many airlines are using this architecture. Considering the processing that occurs onboard a plane, the edge is more important than storing the data within a public cloud.
- The edge device is paired with a hybrid cloud, meaning a paired public and private cloud. This is typically the edge device just communicating with the "mother ship," where processing and data lives. This approach has the potential to become a bit too complicated.
Beyond those architectures, there are three solution patterns to understand. Some are obvious, and some are not-so-obvious. They include the following:
1. Pull data to the edge
Edge computing is all about the data, with some processing done on the edge as well. The idea is to place the data closest to the source, then provide rudimentary processing on that data and return decisions made on that data to the data source. This source is typically co-located with the edge computing device.
For instance, take an application that gathers data from a motorcycle, a motorcycle rider, and the environment (road conditions, weather, traffic, etc.). Here are a few decisions you must make:
- Have the devices on the motorcycle gather the data and, because there's no local data storage, send it over the network to a database on the public or private cloud. No storage and little processing exist on the motorcycle or rider.
- Provide limited data management capabilities, such as data caching, on the motorcycle. When the network connection is lost, the data is held on the motorcycle until it can be uploaded. The local data is deleted as soon as the network connection is restored.
- Gather data on the edge device, not as a primary processing location, but for data collection and the real-time processing needed to respond directly at the source, without the latency of a round trip to the cloud. This differs from the previous approach in that you keep the data on the edge device rather than upload and purge.
With this last approach, you can do some processing on the edge data as well. Keep in mind, though, that you're still dependent upon your private or public cloud for processing the data in more sophisticated ways, such as when using machine learning.
The idea is that the system can work for long periods of time without a network connection, gathering data the whole time and processing it locally. Even so, the public or private cloud remains the best place to do the heavy processing, considering its access to cheap CPU and data management platforms.
The last approach is often the best for leveraging data on the edge within a hybrid architecture. Let's build on the motorcycle example.
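The cache-and-forward behavior described in the second bullet can be sketched as a small store-and-forward buffer. This is a minimal illustration, not a production design; the `upload` callable and connectivity check are hypothetical stand-ins for whatever transport your platform provides:

```python
import collections
import time

class EdgeBuffer:
    """Store-and-forward cache: hold readings locally while the network
    is down, then upload and purge once the connection is restored."""

    def __init__(self, max_items=10_000):
        # Bounded deque so a long outage can't exhaust device storage;
        # the oldest readings are dropped first when the cap is reached.
        self.queue = collections.deque(maxlen=max_items)

    def record(self, reading):
        self.queue.append(reading)

    def flush(self, upload, is_connected):
        """Drain the cache through `upload` while connectivity holds."""
        sent = 0
        while self.queue and is_connected():
            upload(self.queue.popleft())
            sent += 1
        return sent

# Usage: a fake transport shows the drain-and-purge behavior.
buf = EdgeBuffer()
for rpm in (3000, 3200, 3100):
    buf.record({"sensor": "engine_rpm", "value": rpm, "ts": time.time()})

uploaded = []
buf.flush(upload=uploaded.append, is_connected=lambda: True)
print(len(uploaded), len(buf.queue))  # 3 0
```

The bounded queue is the key design choice: on a device with limited storage, losing the oldest readings beats crashing the collector.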
Other considerations with data-to-the-edge
The idea is to provide two tiers of information (cloud and edge computing). The edge tier is a small and inexpensive device that mounts on the motorcycle, which uses direct Bluetooth communication to connect with a dozen sensors on the bike, as well as a smartwatch that the rider wears to monitor biotelemetry. Finally, a Lidar-based scanner tracks other moving vehicles near the bike, including ones that are likely to be a threat.
The edge device is also responsible for real-time alerting on things such as the speed, behavior, and direction of nearby vehicles that are likely to put the rider at risk. It alerts the rider to hazardous road conditions and obstacles such as gravel or ice, as well as to issues with the motorcycle itself: overheated brakes that lengthen stopping distance, a lean angle that's too aggressive for the current speed, and hundreds of other conditions that could lead to an accident.
Moreover, the edge device will alert the rider if heart rate, blood pressure, or other vitals exceed a threshold.
Keep in mind that you need the edge device here to deal instantaneously with data such as speed, blood pressure, the truck about to rear-end the rider, and so on. However, it makes sense to transmit the data to a public cloud for deeper processing—for example, the ability to understand emerging patterns that may lead up to an accident, or even bike maintenance issues that could lead to a dangerous situation.
Clearly, those are high-end processes and intensive data processing operations, including the use of predictive analytics and machine learning. They are best done on a public cloud and are often impossible to run on edge devices, which are underpowered by design. The partitioning of data as to purpose is key here and is essential to leveraging edge as part of a hybrid architecture.
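The edge tier's instantaneous checks described above amount to simple rule evaluation against local data, with no round trip to the cloud. A minimal sketch, where the rule names, field names, and threshold values are illustrative assumptions, not figures from the article:

```python
# Edge-side rule evaluation: each rule is a (name, predicate) pair that
# fires immediately on local sensor data, with no network dependency.
RULES = [
    ("overheated_brakes", lambda r: r["brake_temp_c"] > 400),
    ("aggressive_lean",   lambda r: abs(r["lean_deg"]) > 45 and r["speed_kph"] > 80),
    ("high_heart_rate",   lambda r: r["heart_rate_bpm"] > 180),
]

def evaluate(reading):
    """Return the names of all rules the current reading trips."""
    return [name for name, predicate in RULES if predicate(reading)]

reading = {"brake_temp_c": 420, "lean_deg": 30, "speed_kph": 95, "heart_rate_bpm": 120}
print(evaluate(reading))  # ['overheated_brakes']
```

Deeper work, such as learning which patterns of readings precede an accident, would run on the cloud tier against the accumulated history, per the partitioning described above.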
2. Leverage abstraction and automation
People who deal with edge-enabled hybrid clouds quickly get to the concepts of abstraction and automation. They're the obvious ways to deal with data that's partitioned between an edge device that serves as a private cloud analog and a public cloud.
The idea is that you can abstract yourself away from the complexity of maintaining one or more databases or data stores on each platform (edge and cloud) and just focus on what's important. For example, virtual database systems give you the ability to see complex, heterogeneous physical databases as abstracted schemas. These structures can present physical data through new representations that are better aligned with your applications or your business.
Abstraction is helpful with hybrid edge computing because you can map complex data that exists on either tier (cloud or edge) in any way you need to view the data. While network connectivity can limit some of the capability here, abstraction is a very productive tool, inter-tier or intra-tier (edge or cloud), because you place volatility of the database structure into a single domain.
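One way to picture the abstraction: a thin virtual-schema layer that presents edge and cloud stores as a single logical dataset, so applications never see where rows physically live. This is a toy sketch under stated assumptions; the two list-backed stores and their mapper functions stand in for real databases and a real virtualization product:

```python
class VirtualView:
    """Abstracted schema over heterogeneous physical stores (edge + cloud).
    Callers query one logical view; volatility in the underlying physical
    structure is confined to the per-store mapper functions."""

    def __init__(self, stores):
        # stores: list of (fetch_rows, mapper) pairs; each mapper converts
        # that store's native row shape into the shared logical schema.
        self.stores = stores

    def query(self, predicate):
        for fetch_rows, mapper in self.stores:
            for row in fetch_rows():
                logical = mapper(row)
                if predicate(logical):
                    yield logical

# Two tiers with different physical schemas, one logical view.
edge_rows = [{"t": 1, "spd": 92}]
cloud_rows = [{"timestamp": 2, "speed_kph": 88}]

view = VirtualView([
    (lambda: edge_rows,  lambda r: {"ts": r["t"], "speed": r["spd"]}),
    (lambda: cloud_rows, lambda r: {"ts": r["timestamp"], "speed": r["speed_kph"]}),
])
print(list(view.query(lambda r: r["speed"] > 90)))  # [{'ts': 1, 'speed': 92}]
```

If the edge device renames a column, only its mapper changes; the application's queries, written against the logical schema, are untouched.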
Automation means you define preprogrammed activities that occur within or between tiers: moving data from one tier to another (edge to cloud), de-duplicating data to save space on the edge device, or any other database management activity that needs to run on a schedule or in response to an event.
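Those preprogrammed activities might look like the pair of jobs below: one event-driven (de-duplicate on write to save edge storage) and one time-driven (migrate aged readings to the cloud tier). A sketch only; the record shapes, the one-hour age threshold, and the in-memory "stores" are assumptions for illustration:

```python
def deduplicate(readings):
    """Event-driven job: drop repeated readings to save edge storage."""
    seen, unique = set(), []
    for r in readings:
        fingerprint = (r["sensor"], r["value"])
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(r)
    return unique

def move_aged(edge_store, cloud_store, now, max_age_s=3600):
    """Time-driven job: migrate readings older than max_age_s to the cloud."""
    keep = []
    for r in edge_store:
        (cloud_store if now - r["ts"] > max_age_s else keep).append(r)
    edge_store[:] = keep  # purge migrated rows from the edge tier

edge = [
    {"sensor": "spd", "value": 90, "ts": 0},
    {"sensor": "spd", "value": 90, "ts": 10},    # duplicate reading
    {"sensor": "spd", "value": 95, "ts": 7000},
]
edge[:] = deduplicate(edge)       # drops the duplicate 90 kph reading
cloud = []
move_aged(edge, cloud, now=7200)  # the ts=0 reading ages out to the cloud
print(len(edge), len(cloud))  # 1 1
```

In practice these jobs would be wired to a scheduler and to write events rather than called by hand, but the per-time and per-event split is the same.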
3. Consider a separation
When you disconnect an edge-based device from some sort of master processor, be it a public cloud, a private cloud, or a traditional on-premises system, some people might think you're nuts. However, there are instances when this is the best approach. For example:
- When the edge device has the ability to handle the necessary data processing, which could include running machine learning and predictive analytics. Edge devices are quite capable of mid- to lightweight data processing. It may make sense to just leave the data there, and have the edge devices share data and communicate peer to peer.
- When little data is stored anywhere, storing it on the edge device makes sense. The edge device is already paid for, while public clouds charge for usage. If the data can exist in either place for processing, perhaps leverage the edge device exclusively, with the cloud acting as offline storage for the edge device's data, for business continuity and disaster recovery purposes.
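That second bullet amounts to treating the cloud as a passive replica: processing stays on the already-paid-for edge device, and the cloud only receives periodic snapshots for continuity and recovery. A minimal sketch, with `cloud_put` standing in for whatever object-storage client you actually use:

```python
import hashlib
import json

def backup_snapshot(records, cloud_put):
    """Edge-primary, cloud-as-DR: upload a snapshot of the edge data set.
    The content hash doubles as the object key, so identical snapshots
    are naturally de-duplicated and uploads are idempotent."""
    payload = json.dumps(records, sort_keys=True).encode()
    key = hashlib.sha256(payload).hexdigest()
    cloud_put(key, payload)
    return key

# Usage: a dict plays the role of the cloud object store.
store = {}
records = [{"sensor": "spd", "value": 90}]
key = backup_snapshot(records, cloud_put=lambda k, v: store.setdefault(k, v))
# Re-uploading unchanged data produces the same key and no new object.
print(key == backup_snapshot(records, cloud_put=lambda k, v: store.setdefault(k, v)))  # True
```

Idempotent snapshots matter here because the edge device may retry uploads after long disconnected stretches, and retries must not bloat cloud storage costs.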
Don't wait: The time to choose is now
As you can see from these approaches, there are sub-approaches you could come up with as well, and perhaps sub-sub-approaches under those. There are, it seems, too many choices.
Hybrid cloud architectures now include edge devices that often require out-of-the-box thinking about how you gather, process, and leverage data that's both in the cloud and not in the cloud. The sheer number of new approaches, technologies, and architectures will only continue to expand the possibilities.
However, as the number of choices in data processing for hybrid clouds expands, things get more complex. These are choices you are making now, or will be making in the near future. Now is the best time to figure this out in your own organization.