Misconfiguration is the No. 1 cause of cloud-based data breaches, and the No. 1 cause of misconfigurations is human error. And although there are two sides to any cloud services provider/customer relationship, Gartner reports that at least 99% of cloud security failures are the customer's fault.
Why? Because the technologies and best practices that organizations have long implemented to protect endpoint devices and IT systems in the data center don't apply to cloud environments.
By understanding the why, you can adapt your approach and harden your cloud security posture. Here's how.
Different environments, different security challenges
Security has traditionally focused on the perimeter, guarding against attackers searching for vulnerabilities, such as unpatched software applications, that give them the opening they need to slip past the security boundary.
Consider the devastating WannaCry ransomware attack in 2017 that simultaneously struck more than 200,000 systems across 150 countries and caused an estimated $4 billion in damage. It was preventable—Microsoft had released patches for the Windows vulnerability almost two months before the attack.
Attackers take a different approach to cloud-based systems. They use automation to continually scan the entire Internet for common cloud misconfigurations, such as unrestricted SSH access. It's a tactic that many of the security tools that worked in the data center cannot detect, let alone prevent.
Network monitoring tools that rely on SPAN ports or taps to inspect traffic can't help, because CSPs don't provide that kind of direct network access. Data doesn't always traverse customer-facing TCP/IP networks in cloud environments, and neither do cloud data exfiltration events.
There is no network perimeter in the cloud. AWS alone has introduced hundreds of new kinds of services over the past few years. Even once-familiar things such as networks and firewalls can appear alien to the data center engineer. All require new and different approaches to security. And due to the on-demand and elastic nature of the cloud, there are more infrastructure resources to track and secure.
Enterprise cloud teams may be running dozens of environments across multiple regions and accounts, and each may involve tens of thousands of resources, all individually configured and accessible via APIs. These cloud resources interact with each other and require their own identity and access control (IAM) permissions. Microservice architectures only compound this challenge.
Cloud misconfiguration vulnerabilities are different from application and operating system vulnerabilities because they continue to appear even after you've fixed them. You likely have controls in place in your development pipeline to ensure that developers don't deploy known application or OS vulnerabilities to production. And once such a vulnerability is fixed, it generally stays fixed (at least for that specific application or OS vulnerability).
It's commonplace to see the same misconfiguration vulnerability appear over and over again. A security group rule allowing unrestricted SSH access (e.g., 0.0.0.0/0 on port 22) is just one example of the kind of misconfiguration that occurs daily in at-scale cloud environments.
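To make this concrete, here is a minimal sketch of how you might scan for that specific misconfiguration yourself. It assumes the boto3 library is installed and AWS credentials and a region are already configured; the function name and region default are illustrative.

```python
# Minimal sketch: flag security groups that allow unrestricted SSH (0.0.0.0/0 on port 22).
# Assumes boto3 is installed and AWS credentials/region are configured in the environment.
import boto3

def find_open_ssh_groups(region="us-east-1"):
    ec2 = boto3.client("ec2", region_name=region)
    offenders = []
    for sg in ec2.describe_security_groups()["SecurityGroups"]:
        for rule in sg.get("IpPermissions", []):
            # Only consider TCP rules (or "all traffic" rules) whose port range covers 22.
            if rule.get("IpProtocol") not in ("tcp", "-1"):
                continue
            from_port = rule.get("FromPort", 0)
            to_port = rule.get("ToPort", 65535)
            if not (from_port <= 22 <= to_port):
                continue
            if any(r.get("CidrIp") == "0.0.0.0/0" for r in rule.get("IpRanges", [])):
                offenders.append(sg["GroupId"])
    return offenders

if __name__ == "__main__":
    print("Security groups with SSH open to the world:", find_open_ssh_groups())
```

Attackers run the equivalent of this check against the entire Internet, continuously; the point is to find the issue before they do.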
Additionally, because cloud infrastructure is so flexible and elastic and can be changed at will using APIs, we tend to change it often. Without automated controls to guard against it, every change increases the likelihood that a misconfiguration will appear.
It's all too easy for bad actors, armed with automated scanning tools and a long shopping list of cloud environments containing misconfiguration vulnerabilities, to discover who owns those environments and go on the attack. Within minutes of being added to the Internet, a new TCP/IP or DNS endpoint is scanned by malicious actors. A single cloud resource misconfiguration puts a veritable target on your organization's back and leaves your data at risk.
Another critical distinction to make between securing the data center versus the cloud relates to making upgrades. It's possible that, when an organization upgrades or replaces legacy endpoint devices or other local IT systems, a forgotten laptop, server, etc., will be left running on the network without receiving security updates. The fix is simple: Shut it down.
Cloud orphans
What's more likely today is forgetting about unused cloud infrastructure. While it's easy to create cloud infrastructure resources, it's more difficult to destroy them completely. Not surprisingly, cloud customers wind up running, and paying for, far more cloud infrastructure than they use.
While some of those untracked resources may still serve legitimate business uses (which itself is concerning if you aren't tracking them), much of it is "orphaned infrastructure"—idle cloud resources in your environment that serve no business purpose.
Orphaned cloud infrastructure is a cost problem, but few recognize it as a security problem. These costly orphans can transform into dangerous "zombies" that invite malicious actors into your cloud environment, and they've played a key role in recent major cloud breaches.
You should assume the presence of zombie resources in your environment, and make eliminating them a priority. You'll save money while improving your security posture.
Some things never change
As companies migrate more of their legacy IT systems and data stores to the cloud, two things haven't changed. The first is the attackers' objective. They want to find valuable, sensitive data, such as corporate intellectual property or customers' credit card numbers, and steal it.
The second is the feeling of frustration—even helplessness—among security professionals that they are at a constant disadvantage due primarily to being overworked and understaffed thanks to an alarming lack of skilled security professionals. Cybersecurity Ventures reports that there will be 3.5 million unfilled cybersecurity jobs globally by 2021, up from 1 million in 2014.
It's an issue that the coronavirus pandemic has exacerbated. Among the key findings of our April 2020 State of Cloud Security survey:
- 96% of cloud engineering teams are now 100% distributed and working from home in response to the crisis.
- 92% are worried that their organizations are now even more vulnerable to a significant cloud misconfiguration-related data breach, and 28% state that they've already suffered a critical cloud data breach.
- 33% believe cloud misconfigurations will increase, and 43% believe the rate of misconfiguration will stay the same. Only 24% believe cloud misconfigurations will decrease at their organization.
Even as businesses take steps to reopen, business leaders and their employees have realized that working from home does not have to hurt productivity or collaboration. Gallup found that 59% of Americans forced to work from home now prefer to continue working remotely either full or part time.
This transition likely resulted in new cloud vulnerabilities, and it's better to find and eliminate them before attackers do. You need to examine and update protocols to accommodate new access patterns from employees who previously worked at the office full time.
The onus of addressing the vulnerabilities described above is on the cloud customer, not the cloud provider.
Shared responsibility model
The shared responsibility model has become standard operating procedure. It states that cloud providers are responsible for the "security of the cloud" and the customer is responsible for "security in the cloud." It sounds like splitting hairs, but the difference is significant.
The shared security model of cloud dictates that cloud service providers (CSPs) such as AWS, Azure, and Google Cloud Platform are responsible for the security of the physical infrastructure. While CSPs can educate and alert customers about potential risks, they can't prevent their customers from creating misconfigurations.
Preventing customers from making such errors would severely limit the power and flexibility of the cloud. CSP customers are responsible for the secure use of cloud resources.
The bad guys use automation, and so should you
Once an organization understands both its cloud security responsibilities and the fact that traditional security tools and approaches can't fulfill them, the typical response is to implement manual processes for identifying and remediating misconfigurations. But attackers use automation to detect and exploit misconfigurations, often within minutes. Leveling the playing field requires automated remediation to protect security-critical resources and sensitive data.
There are two methods of automated remediation for cloud misconfiguration. The implementation details of each can vary.
The first uses automated scripts triggered by an alert event to execute a configuration change via the cloud APIs. Typically, these scripts or workflows leverage serverless functions, such as AWS Lambda or Azure Functions, to perform modifications to the environment. Remediation actions change the configuration to a predefined "safe" configuration when triggered. These automation scripts are known as "bots," "cloud bots," or "lambdas."
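As a rough illustration of this first method, the sketch below shows what such a remediation bot might look like as an AWS Lambda handler built on boto3. The event shape (a detail payload carrying the offending security group ID) is an assumption for illustration only, not any particular alerting product's schema.

```python
# Minimal sketch of an alert-triggered remediation "bot" as an AWS Lambda handler.
# Assumes the triggering event carries the offending security group ID in
# event["detail"]["groupId"] -- an illustrative event shape, not a real schema.
import boto3

ec2 = boto3.client("ec2")

def handler(event, context):
    group_id = event["detail"]["groupId"]
    sg = ec2.describe_security_groups(GroupIds=[group_id])["SecurityGroups"][0]
    revoked = 0
    for rule in sg.get("IpPermissions", []):
        open_ranges = [r for r in rule.get("IpRanges", []) if r.get("CidrIp") == "0.0.0.0/0"]
        if rule.get("IpProtocol") == "tcp" and rule.get("FromPort") == 22 and open_ranges:
            # Revoke only the world-open portion of the rule; other CIDRs stay intact.
            ec2.revoke_security_group_ingress(
                GroupId=group_id,
                IpPermissions=[{
                    "IpProtocol": "tcp",
                    "FromPort": rule["FromPort"],
                    "ToPort": rule["ToPort"],
                    "IpRanges": open_ranges,
                }],
            )
            revoked += 1
    return {"groupId": group_id, "revokedRules": revoked}
```

Note that this one bot handles exactly one misconfiguration pattern on one resource type; that narrowness is the maintenance burden discussed below.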
The second method enforces established infrastructure configuration baselines—"snapshots" of provisioned cloud resource configurations that serve as the target configuration state for remediation events. Baseline enforcement is triggered by an event such as an alert, or when configuration drift is detected by subsequent snapshots.
Remediation actions restore the resource to the established baseline configuration and don't require target configuration states to be predefined. This method is called "configuration baselining" or "baseline enforcement," and it makes resource configurations effectively self-healing in the event of misconfiguration.
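Here is a simplified sketch of the second method for a single resource type, a security group's ingress rules. The JSON-file baseline store, the function names, and the single-resource scope are illustrative; real baseline enforcement typically covers many resource types and stores snapshots centrally.

```python
# Minimal sketch of baseline enforcement for a security group's ingress rules.
# The baseline is a JSON snapshot taken when the resource was provisioned in a
# known-good state; on drift, the current rules are reset to that snapshot.
# File-based storage and function names are illustrative assumptions.
import json
import boto3

ec2 = boto3.client("ec2")

def snapshot_baseline(group_id, path):
    """Capture the current (known-good) ingress rules as the baseline."""
    sg = ec2.describe_security_groups(GroupIds=[group_id])["SecurityGroups"][0]
    with open(path, "w") as f:
        json.dump(sg["IpPermissions"], f, indent=2)

def enforce_baseline(group_id, path):
    """Detect drift against the baseline and restore it if needed."""
    with open(path) as f:
        baseline = json.load(f)
    current = ec2.describe_security_groups(GroupIds=[group_id])["SecurityGroups"][0]["IpPermissions"]
    # Naive equality check for the sketch; a real implementation would normalize ordering.
    if current == baseline:
        return "no drift"
    # Drift detected: remove what is currently applied, then re-apply the baseline rules.
    if current:
        ec2.revoke_security_group_ingress(GroupId=group_id, IpPermissions=current)
    if baseline:
        ec2.authorize_security_group_ingress(GroupId=group_id, IpPermissions=baseline)
    return "baseline restored"
```

Because the target state is whatever was provisioned and approved, nothing about the "safe" configuration has to be hand-coded per misconfiguration scenario.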
Automation scripts are easier to get started with, but they run the risk of destructive changes, because the predefined target configuration can break the application's business function.
There are also maintenance burdens. A single script designed to remediate a specific AWS Security Group misconfiguration can exceed 300 lines of code or require bespoke customization using a workflow. Rinse and repeat for every misconfiguration event you need to protect against and scale that to multiple environments, regions, and cloud accounts.
Baseline enforcement is more challenging to implement without third-party tooling, but it eliminates the risk of unintended destructive changes, because the target configuration state for remediation is the configuration that was established and provisioned before the misconfiguration occurred. And because this method requires no development and maintenance of separate automation scripts, the operational burden is low.
Shift left
Security can be easier in the cloud than in the data center. Because the cloud is fully programmable and can be automated, the security of your cloud is also programmable and can be automated. If you rely too much on manual prevention and remediation of cloud misconfiguration and drift, you're likely too slow to counter these automated threats. Look for ways to automate both the prevention of misconfiguration at deployment time and the detection and remediation of misconfigurations that do slip through.
Often, the adoption of security automation faces stiff resistance from application and ops teams that fear the significant risk of business disruption due to sudden application downtime events and other breaking changes. Establishing trust can become the biggest challenge in gaining buy-in for automated remediation. The first step to building that trust is embracing a term common in application developer circles: shift-left.
Just as developers unit test their application code before merging it into the build, they should also implement automated security unit tests for their modules before merging them into the staging environment.
This isn't just a process change; it's a cultural one. Creating and enforcing a known-good baseline provides developers with real-time automated feedback throughout the design and development phases. They avoid the interruptions that breed delays, and they ensure that the production environment meets all security and compliance policies when deployed to the cloud.
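As an illustration of what such a security unit test might look like, the sketch below checks a Terraform plan (exported with `terraform show -json plan.out > plan.json`) for world-open SSH before a merge is allowed. The file name and test body are assumptions for illustration, not a prescribed pipeline.

```python
# Minimal sketch of a "shift-left" security unit test run in CI with pytest.
# Assumes the Terraform plan has been exported to plan.json ahead of the test.
import json
import pytest

@pytest.fixture
def plan():
    with open("plan.json") as f:
        return json.load(f)

def test_no_world_open_ssh(plan):
    for change in plan.get("resource_changes", []):
        if change["type"] != "aws_security_group":
            continue
        after = (change.get("change") or {}).get("after") or {}
        for rule in after.get("ingress") or []:
            if rule.get("from_port", 0) <= 22 <= rule.get("to_port", 65535):
                assert "0.0.0.0/0" not in (rule.get("cidr_blocks") or []), (
                    f"{change['address']} opens SSH to 0.0.0.0/0"
                )
```

A failing test blocks the merge, so the misconfiguration never reaches a running environment in the first place.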
Shift everywhere
While shift-left is gaining momentum (and you should be doing it), you really should be thinking in terms of shift-everywhere. Once you've shifted left on cloud security and are protecting security-critical resources against misconfiguration by enforcing your baseline configurations, you're benefiting from security and compliance automation across your entire cloud lifecycle.
A shift-everywhere mentality enables you to think about security of the system from a holistic perspective and use the same policy-as-code and automation tools at multiple stages of that lifecycle, rather than using different tools at different stages, which can lead to significant operational inefficiencies and security gaps.
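As a small illustration of that idea, the sketch below expresses one policy check as a single predicate and uses lightweight adapters so the same rule can be evaluated against an infrastructure-as-code plan in CI and against live resources at runtime. The adapter functions and field names are hypothetical, shown only to make the reuse concrete.

```python
# Minimal sketch of "shift everywhere": one policy predicate, written once,
# evaluated both against IaC plans in CI and against live resources at runtime.
# The adapters below are illustrative assumptions, not a specific tool's API.

def violates_ssh_policy(rule):
    """True if a normalized ingress rule opens port 22 to the world."""
    return (
        rule["from_port"] <= 22 <= rule["to_port"]
        and "0.0.0.0/0" in rule["cidr_blocks"]
    )

def from_terraform(tf_rule):
    # Adapter for an ingress block from a Terraform plan.
    return {"from_port": tf_rule["from_port"], "to_port": tf_rule["to_port"],
            "cidr_blocks": tf_rule.get("cidr_blocks", [])}

def from_live_sg(permission):
    # Adapter for a live IpPermissions entry returned by boto3.
    return {"from_port": permission.get("FromPort", 0),
            "to_port": permission.get("ToPort", 65535),
            "cidr_blocks": [r["CidrIp"] for r in permission.get("IpRanges", [])]}
```

In CI, the Terraform adapter feeds the predicate; at runtime, the boto3 adapter feeds the same predicate. A policy change then happens in exactly one place, across every stage of the lifecycle.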