Cloud Workload Protection (CWP) Best Practice – Focus on Impact, Not Volume
How to do CWP right to prepare your organization and protect it from the next widespread vulnerability.
A few years ago, the infamous Log4J vulnerability (Log4Shell) shook the software industry, catching much of the IT security community unprepared. Because Log4J is embedded in a vast number of Java applications and services, the flaw affected the cloud services of enterprises all over the globe.
Since then, some security teams have likely used their close-up experience with the Log4J vulnerability to improve their software supply chain security. But many have not: by some estimates, 30% to 40% of Log4J downloads are, shockingly, still of vulnerable versions.
What is the best way to prepare for the next vulnerability showstopper, and for vulnerabilities in general? The right mindset about workload protection is critical to your organization’s success and readiness for such threats. Coping with vulnerabilities isn’t a matter of burning through and mitigating every finding. Rather, it’s about adopting a sane, manageable approach: correlate vulnerabilities across OS packages, applications, libraries and more to determine their potential business impact, then prioritize and mitigate accordingly.
To improve your resilience to the next widespread vulnerability, it’s important to understand common and optimal workload protection practices. We explore this here and offer a cloud workload protection perspective that we hope can be game-changing for you and your team.
What is cloud workload protection?
A cloud workload is an application, a service, a capability or simply a specified amount of work that consumes cloud-based resources such as compute or memory. Cloud workloads include virtual machines, container images, runtime containers and serverless functions.
Figure - The many different “faces” of a cloud workload
Born in the complex mass of infrastructure and policy that makes up cloud environments, cloud workloads tend to have flaws and weaknesses that can be exploited to expose sensitive data. The cloud’s constantly changing nature means such issues can crop up at any time. A key pillar of any cloud security strategy is therefore protecting workloads from threats that can compromise their integrity and lead to data breaches and security incidents.
Cloud workload protection (CWP) as commonly practiced involves applying a mix of security controls and actions throughout the workload lifecycle. This includes ongoing scanning of workloads to identify vulnerabilities, malware, misconfigurations, suspicious activity, exposed secrets or sensitive data, and other weaknesses. It also includes prioritizing the most critical risks, informing stakeholders, and remediating and responding. Compliance is another focus of common CWP practice; as we explore later, though, compliance can be a bit of a “false friend” in the overall workload security effort.
In short, workload protection is the process of preserving workload integrity and notifying stakeholders, such as DevSecOps teams, of risks requiring mitigation. It is also a security and compliance best practice.
So what’s wrong with common approaches to cloud workload protection? We see three recurring problems:
- An overemphasis on workload compliance, and assessing workload risk in a vacuum
- Trying to tackle as many vulnerabilities as possible
- Focusing on the tool and overlooking process
Let’s take a closer look.
The wrong way of thinking about CWP
Many CISOs want a “set and forget” type of tool for protecting cloud workloads, but effective cybersecurity doesn’t work like that. No CWP tool is a silver bullet for providing workloads with total protection. In reality:
- You will never have bulletproof workloads
- There will always be workload-related misconfigurations and security posture issues caused by the interplay of tens of thousands of cloud infrastructure components
- Cybersecurity is too complex for cloud workload risk to be managed in isolation
Part of the misconception and misuse around CWP tools stems from using them mainly for compliance. Compliance scanning does matter. You want to continually scan workloads for violations of ISO and other standards that can put sensitive data at risk, and regulatory standards also offer good ideas for security best practices for workloads and cloud security posture in general. However, if you are using your CWP tool only to tick the compliance checkbox, you are likely not using it as effectively as you should. More importantly, being compliant doesn’t mean your cloud environment is secure. Your auditors and compliance teams may be satisfied while workload risk with serious potential impact goes unidentified and unaddressed.
And on the topic of severity and impact: severity by itself does not determine a vulnerability’s true impact. Finding a CVE (Common Vulnerabilities and Exposures entry) rated critical is not enough to warrant sending teams scurrying to remediate. In the multilayered cloud, the apparent severity of a critical CVE may be offset by a mitigating policy or permission set that makes exploitation unlikely.
Figure - Critical CVEs do not give a true picture of what your teams should be remediating
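Here is a minimal Python sketch of that point. The severity bands follow the CVSS v3.x qualitative rating scale, in which Critical covers 9.0 to 10.0. The `Finding` fields `internet_reachable` and `compensating_control` are hypothetical context flags, assumed to come from network and permission analysis elsewhere. Two findings for the same critical CVE end up with different urgency.

```python
from dataclasses import dataclass


def cvss_severity(score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS base scores range from 0.0 to 10.0")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


@dataclass
class Finding:
    cve_id: str
    cvss: float
    internet_reachable: bool    # hypothetical flag from network exposure analysis
    compensating_control: bool  # hypothetical flag: a policy or permission set blocks the exploit path


def needs_urgent_fix(f: Finding) -> bool:
    """Critical severity alone is not enough; require a plausible exploit path too."""
    return (
        cvss_severity(f.cvss) == "Critical"
        and f.internet_reachable
        and not f.compensating_control
    )


# The same critical CVE (Log4Shell, CVSS 10.0) on two differently configured workloads
exposed = Finding("CVE-2021-44228", 10.0, internet_reachable=True, compensating_control=False)
shielded = Finding("CVE-2021-44228", 10.0, internet_reachable=False, compensating_control=True)
print(needs_urgent_fix(exposed), needs_urgent_fix(shielded))  # True False
```

A real platform would weigh many more signals than two booleans. The point is only that the decision depends on context, not on the score alone.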
The key to effective CWP: Best effort prevention
A CWP tool can potentially reveal tens of thousands of vulnerability findings. You need to know which to address first, and how. Correct prioritization lets your organization apply its CWP efforts and investment to mitigating the vulnerabilities that can cause the greatest harm. The key lies in assessing vulnerabilities from a risk perspective: prioritizing vulnerabilities for remediation based on their correlated true risk.
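When a scan returns tens of thousands of findings, picking the top of the list does not require sorting all of them. This minimal sketch assumes each finding already carries a correlated, context-adjusted `risk_score`, a hypothetical field computed upstream. It uses Python's `heapq.nlargest` to fill a remediation queue sized to the team's capacity.

```python
import heapq
from typing import Iterable, List, NamedTuple


class ScoredFinding(NamedTuple):
    finding_id: str
    risk_score: float  # assumed: correlated, context-adjusted risk computed upstream


def remediation_queue(findings: Iterable[ScoredFinding], capacity: int) -> List[ScoredFinding]:
    """Return the `capacity` highest-risk findings, highest first.

    heapq.nlargest keeps only `capacity` items in its heap while scanning,
    so it scales to very large finding lists.
    """
    return heapq.nlargest(capacity, findings, key=lambda f: f.risk_score)


findings = [
    ScoredFinding("F-1", 2.1),
    ScoredFinding("F-2", 9.4),
    ScoredFinding("F-3", 6.0),
    ScoredFinding("F-4", 8.7),
]
print([f.finding_id for f in remediation_queue(findings, capacity=2)])  # ['F-2', 'F-4']
```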
A better way to think of a CWP tool is not as a source of unequivocal workload protection but as one security component in an ecosystem. The CWP tool contributes to this ecosystem and receives correlated intelligence from the other security components. That intelligence covers cloud and Kubernetes configurations, identities and their access to resources, and even infrastructure as code. An integrated approach to cloud security ranks vulnerabilities not just by volume and severity but by whether they actually need remediating.
How do you prioritize a vulnerability for remediation?
Every vulnerability finding is different, so every vulnerability has its own potential impact. Determining its impact requires assessing a multitude of factors, including the affected resource’s function and risk factors.
Essential to determining vulnerability impact is having and maintaining a full, current inventory of all workloads (applications, services and more), assets (data and other resources), identities and infrastructure in your cloud environment. The inventory needs to be as comprehensive as possible, fueled by continuous discovery of assets, entitlements and configurations. It should include a software bill of materials (SBOM). You will want to enrich the inventory by continually categorizing its components by business function, sensitivity and other relevant labels. This added intelligence plays a huge role in providing the context and correlation needed for accurate prioritization.
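Here is a minimal sketch of such an inventory. It assumes each workload's SBOM has already been flattened into a package-to-version mapping and labeled by business function and sensitivity. The `Workload` and `Inventory` classes and the sample workloads are hypothetical. The query answers the question every team asked during Log4J: which of our workloads ship `log4j-core`, and at what version?

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Workload:
    name: str
    kind: str                                              # e.g. "container_image", "vm", "serverless"
    labels: Dict[str, str] = field(default_factory=dict)   # business function, sensitivity, owner
    sbom: Dict[str, str] = field(default_factory=dict)     # package name -> version, from the SBOM


class Inventory:
    def __init__(self) -> None:
        self._workloads: Dict[str, Workload] = {}

    def upsert(self, workload: Workload) -> None:
        """Add or refresh a workload, as continuous discovery would."""
        self._workloads[workload.name] = workload

    def containing(self, package: str) -> List[Tuple[Workload, str]]:
        """Return (workload, version) pairs for every workload whose SBOM lists `package`."""
        return [(w, w.sbom[package]) for w in self._workloads.values() if package in w.sbom]


inv = Inventory()
inv.upsert(Workload("checkout-api", "container_image",
                    {"function": "payments", "sensitivity": "high"},
                    {"log4j-core": "2.14.1", "jackson-databind": "2.13.0"}))
inv.upsert(Workload("docs-site", "vm",
                    {"function": "marketing", "sensitivity": "low"},
                    {"nginx": "1.24.0"}))

for w, version in inv.containing("log4j-core"):
    print(w.name, version, w.labels["sensitivity"])  # checkout-api 2.14.1 high
```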
This rich inventory enables you to correlate the detected vulnerabilities across OS packages, applications and libraries. To improve the accuracy of your impact analysis, be sure to factor in additional workload characteristics, including:
- Network exposure - Not just whether an asset is publicly reachable from the internet, but any network configuration that widens its exposure, such as overly permissive firewall or security group rules
- IAM and permission levels - A relationship map of entitlements between resources and identities, to flush out risk related to escalated or excessive permissions
- Security posture of related resources, such as identities - Resources with poor security hygiene can inflate the risk of a workload vulnerability that involves them
- Exposure of dependencies - Components that are not within your control but will be affected by the vulnerability
- Third-party exposure - Is your data or network at risk if a third party is exposed to the vulnerability? As discussed in the section below, ensure that communication channels are in place for the third party to inform you of an incident.
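One way to fold these characteristics into a single number is a weighted adjustment of the base severity. This sketch is illustrative only. The factor names mirror the list above, but two things are assumptions rather than a standard formula: the weights in `FACTOR_WEIGHTS`, and the rule that a finding with no aggravating factors keeps half its base severity.

```python
from dataclasses import dataclass

# Hypothetical weights (summing to 1.0) for the contextual factors listed above.
FACTOR_WEIGHTS = {
    "network_exposed": 0.30,
    "excessive_permissions": 0.25,
    "poor_posture_neighbors": 0.15,
    "exposed_dependencies": 0.15,
    "third_party_exposure": 0.15,
}


@dataclass
class WorkloadContext:
    network_exposed: bool = False
    excessive_permissions: bool = False
    poor_posture_neighbors: bool = False
    exposed_dependencies: bool = False
    third_party_exposure: bool = False


def contextual_risk(base_severity: float, ctx: WorkloadContext) -> float:
    """Scale a 0-10 base severity by the weighted share of risk factors present.

    With no factors present the score is halved; with all present it is unchanged.
    """
    present = sum(weight for name, weight in FACTOR_WEIGHTS.items() if getattr(ctx, name))
    return round(base_severity * (0.5 + 0.5 * present), 2)


isolated = WorkloadContext()
exposed = WorkloadContext(network_exposed=True, excessive_permissions=True)
print(contextual_risk(9.8, isolated), contextual_risk(9.8, exposed))
```

The same critical finding lands at very different points on the scale depending on its surroundings. That difference is what lets teams work the list from the top.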
Understanding the full business impact of a workload vulnerability also requires taking into account the blast radius and business implications should the vulnerability be exploited. Consider, too, the likelihood that the particular vulnerability will be exploited, and the impact if it is.
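Blast radius can be approximated as graph reachability. Starting from the vulnerable workload, follow network paths, IAM permissions and trust relationships to everything an attacker could pivot to. This minimal breadth-first sketch then combines that reach with an estimated likelihood of exploitation. The access graph, the resource names and the `asset_value` scores are all hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, Set


def blast_radius(graph: Dict[str, Iterable[str]], start: str) -> Set[str]:
    """Return every resource reachable from `start` via access edges (breadth-first)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    seen.discard(start)  # the compromised workload itself is not part of its radius
    return seen


def expected_impact(likelihood: float, graph: Dict[str, Iterable[str]],
                    start: str, asset_value: Dict[str, float]) -> float:
    """Likelihood of exploitation times the total value of resources within reach."""
    return likelihood * sum(asset_value.get(r, 0.0) for r in blast_radius(graph, start))


# Hypothetical access graph: edges point from a resource to what it can reach.
graph = {
    "web-pod": ["app-role"],
    "app-role": ["orders-db", "logs-bucket"],
    "batch-vm": ["archive-bucket"],
}
values = {"orders-db": 100.0, "logs-bucket": 10.0, "archive-bucket": 50.0}

print(sorted(blast_radius(graph, "web-pod")))         # ['app-role', 'logs-bucket', 'orders-db']
print(expected_impact(0.5, graph, "web-pod", values))  # 55.0
```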
Scanning for vulnerabilities and malware
Whether performed by a cloud workload protection tool or manually, vulnerability scanning involves checking for weaknesses or flaws in software or configurations that could be used maliciously to gain access, steal data or cause damage. This includes looking for outdated software versions, unpatched vulnerabilities and weak passwords. It also involves looking for malware such as viruses, Trojans, worms and ransomware, as well as suspicious files, processes and any network activity that may suggest the presence of malware. Integrated cloud security platforms then prioritize the findings and put them in context.
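Checking for outdated software versions often comes down to comparing SBOM entries against advisory data. This is a simplified sketch for `log4j-core`. The fixed versions hard-coded below reflect Apache's Log4j security guidance as we understand it: 2.17.1 for Java 8 and later, with the 2.12.4 and 2.3.2 backports for Java 7 and Java 6. Verify them against the current advisory before relying on them. The parser assumes plain numeric versions and routes anything else, such as pre-release tags, to manual review.

```python
from typing import Optional, Tuple


def parse_version(text: str) -> Optional[Tuple[int, ...]]:
    """Parse '2.14.1' into (2, 14, 1); return None for non-numeric versions."""
    try:
        return tuple(int(part) for part in text.split("."))
    except ValueError:
        return None


def log4j_core_status(version: str) -> str:
    """Classify a log4j-core version as 'ok', 'upgrade' or 'review'."""
    v = parse_version(version)
    if v is None:
        return "review"  # e.g. '2.0-beta9': not handled by this simple parser
    if v >= (2, 17, 1):
        return "ok"
    if v[:2] == (2, 12) and v >= (2, 12, 4):
        return "ok"      # Java 7 backport line
    if v[:2] == (2, 3) and v >= (2, 3, 2):
        return "ok"      # Java 6 backport line
    return "upgrade"


sbom = {"log4j-core": "2.14.1", "jackson-databind": "2.13.0"}
if "log4j-core" in sbom:
    print(log4j_core_status(sbom["log4j-core"]))  # upgrade
```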
Your playbook for taking mitigating action
Workload vulnerabilities have been detected and prioritized as having potential for critical business impact. What’s next? Taking mitigating action on detected risk is where the cloud security trifecta of technology, best practice process and people comes together.
Assuming you’re using a tool or platform that prioritizes risk accurately, you can treat its notifications as a reliable call to action and follow-up rather than noise. You’ll want to mitigate as soon as possible, determine whether your organization has already been victimized by a successful exploit of the vulnerability, and manage the event if so.
Remediation and checking for exploitation
Effectively managing workload risk requires remediating high-risk vulnerabilities. It also requires being prepared to determine whether a detected vulnerability has been exploited. Your CWP solution should feed critical vulnerability findings into your organizational workflows, such as Jira, ServiceNow and Slack.
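As an illustration of that integration, this minimal sketch turns a prioritized finding into a message for a Slack incoming webhook, which accepts a JSON body with a `text` field. The finding's field names are hypothetical, and the webhook URL must be your own. Jira and ServiceNow would use their own APIs instead.

```python
import json
import urllib.request
from typing import Dict


def slack_payload(finding: Dict[str, str]) -> Dict[str, str]:
    """Build the JSON body for a Slack incoming webhook from a prioritized finding."""
    text = (
        f"[{finding['priority'].upper()}] {finding['cve_id']} on {finding['workload']} "
        f"(owner: {finding['owner']}). Remediate and check for signs of exploitation."
    )
    return {"text": text}


def post_to_slack(webhook_url: str, payload: Dict[str, str]) -> int:
    """POST the payload to the webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


payload = slack_payload({
    "priority": "critical",
    "cve_id": "CVE-2021-44228",
    "workload": "checkout-api",
    "owner": "payments-team",
})
print(payload["text"])
# post_to_slack(webhook_url, payload)  # webhook_url: your own Slack incoming webhook
```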
Teams on the receiving end will be able to take investigative and follow-up actions that include:
Step 1. Querying logs. Logs help determine whether unusual activity points to an incident having occurred. A CWP solution typically stores cloud provider logs (such as AWS CloudTrail) going back at least six months. Teams can run predefined and ad hoc queries against these logs to research and better understand the vulnerability. They can also conduct forensic analysis, including of network misconfigurations, and review security alerts.
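Here is a minimal sketch of such a query over a CloudTrail log file. CloudTrail stores events in a top-level `Records` array with fields such as `eventTime` and `eventName`. The watchlist of sensitive API calls, the sample records and the cutoff date are hypothetical. In practice you would run comparable queries in your CWP platform or log analytics tooling, starting from the date the vulnerability was disclosed.

```python
import json
from datetime import datetime, timezone
from typing import Dict, List

# Hypothetical watchlist of API calls worth reviewing after a disclosure.
SENSITIVE_EVENTS = {"GetSecretValue", "CreateAccessKey", "PutBucketPolicy"}


def parse_time(value: str) -> datetime:
    """Parse CloudTrail timestamps such as '2021-12-10T14:03:00Z'."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def sensitive_events_since(log_json: str, since: datetime) -> List[Dict]:
    """Return watchlisted events at or after `since`, oldest first."""
    records = json.loads(log_json).get("Records", [])
    hits = [
        r for r in records
        if r.get("eventName") in SENSITIVE_EVENTS and parse_time(r["eventTime"]) >= since
    ]
    return sorted(hits, key=lambda r: parse_time(r["eventTime"]))


sample_log = json.dumps({"Records": [
    {"eventTime": "2021-12-08T10:00:00Z", "eventName": "GetSecretValue"},
    {"eventTime": "2021-12-11T09:30:00Z", "eventName": "CreateAccessKey",
     "sourceIPAddress": "203.0.113.7"},
    {"eventTime": "2021-12-10T16:45:00Z", "eventName": "DescribeInstances"},
    {"eventTime": "2021-12-10T08:00:00Z", "eventName": "GetSecretValue"},
]})
since = datetime(2021, 12, 9, tzinfo=timezone.utc)
for event in sensitive_events_since(sample_log, since):
    print(event["eventTime"], event["eventName"])
```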
Step 2. Responding to the scanning results. Stakeholders use vulnerability management tools and your organizational processes to respond to findings. Today’s tools automate scanning, in-context prioritization of findings, remediation and even reporting, simplifying the effort. Make sure teams have a clear idea of what needs to be checked, if an incident has occurred, to determine any fallout from the exposure.
Step 3. Patching and remediation. Teams need to be able to remediate the vulnerability while applying limited resources to the most critical areas. You’ll want to have a checklist and procedures in place for carrying out patching on a regular basis and, for serious detected vulnerabilities, on demand. The process should clearly define:
- Who signs off on the patching once completed. Bear in mind that patching can affect business function, so signoff should involve the relevant business stakeholders.
- Testing procedures, defined separately for each workload, for confirming before and after patching whether the patch has resolved the vulnerability
- Time goals for patching, ideally defined by potential impact. Set goals both for regular patch cycles and for on-demand patching, including how soon after vulnerability detection the patch must be applied
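Impact-based time goals can be captured as a simple lookup from risk tier to deadline. In this minimal sketch, the tiers and the durations in `PATCH_SLA` are hypothetical placeholders for whatever your own policy defines.

```python
from datetime import datetime, timedelta

# Hypothetical patching time goals per risk tier; replace with your own policy.
PATCH_SLA = {
    "critical": timedelta(days=2),
    "high": timedelta(days=7),
    "medium": timedelta(days=30),
    "low": timedelta(days=90),
}


def patch_due(detected_at: datetime, tier: str) -> datetime:
    """Deadline for patching a finding detected at `detected_at`."""
    return detected_at + PATCH_SLA[tier]


def is_overdue(detected_at: datetime, tier: str, now: datetime) -> bool:
    """True once the deadline for this tier has passed."""
    return now > patch_due(detected_at, tier)


detected = datetime(2024, 1, 1, 9, 0)
print(patch_due(detected, "critical"))                     # 2024-01-03 09:00:00
print(is_overdue(detected, "high", datetime(2024, 1, 9)))  # True
```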
Procedures for incident response
Preparedness also involves being ready for the worst-case scenario: knowing how to respond when you’ve been breached or a security incident has occurred. First, let’s clarify the difference between a breach and an incident. A “breach” is a situation in which a threat actor has penetrated your cloud infrastructure; an “incident” is when data has actually been compromised or stolen. To use a medical analogy, a breach is getting infected with a virus; an incident is when the virus has made you ill.
You mitigate the potential impact of a vulnerability on two fronts: by effectively resolving the vulnerability, as described above, and by acting appropriately and swiftly at the organizational level.
For swift incident response to a workload vulnerability, you will want to:
- Determine which roles should be involved in a large-scale vulnerability event, and the key responsibilities of each. Communicate these roles and responsibilities ahead of time to all relevant stakeholders. Carry out the necessary training to ensure your stakeholders know what actions they need to take and are prepared to take them at short notice.
- Define a communications plan for an incident. Determine the chain of command for internal and external communication about the incident, and which escalation level calls for which kind of communication. Also decide when to issue a press release, and what elements and degree of detail it should include. The plan needs to cover how to communicate effectively with customers. Some of them may rely on the affected components in offerings to their own downstream customers and therefore have their own communication challenges and needs.
- Consider instituting a formal, recurring incident response drill program that walks all relevant individuals through the defined procedures.
Backup and disaster recovery
Implementing backup and disaster recovery processes is a crucial component of any effective cloud workload protection strategy. If an incident is too severe to continue work as usual, you want to be able to restore and make cloud workloads available as soon as possible. By implementing data backups and rollbacks, you can get back to a clean environment quickly.
Backups are an effective means of restoring a previous, unaffected version of the workload. They should be stored off site so that an incident cannot affect them, and they can be scheduled to run automatically. Rollbacks involve maintaining multiple versions of an application or system. If an incident occurs, you can easily revert to a previous version, restoring the application to a known good state and avoiding further impact from the incident.
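Choosing a rollback target comes down to finding the newest version that predates the compromise and is known to be clean. This minimal sketch assumes a hypothetical release history in which each release records its deployment time and whether it passed scanning and integrity checks.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Release:
    version: str
    deployed_at: datetime
    verified_clean: bool  # hypothetical: passed vulnerability scanning and integrity checks


def rollback_target(releases: List[Release], compromised_at: datetime) -> Optional[Release]:
    """Newest verified-clean release deployed before the compromise, or None."""
    candidates = [r for r in releases if r.deployed_at < compromised_at and r.verified_clean]
    if not candidates:
        return None  # no safe version on hand: restore from off-site backups instead
    return max(candidates, key=lambda r: r.deployed_at)


history = [
    Release("1.4", datetime(2024, 1, 1), verified_clean=True),
    Release("1.5", datetime(2024, 1, 10), verified_clean=False),
    Release("1.6", datetime(2024, 1, 20), verified_clean=True),
]
target = rollback_target(history, compromised_at=datetime(2024, 1, 15))
print(target.version if target else "restore from backup")  # 1.4
```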
If you are managing your infrastructure as code, leverage IaC as part of your disaster recovery strategy. Use IaC to apply necessary patches and changes to your infrastructure and to test your disaster recovery processes, helping improve resilience and reduce downtime during incidents.
Conclusion
The dynamic, distributed and layered nature of the cloud makes workloads forever prone to misconfigurations and security posture issues. Cloud workload protection tools are not effective when compartmentalized, for example when used for compliance only. Security practice falls short, and resources are wasted, when teams are expected to tackle vulnerabilities whose impact has not been vetted. And if the worst happens and you are breached or a vulnerability is exploited, you want to be ready.
The best approach to protecting your workloads is to manage and mitigate vulnerabilities from a risk perspective: fix what matters most. The only way to accurately prioritize the potential impact of a workload vulnerability is with a platform that integrates workload protection among other cloud security pillars. When it comes to risks that become realized – data breaches and actual security incidents – cloud security best practice calls for having clearly defined processes and roles in place to respond swiftly.
An event on the scale of Log4J is nearly impossible to prevent, but CWP done right will put you ahead of the curve.