How to Improve Your Incident Response in the Cloud
A look at the security best practices and mindset to adopt to better detect and recover from malicious activity in your cloud infrastructure.
The massive migration of business applications to the cloud over the last few years has caused a rapid rise in the number of cyberattacks — and in the technologies designed to prevent them. But implementing preventive security controls is not enough to eliminate the risk of compromised data; a comprehensive cloud incident response strategy that uses digital forensics is essential for a complete security plan. In the cloud, governance, shared responsibility and visibility are key in implementing an incident response (IR) solution.
What should your teams be aware of when developing incident response strategies in the cloud?
What’s unique about cybersecurity in the cloud
Cloud architecture differs from on-premises infrastructure in that it is distributed rather than local. Organizations running workloads in the public cloud may have them running in various places around the world. This distributed state dictates how cloud infrastructure and compliance are managed. Because the cloud is a dynamic environment with multiple stakeholders, managing it no longer falls to the IT team alone; it also falls to DevOps, DevSecOps, developers and other stakeholders responsible for environments in the cloud.
This complex and distributed nature creates security vulnerabilities resulting from misconfigurations of cloud infrastructure and identities. By nature, the cloud involves a great number of digital identities that organizations need to manage, which often leads to permissions management difficulties and excessive permissions being granted.
Such vulnerabilities can be exploited by attackers and lead to data breaches. IBM’s Cost of a Data Breach Report 2022 found that credentials — stolen or compromised — were the leading attack vector in the past two years. In their 2022 Data Breach Investigations Report (DBIR), Verizon found that nearly 70% of malware breaches involve ransomware; in recent ransomware research, Tenable Cloud Security concluded that ransomware is often driven by misconfigured identities, publicly exposed machines, risky third-party identities and risky access keys. An IDSA report revealed that 84% of respondents experienced an identity-related attack in the past year; organizations are also reporting an increase in attacks on their identity providers.
The new way of managing infrastructure in the cloud and the new types of cloud vulnerabilities require organizations to rethink and refresh their security practices. Security methods and processes designed for on-premises servers and operating systems need to be adapted to fit cloud infrastructure and its vulnerabilities. The same adaptation is needed for incident response to be effective in the cloud.
Incident response: On-premises vs. in the cloud
Fast identification and recovery are critical for sensitive data protection. Attackers who penetrate the environment often lie low and wait for an advantageous time to attack. Early identification could significantly reduce the negative impact of a breach, and even prevent breaches altogether.
Yet today’s numbers aren’t promising. According to IBM’s 2022 report, the average time to identify and contain a data breach is 277 (!) days. Identification takes 207 days, and it typically takes another 70 days for the organization to contain it.
What’s the best way for security teams familiar with on-premises incident response to approach incident response in the cloud? The first step is to understand the main differences between the two types of infrastructure when running incident response operations.
Challenges to incident response in the cloud
When performing incident response on-premises, there are many security challenges to address. However, since the company owns the infrastructure, IR teams can extract and analyze any piece of data they need. The data is accessible to them and provides the forensic evidence about the attack incident. In the cloud, getting to the data related to an incident is very different. Here are the main challenges and differences:
1. Information gaps and blind spots
Services in the cloud are consumed through API calls rather than running on infrastructure the company owns. An attack means these APIs are manipulated to gain access to sensitive data. The ability to investigate these attacks depends on the public cloud vendor, which holds all the relevant data.
Cloud providers attempt to assist by providing useful tools for incident response like AWS CloudTrail. AWS CloudTrail records all authenticated API calls and includes information about who performed the call, the source IP address, the action performed and the resources impacted. However, these activity logs do not necessarily contain all the information needed to accurately detect anomalous activity. This leaves security teams lacking the tools they need for incident response.
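To make the who, where, what and which-resources view concrete, here is a minimal Python sketch that pulls those fields out of a single activity record. The sample record is hand-written and hypothetical; it only follows the general shape of a CloudTrail event (`userIdentity`, `sourceIPAddress`, `eventSource`, `eventName`, `resources`), and the `summarize_event` helper is an illustrative name, not part of any AWS SDK.

```python
import json

# Hypothetical, hand-written record that follows the general shape of a
# CloudTrail event. All values are made up for illustration.
SAMPLE_RECORD = json.dumps({
    "eventTime": "2022-08-01T12:34:56Z",
    "eventSource": "s3.amazonaws.com",
    "eventName": "GetObject",
    "awsRegion": "us-east-1",
    "sourceIPAddress": "203.0.113.10",
    "userIdentity": {
        "type": "IAMUser",
        "arn": "arn:aws:iam::123456789012:user/example-user",
    },
    "resources": [
        {"ARN": "arn:aws:s3:::example-bucket/report.csv"},
    ],
})


def summarize_event(raw: str) -> dict:
    """Extract who made the call, from where, what it did and to which resources."""
    event = json.loads(raw)
    identity = event.get("userIdentity", {})
    return {
        "who": identity.get("arn", identity.get("type", "unknown")),
        "source_ip": event.get("sourceIPAddress", "unknown"),
        "action": f'{event.get("eventSource", "?")}:{event.get("eventName", "?")}',
        # Not every record lists resources, so default to an empty list.
        "resources": [r.get("ARN", "?") for r in event.get("resources", [])],
        "time": event.get("eventTime"),
    }


summary = summarize_event(SAMPLE_RECORD)
print(summary)
```

A summary like this answers the basic questions about one call, but it says nothing about whether the call was normal for that identity, which is the gap described above.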
2. Monitoring multiple inconsistent components
The logical access nature of the cloud also means that incident response teams are investigating multiple components, some of them ephemeral, like Kubernetes pods. The number of components and the fact that they are ever-changing make it much more challenging to monitor and track any changes attackers might have made or any access they might have gained.
3. Partial visibility into the architecture
IR teams often have zero or near-zero visibility into the cloud’s architecture or the cloud’s identities and their permissions. These blind spots and the lack of granular and contextual visibility make it impossible to identify relationships, data flows and exposed resources. Imagine the police investigating a burglary without being able to see the internal rooms of the house and the location of the windows, doors and fire escape. That’s the reality that IR teams face in the cloud.
4. New attack vectors: Credentials and misconfigurations
Manually identifying and detecting cloud-related attacks, especially those that are identity-based, is very difficult. With thousands of digital identities and credentials in cloud environments, plus accidental misconfigurations, it is all too easy to overlook vulnerabilities that attackers can exploit to access or modify infrastructure. Therefore, IR needs to start with gaining visibility into what the attacker has access to and which actions they took.
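As a rough illustration of the "which actions they took" part, the sketch below filters already-parsed activity records down to one suspect access key and orders them in time. The records, key IDs and the `timeline_for_key` helper are hypothetical; the only assumption about the log format is that each record carries an `eventTime`, an `eventName` and a `userIdentity` with an `accessKeyId`.

```python
from datetime import datetime

# Hypothetical activity records, already parsed from logs. Key IDs are made up.
EVENTS = [
    {"eventTime": "2022-08-01T12:40:00Z", "eventName": "ListBuckets",
     "userIdentity": {"accessKeyId": "AKIAEXAMPLEKEY000001"}},
    {"eventTime": "2022-08-01T12:05:00Z", "eventName": "GetCallerIdentity",
     "userIdentity": {"accessKeyId": "AKIAEXAMPLEKEY000001"}},
    {"eventTime": "2022-08-01T12:20:00Z", "eventName": "DescribeInstances",
     "userIdentity": {"accessKeyId": "AKIAEXAMPLEKEY000002"}},
    {"eventTime": "2022-08-01T12:55:00Z", "eventName": "RunInstances",
     "userIdentity": {"accessKeyId": "AKIAEXAMPLEKEY000001"}},
]


def parse_time(value: str) -> datetime:
    """Parse the UTC timestamp format used in the sample records."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def timeline_for_key(events, access_key_id):
    """Return the events made with one access key, oldest first."""
    matching = [
        e for e in events
        if e.get("userIdentity", {}).get("accessKeyId") == access_key_id
    ]
    return sorted(matching, key=lambda e: parse_time(e["eventTime"]))


timeline = timeline_for_key(EVENTS, "AKIAEXAMPLEKEY000001")
actions = [e["eventName"] for e in timeline]
for e in timeline:
    print(e["eventTime"], e["eventName"])
```

In a real investigation the same filter would run over far more records and several identities, which is why doing it by hand does not scale.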
Figure: Excessive cloud permissions are a security weak spot that attackers seek to exploit
5. Skill set shortage
In addition, a skill set shortage exists among security and DevOps teams for performing incident response in the cloud. The increasingly complex requirements of cloud know-how, including intricate details about each cloud provider, require specific expertise that is not always common knowledge. Even cloud-native companies may not have the most up-to-date cloud security practices in place.
6. Who owns cloud security?
The skill set challenge is compounded by the involvement of multiple security stakeholders. With DevOps, DevSecOps, SecOps, SOC and many others involved, it is often difficult to determine who owns what, and organizational structures, and even company politics, may get in the way of cloud security hygiene.
It’s therefore clear that on-premises IR practices cannot be extrapolated to the cloud. Instead, organizations and security departments need to change their mindset about what IR looks like when performed in the cloud. Developing new skills and practices among your teams is an opportunity to take control of the cloud environment and reduce the time it takes to recover from a cloud incident. Instead of months on end, new skills could reduce timelines to days, or even hours or minutes.
Spotlight: Crypto mining breaches
Let’s look at an example of how attackers exploit cloud misconfigurations and how difficult they are to detect. One focus area of cybercriminals is crypto mining, the process that verifies a new transaction in the blockchain. A misconfigured server can be exploited and modified to run a crypto miner on it. While the attacked entity is charged for the expensive compute, the attacker financially benefits from the cryptocurrency.
TeamTNT is one of the most prominent malicious players in this space. It is an attacker group that scans IP addresses to look for misconfigurations and uses its malware to harvest user access keys. One of TeamTNT’s largest campaigns was to scan for misconfigured and open Redis databases. When it found them, it installed a crypto miner on those servers.
A certain company became suspicious that it had been breached. During an IR investigation, the organization discovered that its DevOps team had accidentally created a vulnerable Amazon Machine Image (AMI) by hardening it on a public IP. Since this image was the template used every time they spun up a new machine, they had accidentally provided scaling services for the attacker.
Why wasn’t this breach identified sooner? Traditional IR methods initially failed the team, as the machine logs gave no indication of this misconfiguration. Only a more in-depth investigation was able to identify and contain the issue.
Getting started with IR in the cloud
Now that we’ve established the very different requirements for performing incident response in the cloud, let’s see what IR, security and DevSecOps professionals can do to ensure comprehensive and effective IR.
Step 1: Start with a cybersecurity strategy
A business cloud strategy requires a cybersecurity cloud strategy, even before diving into the specifics of cloud-driven incident response. To that end, it’s important that your plan include three elements:
- People - Ensure your security team’s skill set matches your needs. These skills can and should be enhanced through training and complemented with tools that help bridge any existing gaps.
- Processes - Implement well-established practices that help identify and mitigate any vulnerabilities and threats. Automation and shifting security left are key to combining security with business productivity.
- Tools - Leverage platforms that reduce security overhead by taking on tasks that cannot be performed manually due to their scale and complexity or that relieve teams of manual work and free them for more complex tasks.
For example, a cloud security strategy needs to include tools that provide multi-cloud visibility into your cloud architecture and potential attack vectors by building a complete asset inventory and enabling asset management. For each asset, such a tool could identify the potential attack vector, like excessive permissions or misconfigurations, and determine the blast radius.
Then, in real time, the tool would identify anomalous behavior and configuration changes and automatically mitigate any vulnerabilities. Finally, the tool would enable teams to investigate incidents by reviewing snapshots of past changes. This is especially important given the dynamic nature of cloud environments.
Figure: Detecting anomalous behavior and configuration changes in the cloud
Such a plan will help you identify your existing risks and their blast radius, and ensure all your resources are put to work minimizing those risks.
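The idea of reviewing snapshots of past changes can be sketched as a plain diff between two inventory snapshots. This is a minimal Python illustration, not how any particular product works: the snapshot format (a mapping from a resource identifier to its configuration) and the resource names are assumptions made up for the example.

```python
def diff_snapshots(before: dict, after: dict) -> dict:
    """Compare two inventory snapshots keyed by resource identifier."""
    added = {k: after[k] for k in after.keys() - before.keys()}
    removed = {k: before[k] for k in before.keys() - after.keys()}
    changed = {
        k: {"before": before[k], "after": after[k]}
        for k in before.keys() & after.keys()
        if before[k] != after[k]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical snapshots taken at two points in time.
snapshot_monday = {
    "bucket/logs": {"public": False},
    "sg/web": {"open_ports": [443]},
}
snapshot_tuesday = {
    "bucket/logs": {"public": True},          # configuration change
    "sg/web": {"open_ports": [443]},          # unchanged
    "vm/miner-1": {"instance_type": "large"}, # new, unexpected resource
}

changes = diff_snapshots(snapshot_monday, snapshot_tuesday)
print(changes)
```

Keeping snapshots matters because ephemeral resources may no longer exist by the time someone investigates; the diff is only as good as the history it can compare.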
Step 2: Build prevention methods and improve your security posture
Once you have a security action plan in place, the next step in preventing cyberattacks is to enhance your security posture. After all, the best security incidents are the ones that never happen. Since identity is acknowledged as the new security perimeter in the cloud, your security goal should be to achieve least privilege and simplify permissions management. This will be the backbone of your security posture for preventing attacker access to sensitive data.
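One simple way to reason about least privilege is to compare what each identity is allowed to do with what it has actually done. The Python sketch below assumes you already have both sets per identity (for instance, granted permissions from policies and used permissions from activity logs); the identity names, the data and the `excessive_permissions` helper are hypothetical.

```python
# Hypothetical inputs: permissions granted to each identity, and the
# permissions each identity was actually observed using.
GRANTED = {
    "ci-deployer": {"s3:GetObject", "s3:PutObject", "iam:PassRole", "ec2:RunInstances"},
    "report-reader": {"s3:GetObject"},
}
USED = {
    "ci-deployer": {"s3:GetObject", "s3:PutObject"},
    "report-reader": {"s3:GetObject"},
}


def excessive_permissions(granted: dict, used: dict) -> dict:
    """Return, per identity, the granted permissions that were never used."""
    report = {}
    for identity, permissions in granted.items():
        unused = permissions - used.get(identity, set())
        if unused:
            report[identity] = sorted(unused)
    return report


excess = excessive_permissions(GRANTED, USED)
print(excess)
```

Unused permissions are candidates for removal rather than proof of a problem; the observation window has to be long enough to cover infrequent but legitimate activity.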
Today’s cloud security posture management is based on the premise of assuming breach. Therefore, it’s important to limit permissions as much as possible, even for developers. For example, try to eliminate the use of static credentials, or at least keep their use to a minimum. To limit the blast radius, it’s also critical to manage the permissions and configurations of components, especially those that are publicly exposed.
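A check for publicly exposed components can be as small as scanning firewall rules for ranges open to the whole internet. The sketch below is a hedged Python example over hand-written data shaped like the security group descriptions returned by the EC2 API (`GroupId`, `IpPermissions`, `IpRanges`, `Ipv6Ranges`); the group IDs and rules are made up, and `publicly_exposed_ports` is an illustrative helper.

```python
# CIDR ranges that mean "anyone on the internet".
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}

# Hypothetical security groups, shaped like EC2 security group descriptions.
SECURITY_GROUPS = [
    {
        "GroupId": "sg-0example1",
        "IpPermissions": [
            {"IpProtocol": "tcp", "FromPort": 6379, "ToPort": 6379,
             "IpRanges": [{"CidrIp": "0.0.0.0/0"}], "Ipv6Ranges": []},
            {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
             "IpRanges": [{"CidrIp": "10.0.0.0/8"}], "Ipv6Ranges": []},
        ],
    },
    {
        "GroupId": "sg-0example2",
        "IpPermissions": [
            {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
             "IpRanges": [{"CidrIp": "10.0.0.0/16"}], "Ipv6Ranges": []},
        ],
    },
]


def publicly_exposed_ports(groups):
    """List (group, protocol, from_port, to_port) for rules open to the internet."""
    findings = []
    for group in groups:
        for perm in group.get("IpPermissions", []):
            cidrs = [r.get("CidrIp") for r in perm.get("IpRanges", [])]
            cidrs += [r.get("CidrIpv6") for r in perm.get("Ipv6Ranges", [])]
            if any(c in OPEN_CIDRS for c in cidrs):
                findings.append((
                    group["GroupId"],
                    perm.get("IpProtocol"),
                    perm.get("FromPort"),
                    perm.get("ToPort"),
                ))
    return findings


exposed = publicly_exposed_ports(SECURITY_GROUPS)
print(exposed)
```

The sample flags a Redis port (6379) open to the internet, the kind of exposure the TeamTNT campaign described earlier scanned for.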
Finally, it’s important to review and understand the shared responsibility model. Your cloud providers probably don’t control all the security aspects you think they do. In addition, your cloud vendor may be changing configurations due to adding new features or changing default settings. It therefore behooves you to continuously control and automate configuration and permissions monitoring and mitigation. The public cloud vendor is also not responsible for permissions management, making it all the more important, as noted, to properly manage permissions.
Step 3: Set the stage for incident response
The main challenge incident response (IR) teams have in the cloud is understanding data and actions at the cloud level. As we’ve established, log querying and analysis provide information only to a certain extent. Since collecting extra data is expensive, many organizations don’t enable extensive data collection, such as management logs, RDS logs or data events. As a result, when teams try to investigate an incident, they lack relevant data.
Ensure you employ technologies that are able to identify and detect risks, and that your team has the skill set for choosing and operating such technologies. DevOps and DevSecOps can help SOC teams with this.
Figure: Enriched cloud activity logs are an important step in cloud incident response and provide more insight than the basic logs of cloud providers
We also recommend building a forensics data lake. While cloud providers retain data for short periods of time, such as 30, 90 or 100 days, a forensics data lake will enable your organization to be prepared to investigate as well as detect breaches. The data lake will also enable teams to recover more quickly from incidents, since they will have information about the incident and be able to easily execute any predefined procedure developed for dealing with a given type of incident.
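A back-of-the-envelope calculation shows why short retention is a problem. The Python sketch below combines the retention periods mentioned above with the 207-day average identification time cited earlier from IBM's 2022 report. It makes a simplifying assumption that is not from the source: the attacker's first activity happens on day zero and anything older than the retention window is gone when the breach is identified.

```python
def days_of_evidence_lost(days_to_identify: int, retention_days: int) -> int:
    """Days of early attacker activity that have aged out of retention."""
    return max(0, days_to_identify - retention_days)


# 207 days is the average identification time cited earlier in the article.
DAYS_TO_IDENTIFY = 207

lost_by_retention = {
    retention: days_of_evidence_lost(DAYS_TO_IDENTIFY, retention)
    for retention in (30, 90, 100)
}

for retention, lost in lost_by_retention.items():
    print(f"{retention}-day retention: first {lost} days of activity unavailable")
```

Under these assumptions, even the longest retention period listed leaves more than three months of the attacker's earliest activity out of reach, which is the gap a forensics data lake is meant to close.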
Finally, we recommend that you hold IR drills and train stakeholders on their parts should an incident take place. This will help your organization bounce back quickly and prevent fallout. The drill should involve security teams, developers, DevOps, DevSecOps and management. You could also make it more lifelike by adding external stress factors, such as would-be journalists or the board calling to inquire about the incident.
After the staged event, take the lessons learned and turn them into automated engineering practices. Incorporate optimized policies in your IaC CI/CD pipelines to prevent the same risks from recurring.
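As one possible shape for such a pipeline check, the Python sketch below scans an IAM-style policy document for `Allow` statements that use wildcard actions or resources and derives an exit code a CI job could use to fail the build. The policy is a made-up example in the standard JSON policy layout (`Statement`, `Effect`, `Action`, `Resource`); the rule itself and the helper names are assumptions for illustration, and a real pipeline would likely apply a richer rule set.

```python
def _as_list(value):
    """Policy fields may be a single string or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def find_wildcard_statements(policy: dict) -> list:
    """Return (statement_index, reason) for Allow statements using wildcards."""
    violations = []
    for index, stmt in enumerate(_as_list(policy.get("Statement"))):
        if stmt.get("Effect") != "Allow":
            continue
        actions = _as_list(stmt.get("Action"))
        resources = _as_list(stmt.get("Resource"))
        if "*" in actions or any(a.endswith(":*") for a in actions):
            violations.append((index, "wildcard action"))
        if "*" in resources:
            violations.append((index, "wildcard resource"))
    return violations


# Hypothetical policy as it might appear in an IaC template.
POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:GetObject",
         "Resource": "arn:aws:s3:::example-bucket/*"},
        {"Effect": "Allow", "Action": "*", "Resource": "*"},
        {"Effect": "Deny", "Action": "*", "Resource": "*"},
    ],
}

violations = find_wildcard_statements(POLICY)
# A CI wrapper could pass this value to sys.exit() to fail the pipeline.
exit_code = 1 if violations else 0
print(violations, exit_code)
```

Running a check like this before deployment turns a lesson from a drill into a guardrail that applies to every future change.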
What’s next for IR teams operating in cloud environments?
Cloud security is a whole new ball game which requires security and IR teams to adapt their strategies accordingly. Start by mapping your cloud infrastructure and understanding the IR strategy differences derived from the cloud. Then, make sure you have all the processes, tools and skills in place to provide your teams with visibility into your cloud infrastructure. Build new threat-resistant processes, train your teams to practice them and incorporate the right tools, so you have all the forensics capabilities you need for IR in case of an incident. Finally, enhance your cloud security posture and eliminate the need for IR as much as you can by incorporating least privilege policies in your IaC.