When AT&T Incident Response Consultants first engage a client during a ransomware incident, the situation is often chaotic. The client's ability to conduct business has stopped, critical services are offline, and its reputation is being damaged. Usually, this is the first time the client has suffered an outage of such magnitude. Employees may wrongly fear that something they did directly caused the incident and its consequences. This fear can spread through the team, hampering the sharing of knowledge and expertise and leading to inefficient recovery efforts.
As trusted advisors, we at AT&T have a responsibility to help our clients manage the stress of these moments. The situation requires your team to take on multiple complex tasks at once:
- Rebuild/Recover critical applications and services
- Communicate with key stakeholders (external and internal) on the status of the restoration of services
- Conduct a forensic investigation in parallel with rebuild/recovery efforts
- Implement near-term security controls to bring the operating environment to an acceptable level of risk
In this article, we highlight our insights into the primary access vectors seen in ransomware attacks investigated by AT&T. We also provide recommendations on configuring your systems to proactively collect data that helps protect them before an attack and supports forensic investigation after a breach.
One question always asked while rebuilding from a ransomware breach is,
"Are the threat actors still in my network?"
This is usually the moment when a paradigm shift happens with the client's security and IT staff. Until this point, the underlying assumption was that the network and its assets were protected from attacks, or that the level of risk was acceptable.
But breaches have a way of sliding the scale of acceptable risk to a lower level. Usually, it's because the breach is tangible: you see its effects and can measure its impact. The impact could be several hundred thousand dollars or more in ransom, an immediate stop to all revenue-generating business processes, and several long work days restoring services.
The long-term implications are discovering the root cause of the attack and implementing adequate controls to help prevent future attacks. The root cause analysis often shows several additional vulnerabilities besides the one that granted attackers access, leading to the larger revelation that previous security controls were not as effective as initially believed. The paradigm shift is complete.
"I am not as protected as I thought."
This leads back to the question, "are the threat actors still in my network?" The answer is, "it depends." It depends on several factors. How long have the threat actors been in the network? What data sources are available to forensic investigators? Did the attackers install tools and software that were inadvertently copied as part of a backup process? Does the root cause analysis identify the attack vector used to gain access? The answers to these questions are needed to continue bringing systems and applications back online and to operate under a reasonable belief that they are protected. Otherwise, you must accept a higher level of risk and worry about the threat actors' return.
There are two primary access vectors for ransomware attacks: phishing and poor patch management. These vectors have not changed since attackers figured out that encrypting someone else's data can lead to massive profits. However, focusing only on strong controls in these two areas is no guarantee of success. For one, cybersecurity is mostly a reactive function. This is because, in order to program cybersecurity software (e.g., antivirus, EDR), you generally have to have seen the malware operate before. In other words, someone has to have been a victim already.
An example of this reactive process is taking a malware file's hash value and identifying that hash value to anti-virus software as known malware. Before this identification, the malware is unknown, and anti-virus software does not consider it a threat nor prevent it from executing. Heuristic detections, where security software looks for hacker techniques (e.g., PowerShell abuse, modifying system configurations), also require knowing how threat actors have performed attacks in the past. If the malware is new, or the anti-virus software you use has not yet been updated to detect it, you are not protected.
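The hash-matching step described above can be sketched in a few lines of Python. The known-bad set here is a stand-in for a vendor's signature database, not a real threat feed; its single entry is the published SHA-256 of the harmless EICAR anti-virus test file:

```python
import hashlib

# Stand-in for a vendor-curated signature set. The entry below is the
# SHA-256 of the harmless EICAR test file, used here purely as an example.
KNOWN_MALWARE_SHA256 = {
    "275a021bbfb6489e54d471899f7db9d1663fc695ec2fe2a2c4538aabf651fd0f",
}

def is_known_malware(path: str) -> bool:
    """Hash the file in chunks and check it against the known-bad set."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest() in KNOWN_MALWARE_SHA256
```

The key limitation is visible in the last line: a file whose hash is not already in the set is simply not flagged, which is exactly why novel malware slips past signature-based detection.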
Patch management is more critical than ever
If your patch management process cannot keep up with the ever-increasing volume of updates, your internet-facing devices are also at risk. So far, 2020 has been remarkable for the number of critical vulnerabilities on edge devices. Several big-name vendors, including Cisco, F5, and Microsoft, have seen remote code execution vulnerabilities rated 9.8 or higher on the CVSS scale.
The sheer number of critical vulnerabilities across all products requires patch management processes to be stronger than ever. Threat actors are turning newly identified vulnerabilities into weaponized exploits faster than patch management teams can safely install patches. Patch management teams are at a disadvantage due to testing requirements and the scale of the network. Testing is still critical in patch management and cannot be skipped; we recently saw a client's attempt to patch an edge device fail, leaving the device vulnerable and requiring the vendor to provide custom development support to remediate.
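When patches outpace the team's capacity to test and deploy, some form of triage is unavoidable. The sketch below ranks hypothetical vulnerability findings so that critical, internet-facing issues rise to the top of the patch queue; the `Finding` fields and the scoring rule are assumptions for illustration, not a prescribed methodology:

```python
from dataclasses import dataclass

@dataclass
class Finding:
    cve_id: str          # identifier of the vulnerability
    cvss: float          # CVSS v3 base score, 0.0-10.0
    internet_facing: bool  # does the affected device sit on the perimeter?

def patch_priority(findings):
    """Order findings so perimeter exposure trumps raw score,
    and higher CVSS scores come first within each group."""
    return sorted(
        findings,
        key=lambda f: (f.internet_facing, f.cvss),
        reverse=True,
    )
```

Under this rule, a 9.8-rated flaw on an edge device is patched before a 10.0 on an internal host; real programs would fold in exploit availability, asset criticality, and testing risk as well.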
The nightmare scenario is one where a new ransomware binary is introduced onto your systems via an unknown vulnerability on a network edge device. In this situation, you can't see the attackers breaching your perimeter systems, and your anti-virus software does not detect the ransomware binary when it is uploaded. Although a nightmare scenario, this example is not an extreme outlier. This summer, we have seen clients affected in exactly this manner: a previously unknown vulnerability on a perimeter system gave threat actors routable access into a client's network, where they successfully introduced ransomware with no alerts or forewarning. AT&T Cybersecurity consulting can evaluate your patch management process and recommend solutions to help increase its capabilities. We can advise on the most efficient methods to test and implement critical patches and updates and help you get ahead of the attackers.
Phishing, because it works
The second vector that always produces results for the threat actor is phishing. Human beings have a natural truth bias; couple that with the fact that an attacker only needs to fool a victim once, and it is clear why phishing is still a thriving attack vector today. With all the tools and years of experience dealing with phishing, we can only hope to decrease its success rate through workforce education and by identifying previously seen files and techniques. But like most processes in cybersecurity, this is reactive. An automated anti-phishing application scanning email, combined with a strong user education program, offers the strongest counter to phishing attacks. However, when an employee is successfully phished, robust email server log data enables a security team to mitigate the attack quickly before extensive damage is done.
Threat actors want their tools to be viable for as long as possible. It is in their best interest to remove evidence of their behavior and delete the binary tools used to prepare the network environment for encryption. This practice of covering their tracks is part of what is commonly referred to as tradecraft. When performing forensic analysis of an incident, AT&T investigators query several different data sources in the client's network. This analysis includes inspection of firewall logs, operating system logs, and disk-level forensic examination of systems, looking for mistakes in the threat actor's tradecraft. When threat actors erase this evidence, forensic analysts are often forced to speculate on root-cause conclusions.
AT&T has also seen client environments where log retention on critical servers is configured with default settings, which are often insufficient. For example, on Microsoft Windows servers the default security log setting is a maximum of 20 megabytes, and on a busy domain controller that may only provide a few days of security logs. In almost all investigations, we find logs have either been erased by the threat actors or no longer contain relevant data because of a time or size limit.
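The arithmetic behind that limit is simple. The sketch below estimates retention from the log's size cap and an assumed daily event volume; the volume figures in the comment are illustrative, not measurements:

```python
def retention_days(max_log_mb: float, daily_volume_mb: float) -> float:
    """Estimate how many days of events fit in a circular log
    before the oldest entries are overwritten."""
    return max_log_mb / daily_volume_mb

# A busy domain controller can plausibly write tens to hundreds of
# megabytes of security events per day (illustrative figures). At
# 10 MB/day, the 20 MB default holds only about two days of history:
#   retention_days(20, 10) -> 2.0
```

Run in reverse, the same formula tells you what size cap you need: ninety days at 10 MB/day requires roughly a 900 MB log, which is why retention targets should drive the setting rather than the default.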
The strongest control you can put around protecting log files is to aggregate them into one central source. There are several options available, including AT&T's USM Anywhere platform. USM Anywhere combines log aggregation with advanced analytics and our latest threat intelligence in an online dashboard, allowing security staff to perform near real-time analysis, stay advised of newly discovered threats, and apply that intelligence to their overall cybersecurity efforts.
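At its core, aggregation means normalizing events from many sources into a single timeline. The sketch below shows that idea with the Python standard library; it is a toy illustration of the concept, not how USM Anywhere or any other platform is implemented:

```python
import heapq
from datetime import datetime

def merge_logs(*sources):
    """Merge multiple time-sorted log streams into one timeline.

    Each source is an iterable of (timestamp, host, message) tuples,
    already sorted by timestamp -- the shape events take after being
    normalized from firewalls, servers, and endpoints.
    """
    return list(heapq.merge(*sources, key=lambda event: event[0]))

# Example: interleave firewall and domain controller events so an
# analyst sees the deny, the logon, and the allow in true order.
firewall = [
    (datetime(2020, 7, 1, 1, 0), "fw1", "deny tcp 203.0.113.5:443"),
    (datetime(2020, 7, 1, 3, 0), "fw1", "allow tcp 203.0.113.5:443"),
]
domain_controller = [
    (datetime(2020, 7, 1, 2, 0), "dc1", "logon failure for svc-backup"),
]
timeline = merge_logs(firewall, domain_controller)
```

Because the merged copy lives off the originating hosts, an attacker who wipes a server's local event log has not destroyed the aggregated record, which is the protective property the paragraph above describes.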
When threat actors gain access via a successful phishing attack, we often find inadequate email logs during our investigations. This applies both to email servers hosted on-premises and to managed online mail services. For common on-premises email servers, the default log retention policy is thirty (30) days. Raising log retention on an online email service can increase the per-user monthly cost; on-premises applications are restricted only by storage space.
When composing a cybersecurity strategy, implementing controls and tools to help prevent breaches is the highest goal. To build a robust program, strategists should consider nightmare scenarios and incorporate them into the overall cybersecurity program. After a breach, the ability to query your logs from one source shortens the analysis and decision cycle, thereby decreasing downtime. A robust log aggregation solution not only provides valuable insight into the activities a threat actor took on a system but also helps protect against attackers attempting to erase log data. Additionally, preserving logs allows for evidence-based conclusions if a forensic investigation is necessary. Lastly, log aggregation also benefits troubleshooting and traditional IT administration.