Kaseya Supply Chain Breach – Threat Detection and Investigation with the Awake Security Platform

Overview

The world was hit with a one-two punch of cyber threats over the last couple of weeks. Within days of each other, several 0-day proof-of-concept exploits affecting all domain controllers were released (aka: PrintNightmare), and then the Kaseya supply chain compromise affecting over 1,000 companies was announced. If you read our post about the PrintNightmare vulnerabilities, you saw how Awake provided out-of-the-box detection for the 0-day exploits, as well as the custom UI visualizations we released for researching targeted DCERPC services in Windows domains. We also presented a new form of threat detection that is likely to identify future 0-days in uncommon services.

In this blog, we discuss the Kaseya breach, but we’ll also explore in much greater detail the new form of threat detection we’ve been describing internally as “configurable network anomaly threat detection models.” If that sounds like a word-soup of BS marketing terms, don’t worry – you should grok the significance of it by the end of this post.

Before progressing, it’s important we examine two high-level concepts that are not well understood outside professional [career] threat detection development communities. They are the type of artifact (consequential versus direct) and the specificity of artifacts. First, let’s talk about artifact types.

Consequential artifacts were described in the PrintNightmare blog as:

Consequential artifacts are [many times] the behaviors a system exhibits in response to being exploited or compromised. In most cases, attackers have no control over these artifacts being produced. They can certainly attempt to clear these artifacts from logs or hide them from EDR agents, but the attacker has no control over the initial generation of these artifacts.

In this blog we’ll also focus more on direct artifacts, which can be defined as:

Direct artifacts are the characteristics of threat activities that are artifacts of the tool or exploit itself. Many times, attackers have direct control over these artifacts as well, so these can be less reliable with very sophisticated attackers who know how to obfuscate them correctly. However, direct artifacts are a stronger signal of threat activities than consequential artifacts.

To summarize then:

Table 1: Summary differences between consequential and direct artifacts.

The second high-level concept you should understand more formally is the specificity of artifacts.

On one end of the spectrum you have the MITRE ATT&CK Framework, which strives to classify threat activities in an excruciatingly detailed matrix. This hyper-detailed view of threats serves a number of purposes, as evidenced by massive enterprise adoption of this framework. For example, if you have a library of threat detection signatures (generally defined as detection methods that are specific to a certain vulnerability, exploit, malware/tool, etc.), then you need a framework like ATT&CK to help identify that technology’s coverage and gaps.

On the opposite end of the spectrum you have the “lowest common denominator” (LCD) characteristics of threats. The LCD view of threats strives to define the characteristics that are shared by ALL threat activity across very broad categories. For instance, what is the smallest number of characteristics you can define that, in aggregate, should identify ALL C2 activity? If you really stop and contemplate that question (as we did for years when we started Awake), you will begin to discover LCD artifacts.

Lowest common denominator artifacts can be either direct or consequential, although specific artifacts are generally direct artifacts.

Clear as mud?

Let’s put these concepts together to examine the first stage of common operations by REvil, the group behind the Kaseya supply chain compromise. (Note: This is not how Kaseya was compromised; we’ll definitely return to that subject further below. What’s described here is simply the MO REvil most frequently uses to successfully breach enterprises.) The first-stage attacks most commonly executed by the REvil ransomware group work by:

  1. Sending emails to their targets with malicious Word or Excel attachments.
  2. Downloading and executing malware when the attachments are opened.
  3. Harvesting sensitive data from the endpoint (commonly emails) and exfiltrating it to an attacker-controlled server, or
  4. Attempting to elevate privileges and move laterally using tools like Cobalt Strike and Mimikatz.

To keep this blog-length and not book-length, let’s focus on the single point where all the security controls and prevention technologies have failed every time the sequence above is successful: at the point when the malware is actually executed on an endpoint.

Step 1: Malicious Office Document is Opened

Why am I even talking about opening Office documents on an endpoint if this blog is focused on network activity?

Because this is also a blog about consequential artifacts. With a focus on consequential artifacts, you can detect not only when Word, Excel, PowerPoint, etc. have been opened on an endpoint (a weak signal), but also when opening Office is followed by other suspicious indicators.

Figure 1: An Awake threat model for identifying suspicious activity in conjunction with Microsoft Office starting.

The important point here is that if an attacker is going to use malicious Office documents, they have no control over the appearance of these indicators on the network. They will be present no matter what the attacker does to obfuscate their activity. Even with an Office exploit, the attacker cannot stop the initial network telemetry startup sequence exhibited by Office, which occurs before any attacker content inside a document is evaluated. We’ll always know when Office started, so we can detect other suspicious conditions that follow that activity.
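To make the idea concrete, here is a minimal Python sketch of an “Office started, then something suspicious” correlation. It is not Awake’s model: it assumes network telemetry has already been reduced to per-device events with hypothetical labels (office_start and suspicious), and the five-minute window is an arbitrary assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    device: str
    timestamp: float  # seconds since epoch
    kind: str         # hypothetical labels: "office_start" or "suspicious"


def office_then_suspicious(events, window_seconds=300.0):
    """Return devices where a suspicious event follows an Office start within the window."""
    last_office_start = {}  # device -> timestamp of its most recent Office start
    flagged = set()
    for ev in sorted(events, key=lambda e: e.timestamp):
        if ev.kind == "office_start":
            last_office_start[ev.device] = ev.timestamp
        elif ev.kind == "suspicious":
            started = last_office_start.get(ev.device)
            # An Office start alone is a weak signal; only the sequence is reported.
            if started is not None and ev.timestamp - started <= window_seconds:
                flagged.add(ev.device)
    return flagged


events = [
    Event("ws1", 0.0, "office_start"),
    Event("ws1", 120.0, "suspicious"),   # 2 minutes after Office started
    Event("ws2", 0.0, "office_start"),
    Event("ws2", 1000.0, "suspicious"),  # too long after Office started
    Event("ws3", 50.0, "suspicious"),    # no Office start at all
]
print(office_then_suspicious(events))  # {'ws1'}
```

Because the Office startup telemetry precedes any attacker content, the sequence check doesn’t depend on anything the attacker controls; what counts as “suspicious” is left as an input here.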

Step 2: New Executable (or Code) Runs on Endpoint

Why am I even talking about running an EXE on an endpoint if this blog is focused on network activity?

You might be starting to see a pattern here: consequential artifacts are also frequently emitted when an EXE (or another form of code) is executed on an endpoint, especially for the first time.

Figure 2: Awake’s EntityIQ profile for a high-risk device that has been identified executing an executable for the first time followed by C2-like behavior.

Similar to the Office example, when an executable is run for the first time, Windows (or macOS) will connect back to the mother ship, even when a tool like Windows Defender is disabled. It does this to check for compatibility information and, depending on version and configuration, will also report basic telemetry. If Windows Defender is enabled, there are even more consequential artifacts available on the network. Again, all of this happens before the attacker’s code is executed, so this process betrays the attacker’s presence while they have zero ability to subvert it.
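The “first time” part of this signal boils down to a per-device baseline. Below is a rough sketch, assuming each OS compatibility or reputation lookup seen on the network can already be mapped to a device and a hypothetical executable identifier (exe_id); this post doesn’t describe how Awake does that mapping. A learning period builds the baseline so that every executable isn’t “new” on day one.

```python
from collections import defaultdict


class FirstSeenTracker:
    """Remembers which executable identifiers each device has already run."""

    def __init__(self):
        self._seen = defaultdict(set)  # device -> set of exe_id values
        self.learning = True           # while True, observations only build the baseline

    def observe(self, device: str, exe_id: str) -> bool:
        """Record exe_id for device; return True only for a first run after learning ends."""
        is_new = exe_id not in self._seen[device]
        self._seen[device].add(exe_id)
        return is_new and not self.learning


tracker = FirstSeenTracker()
tracker.observe("ws1", "winword.exe")   # baseline period: recorded, not reported
tracker.learning = False
tracker.observe("ws1", "invoice.exe")   # True: first run on this device
tracker.observe("ws1", "invoice.exe")   # False: already seen
```

On its own, a first-time executable is still a weak signal; as in Figure 2, it becomes interesting when C2-like behavior follows it.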

Hopefully this is proving to you the remarkable power of consequential artifact-based detection. Unfortunately, it’s extremely computationally intensive to perform this analysis accurately enough for enterprise-grade threat detection platforms. The platform needs to be designed from the beginning to handle this pipeline of analysis across many threat cases, which is one reason why we’re not familiar with any other platforms that handle this approach to threat detection well.

Step 3: Malware Connects Back for 2nd Stage or C2

And here is where we reach our first direct artifact (red = direct artifacts, green = consequential artifacts). An oversimplified example at this point might be seeing an HTTP request for gate.php, a default URI for C2 controllers that’s so common it’s almost silly. In this case, gate.php would be a direct artifact of the C2 activity, except that the vast majority of real-world malware uses TLS, so in practice we won’t see any URIs. (If you believe TLS has had a negative impact on network traffic analytics, please see this to reorient your point of view. And this.) Because of this, the focus of this entire blog is on encrypted C2/exfil, as was the case in the Kaseya breach. See? I told you we’d be getting back to that subject…

Figure 3: The REvil malware and PCAPs from the Kaseya compromise that have been shared with the threat intelligence community trigger a number of model matches in Awake.

Figure 3 is critical for two primary reasons:

  1. This is the REvil Kaseya supply chain compromise traffic.
  2. The malware used in the Kaseya case triggers some of the broadest “catch-all” lowest common denominator detections for malware in Awake.

It’s reported that REvil uses QakBot most of the time and Cobalt Strike some of the time for initial infection. In the Kaseya breach, most AV vendors are calling the malware Sodinokibi. Really, I don’t care what family of malware they use because I don’t focus on individual families of malware. Keeping up with malware families is exhausting at best, and impossible at worst. (My vote is for impossible.) LCD-based detection strategies allow us to focus our resources (both computational and human researcher resources) most effectively by keeping us focused on the artifacts that many (and in the best of cases, almost all) malware families share with each other – and the artifacts that separate malware from legitimate applications. You’ll see concrete examples of this later in this blog.

Many times, malware triggers the Uncommon Statically Compiled Binary Communicating with Suspect Destination alert in Awake. This model excels at identifying statically compiled malware, like that written in Go and similar cross-platform compiled binaries. (Go malware is now common, having been adopted by both APTs and e-crime groups | ZDNet) The model does this by identifying outlier TLS fingerprints, then closely examining the behaviors of sessions with those outlier TLS characteristics, specifically in the fields used for TLS fingerprinting. (If you think TLS fingerprinting is synonymous with JA3, watch this SANS presentation.) Working this way, the model also catches a lot of malware that has not been statically compiled, simply because malware’s use of standard network libraries frequently differs from common application usage. As a result, non-statically compiled malware will still frequently trigger this alert (as you can see in the Kaseya malware traffic in figure 3).
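For readers who haven’t worked with TLS fingerprints, the sketch below computes a JA3-style fingerprint from ClientHello fields (TLS version, cipher suites, extensions, elliptic curves, and point formats as decimal values, with GREASE values dropped, then MD5-hashed) and flags fingerprints seen on very few devices. It only illustrates the idea of an “outlier TLS fingerprint”; as noted above, Awake’s model looks at more than JA3, and the max_devices threshold is an arbitrary assumption.

```python
import hashlib
from collections import defaultdict


def is_grease(value: int) -> bool:
    # RFC 8701 GREASE values look like 0x0A0A, 0x1A1A, ..., 0xFAFA.
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def ja3(version, ciphers, extensions, curves, point_formats):
    """Return (ja3_string, md5_hex) for the given ClientHello fields."""
    def dash_join(values):
        return "-".join(str(v) for v in values if not is_grease(v))

    ja3_string = ",".join([
        str(version),
        dash_join(ciphers),
        dash_join(extensions),
        dash_join(curves),
        dash_join(point_formats),
    ])
    return ja3_string, hashlib.md5(ja3_string.encode()).hexdigest()


def rare_fingerprints(sessions, max_devices=1):
    """sessions: iterable of (device, fingerprint). Return fingerprints seen on few devices."""
    devices_per_fp = defaultdict(set)
    for device, fp in sessions:
        devices_per_fp[fp].add(device)
    return {fp for fp, devices in devices_per_fp.items() if len(devices) <= max_devices}


ja3_string, ja3_hash = ja3(771, [0x1A1A, 4865, 4866], [0, 23, 65281], [29, 23], [0])
print(ja3_string)  # 771,4865-4866,0-23-65281,29-23,0
```

A rare fingerprint is only the starting point: the model described above then examines how sessions with that fingerprint behave, which is what keeps this from being a simple hash blocklist.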

There’s another alert pictured in figure 3 that is a powerful example of LCD-based detection, but maybe easier to conceptualize than the Statically Compiled Binary alert just described. That is “Non-Browser TLS Communicating to Uncommon and Suspicious Destination.”

Is most traffic on the network TLS these days?

Yes!

Is the vast, vast, vast majority of that TLS just benign web browsing by a user using a web browser?

Yes!

Does malware’s (and most other software’s) TLS traffic have different characteristics than TLS from browsers?

Yes!

One LCD of malware is TLS traffic that stands out in some way (a weak signal) combined with server (destination) characteristics that are also uncommon. However, attackers don’t always use “uncommon domains.” Sometimes they use extremely common domains, like AWS, Azure, GCP, Mega, etc. That’s OK, because there are LCDs related to that activity too, as shown in figures 4 and 5 below. (Later in this blog we’ll examine the code inside these models more closely to see how this actually works.)
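One way to picture “two weak signals combined” is the sketch below. It assumes a set of known browser TLS fingerprints is available (a hypothetical input) and measures destination commonality as the fraction of observed devices that contacted each destination. Neither signal alerts on its own; only their conjunction does. The 5% threshold is an assumption, not an Awake setting.

```python
from collections import defaultdict


def destination_prevalence(connections):
    """connections: iterable of (device, destination, tls_fingerprint).
    Return destination -> fraction of all observed devices that contacted it."""
    devices = set()
    per_destination = defaultdict(set)
    for device, destination, _fingerprint in connections:
        devices.add(device)
        per_destination[destination].add(device)
    total = len(devices)
    return {dest: len(devs) / total for dest, devs in per_destination.items()}


def non_browser_to_uncommon(connections, browser_fingerprints, max_prevalence=0.05):
    """Return (device, destination) pairs where both weak signals fire together."""
    connections = list(connections)
    prevalence = destination_prevalence(connections)
    hits = set()
    for device, destination, fingerprint in connections:
        non_browser = fingerprint not in browser_fingerprints   # weak signal 1
        uncommon = prevalence[destination] <= max_prevalence    # weak signal 2
        if non_browser and uncommon:
            hits.add((device, destination))
    return hits
```

Because commonality is learned from the network itself, this conjunction needs no knowledge of which malware family or C2 server is involved.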

Figures 4 and 5: Detecting C2 to common cloud service providers like CloudFront is impossible for most Network Detection and Response solutions, but the “configurable network anomaly threat detection models” Awake provides customers by default are very effective at catching most malware utilizing cloud storage providers.

As you can see from the detections Awake produces from the Kaseya REvil traffic in figure 3, Awake flags “uncommon” (typically attacker-controlled in some form) C2/exfil domains. In figures 4 and 5 (cloud domains), we see very common cloud storage providers being used for C2/exfil. But what about when no domain is used at all, for instance, when the C2/exfil uses hardcoded IP addresses? As you can see in figure 6, those conditions are also completely detectable using these models.

Figure 6: Recurrence and [un]commonality are two hallmark LCD artifacts of threat activity, especially when combined with other weak signals.
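Recurrence can be approximated with a simple beaconing heuristic: connections from a device to a literal IP address whose inter-arrival gaps are nearly constant (a low coefficient of variation). This is a generic technique, not necessarily how Awake scores recurrence, and the min_connections and max_cv thresholds are assumptions.

```python
import ipaddress
import statistics


def is_ip_literal(host: str) -> bool:
    """True when the destination is a bare IPv4/IPv6 address rather than a domain."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def looks_like_beacon(timestamps, min_connections=6, max_cv=0.1):
    """Flag near-constant check-in intervals (coefficient of variation <= max_cv)."""
    if len(timestamps) < min_connections:
        return False
    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    mean_gap = statistics.mean(gaps)
    if mean_gap <= 0:
        return False
    return statistics.pstdev(gaps) / mean_gap <= max_cv


def direct_ip_beacons(connections_by_pair, **kwargs):
    """connections_by_pair: {(device, destination): [timestamps]}."""
    return {
        pair
        for pair, times in connections_by_pair.items()
        if is_ip_literal(pair[1]) and looks_like_beacon(times, **kwargs)
    }
```

Many C2 frameworks add random jitter to their check-in interval, so a strict max_cv like this one is easy to evade; it works better as one more weak signal alongside [un]commonality than as an alert on its own.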

So, let’s check in here with a pop quiz…

What do figures 3, 4, and 6 have in common?

A. They’re all detections from Kaseya malware and REvil PCAPs.

B. They utilize ML to identify artifacts that are common or uncommon for a given type of device.

C. They utilize weak signals that are the lowest common denominators of threat activity to identify breaches regardless of the type of malware or C2 server used.

D. All of the above.

E. A and C.

If you answered (E), that’s because I haven’t shown you more details behind these models yet. Don’t worry, that’s next. The answer is actually (D), and as you’ll see next, the same techniques work for lateral movement detection in addition to C2/exfil.

Step 4: Exfil and/or Lateral Movement

For this section, let’s take a deeper dive into the models themselves. You’ll notice the two models we’ll examine below use the same custom detection function, except that they operate on two different types of artifacts. (Remember the term “configurable network anomaly threat detection”? We’re now getting to the “configurable network anomaly” part of the term.)

In the diagram above, the green box at the bottom represents using consequential artifacts to detect post-compromise exfil. As we discussed above, exfil can use attacker-controlled domains, no domain at all (direct-to-IP), or very common cloud service providers. Attacker-controlled domains and direct-to-IP are the easiest to detect, while C2/exfil through cloud storage providers is the breach activity most frequently missed by pretty much all network detection and response (NDR) platforms (as well as IDS, IPS, etc.), so we’ll focus here on the most difficult case: cloud C2.

Figure 7: Adversarial Modeling Language (AML) model for C2/exfil activity to cloud storage providers (CloudFront, Mega, etc.).

Let’s break down the important parts of the model shown in figure 7. It’s also important to note that all of Awake’s models are open like this one. Customers can fully examine the “guts” of every threat detection model in Awake, as well as make copies of our models to edit or create their own.

Notable logic and predicates in figure 7:

Device is connecting to a server that is uncommon for other similar devices

The custom-built function shown here (named recipes.hunting.machineLearning.characteristicArtifactMatchingAQL.*) takes as a parameter any network artifact that the ML engines analyze to determine which artifacts are common and which are “suspiciously uncommon.” (The concept of “suspiciously uncommon” is described briefly in the first five minutes of this video.) The artifacts defined here can be anything from SMB paths and interfaces, DCERPC operations, LDAP fields, and HTTP headers to, as we see in this case, values from the TLS headers, among many others.
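The “configurable” part is easier to see in code. Below is a hypothetical Python analogue of a function like characteristicArtifactMatchingAQL: the artifact field is just a parameter, and the function flags values that are rare among devices in the same peer group. The function name, the peer grouping, and the threshold are all assumptions; Awake’s actual AQL recipe and ML engines are not reproduced here.

```python
from collections import defaultdict


def uncommon_artifacts(observations, field, peer_group_of, max_peer_fraction=0.1):
    """Hypothetical sketch: flag (device, value) pairs whose value for `field`
    is held by at most max_peer_fraction of the peer-group devices that report `field`."""
    peers = defaultdict(set)    # group -> devices in that group that report `field`
    holders = defaultdict(set)  # (group, value) -> devices that have that value
    for obs in observations:
        if field not in obs:
            continue
        group = peer_group_of(obs["device"])
        peers[group].add(obs["device"])
        holders[(group, obs[field])].add(obs["device"])

    flagged = set()
    for (group, value), devices in holders.items():
        if len(devices) / len(peers[group]) <= max_peer_fraction:
            flagged.update((device, value) for device in devices)
    return flagged


# The same function, pointed at a different artifact, becomes a different model.
obs = [{"device": f"ws{i}", "tls.ja3": "aaa"} for i in range(10)]
obs += [{"device": "ws3", "tls.ja3": "zzz"}, {"device": "srv1", "tls.ja3": "zzz"}]
by_role = lambda device: "workstation" if device.startswith("ws") else "server"
print(uncommon_artifacts(obs, "tls.ja3", by_role))  # {('ws3', 'zzz')}
```

Note that srv1 is not flagged even though it uses the same rare value: commonality is judged against similar devices, not the whole network.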

Using TLS traffic that’s not from a web browser

While TLS characteristics differ between Chrome, Firefox, Edge, Opera, Safari, etc., browser-based TLS characteristics tend to be extremely different from those of almost all other software. Inside this recipe is the logic that defines the differences between browsers and non-browsers, again 100% viewable and editable within the platform.

Going to any cloud storage provider and uploading any amount of data

The final two predicates shown in figure 7 provide for an extremely relaxed definition of “upload,” while also ensuring the destination server is one of several dozen content storage providers.
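These last two predicates can be sketched as a domain-suffix check plus a deliberately loose upload test. The provider list below is an illustrative subset (the model in figure 7 covers several dozen providers), and bytes_out is assumed to count client-to-server application data, excluding the TLS handshake.

```python
# Illustrative subset only; not the provider list used in figure 7.
CLOUD_STORAGE_SUFFIXES = ("cloudfront.net", "amazonaws.com", "mega.nz", "mega.io")


def is_cloud_storage(host: str) -> bool:
    """Match the provider domain itself or any subdomain of it."""
    host = host.lower().rstrip(".")
    return any(host == s or host.endswith("." + s) for s in CLOUD_STORAGE_SUFFIXES)


def relaxed_upload(bytes_out: int, min_bytes_out: int = 1) -> bool:
    """'Any amount of data': one application-data byte sent is enough."""
    return bytes_out >= min_bytes_out


def cloud_upload_candidate(session: dict) -> bool:
    return is_cloud_storage(session["sni"]) and relaxed_upload(session["bytes_out"])
```

Matching on a dot boundary matters: a naive endswith("cloudfront.net") would also match an attacker-registered name like evilcloudfront.net.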

And this just might be the most important part of this entire blog:

Nothing in the model above is directed towards any specific type or even family of malware. The conditions described in figure 7 apply to many, many forms and families of malware – from traditional compiled families of malware, to “fileless” attack tools. This is the power of consequential artifacts, and here those “consequential artifact filters” are being applied to anomalous characteristics of TLS sessions. The ability to customize the application of filters like this to statistically derived information from the ML engines is completely unique to Awake.

Examples like those above are why Awake has shown 0-day detections for situations like the Kaseya breach, the release of the PrintNightmare PoCs, and many other cases managed by Awake’s Managed Network Detection and Response team.

And the foundation of the model just shown that detects many cases of C2/exfil is the exact same detection foundation for many lateral movement cases as well (and was also shown in the PrintNightmare detection blog).

Figure 8: All these models have the same base function as detailed in the description of figure 7, but they can be customized and applied to a wide variety of use cases. Of course, all the use cases shown in this blog are part of Awake’s standard detection library.

Again, let’s identify the notable logic and predicates in figure 8:

Device is connecting to SMB resources that are not commonly connected to

This lateral movement model is using the same basic function as the C2/Exfil model just examined (recipes.hunting.machineLearning.characteristicArtifactMatchingAQL.*), except now we’re passing it a new argument: device.characteristic.smb.share.path. This time the model is identifying traffic to SMB resources that are not commonly connected to.

Transferring any executable code

Then, in the SMB traffic that was identified as connecting to “uncommon” resources, the model looks for transfers of executable code. This model not only catches the lateral movement activity common in the TTPs employed by REvil and other ransomware groups, but also catches lateral movement by many other tools, including Mimikatz, Cobalt Strike, PsExec, the PrintNightmare PoCs, and others.
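A minimal sketch of the same two-step idea applied to SMB: count how many distinct clients touch each share path, then flag sessions to rarely used shares whose transferred file begins with the PE “MZ” header. The input shape (client, share path, first bytes of the written file) and the max_clients threshold are hypothetical, and a real model would recognize more kinds of executable code than PE files.

```python
from collections import defaultdict

PE_MAGIC = b"MZ"  # Windows PE/DOS executables start with these two bytes


def looks_executable(first_bytes: bytes) -> bool:
    return first_bytes.startswith(PE_MAGIC)


def smb_lateral_movement(sessions, max_clients=1):
    """sessions: list of (client, share_path, first_bytes_of_written_file).
    first_bytes is b"" when the session transferred no file."""
    clients_per_share = defaultdict(set)
    for client, share, _head in sessions:
        clients_per_share[share.lower()].add(client)  # share names are case-insensitive
    return {
        (client, share)
        for client, share, head in sessions
        if len(clients_per_share[share.lower()]) <= max_clients and looks_executable(head)
    }


sessions = [(f"ws{i}", r"\\fs01\shared", b"%PDF-1.7") for i in range(5)]
sessions += [
    ("ws2", r"\\fs01\shared", b"MZ\x90\x00"),  # executable, but to a common share
    ("ws9", r"\\dc01\ADMIN$", b"MZ\x90\x00"),  # executable to a rarely used share
    ("ws5", r"\\hr01\C$", b"PK\x03\x04"),      # rare share, but not an executable
]
print(smb_lateral_movement(sessions))  # {('ws9', '\\\\dc01\\ADMIN$')}
```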

“Configurable network anomaly threat detection” is real and it detects supply chain compromises as well as 0-days

It not only identifies 0-days in obscure DCERPC interface operations (aka: PrintNightmare and others) and supply chain compromises; it accurately identifies lateral movement and a host of other threat activities. Our customers see the accuracy and efficacy of Awake’s customizable threat detection models, and Awake’s Threat Research Team sees the power of these models every time a new SolarWinds or Kaseya is discovered and the malware is analyzed. For anyone who doubts the radical advancement of threat detection methodologies pioneered by Awake, please contact us. 😉

Gary Golomb

Co-Founder & Chief Scientist