By Gary Golomb
Co-founder and Chief Research Officer

It has now been a couple of months since GDPR went into effect – a regulation that many predicted would be bad for security analysis. That prediction was echoed by many, many, many sources, most of them following the line that GDPR-related censoring of WHOIS data “is thwarting security researchers and police.”

But, is it really?

As with most infosec-related pontification in the media, the truth is much more complicated – and more positive – than the headlines assert.

Whatis Whois?

The hullabaloo revolves around a public registry referred to as WHOIS. It contains all the contact info a person (or a script) uses when they register a domain. For example, consider the following domain seen with odd traffic in a customer’s network.

attacker domain WHOIS pre-GDPR
Figure 1. Attribution 101: If the registrant was in their 40’s, they may have faked the zip code 90210

Here, we see the domain 3s4s.gdn was registered to someone named John. John without a last name. We see John registered from California, CA, in the United States. Of course, since his name is John, you probably would have guessed that already 😊. We also see John’s email address is [email protected]. Post-GDPR, everything under “Registrant Contact” in the screenshot above is now censored, along with two other sections just like it.

And therein lies the source of much heartburn. Using John’s email address, it is quite easy to do some reverse searching and see that John has registered quite a few domains. With a few of those other domains, “he” used the last name “Modi.” Needless to say, that’s about as likely to be his real last name as John is to be his first. Most of his domains use .gdn, a TLD (top-level domain, like .com or .net), or the .top and .asia TLDs, all of which are frequently used by nefarious actors because they tend to be very cheap to register (meaning an attacker can register hundreds or thousands of them very cost-effectively).

At this point, if you wanted to block this traffic at the perimeter, you could block the domain 3s4s.gdn, plus all the other domains registered using [email protected], since they’re likely just as malicious. This has been standard practice for many security vendors because it’s simple, provides great returns for the effort (investigate one malicious domain, enable protection across many domains), and is pretty accurate most of the time. Anti-spam technology is a great example of a solution that has used this capability to great success.
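The pivot described above can be sketched in a few lines of Python. This is only an illustration of the idea, not any vendor’s implementation: the record layout, the function name `domains_sharing_registrant`, and every domain and email address in the sample data are hypothetical placeholders.

```python
from collections import defaultdict


def domains_sharing_registrant(records, seed_domain):
    """Return the other domains registered with the seed domain's email."""
    by_email = defaultdict(set)
    email_of = {}
    for rec in records:
        # Normalize the email so trivial case differences still match.
        email = rec["registrant_email"].strip().lower()
        by_email[email].add(rec["domain"])
        email_of[rec["domain"]] = email
    seed_email = email_of.get(seed_domain)
    if seed_email is None:
        return set()
    return by_email[seed_email] - {seed_domain}


# Hypothetical pre-GDPR WHOIS records (placeholder domains and emails).
records = [
    {"domain": "bad-one.example", "registrant_email": "john@mail.example"},
    {"domain": "bad-two.example", "registrant_email": "John@mail.example"},
    {"domain": "unrelated.example", "registrant_email": "alice@mail.example"},
]

# Investigate one domain, block everything tied to the same registrant.
blocklist = {"bad-one.example"} | domains_sharing_registrant(records, "bad-one.example")
```

Once the registrant email is masked, every record collapses to the same redacted value, which is exactly why this particular pivot stops working post-GDPR.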

You can also see how the information above can be quite useful for investigating linkage between domains and “attribution,” i.e., identifying the person using a domain for criminal activity. I use the term attribution extremely loosely here. Some people like Brian Krebs, who has highlighted many issues with GDPR, have done phenomenal work using WHOIS to unmask criminals. However, these cases are dramatic because they are the exception to the rule. Most frequently, WHOIS is useless for attribution, but nobody gets attention for blog posts talking about how frequently the information isn’t useful. For every grand slam that investigators like Mr. Krebs have hit using WHOIS information, there are probably hundreds of other searches that have been dead ends.

But let’s ignore the “correctness” of WHOIS information for a bit and instead focus on the ability of enterprise security teams to effectively detect and respond to new threats in their networks in this post-GDPR era.

attacker domain WHOIS post-GDPR privacy protection
Figure 2. Post GDPR WHOIS Record

As we see in Figure 2, quickandclick[.]bid has its registrant contact info masked. In other words, this is what investigators now see. At first blush it might be easy to throw in the towel and claim investigative defeat. But there is more to this story.

Contact info is not the only information in WHOIS records

WHOIS records also contain the registrar in addition to the current nameserver for the domain, as shown next:

attacker domain WHOIS post-GDPR
Figure 3. Winner winner, chicken dinner. Nameservers are also domain names we can investigate.

So even with GDPR censoring, we still see the registration date, expiration date (both dates in Figure 2), registrar, registrar ID, and the nameservers serving the domain. Given the right tools, this uncensored information is exceedingly useful in performing many of the workflows that folks are saying GDPR has devastated.

Registration dates are an overly simple example of this. First, we see the domain was registered less than a couple of months ago. Seeing young domains in a corporate environment is remarkably uncommon, as most activity in corporate environments is based on interactions with higher-reputation services or sites that have typically had a presence on the internet for longer than a few days. The domain was also registered for only a year, which is odd as well. Most legitimate domains are registered for much longer lengths of time so the owner doesn’t risk losing the domain to accidental expiration.
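As a rough sketch, the two date checks could look like the following in Python. The thresholds (60 days for “young,” 366 days for a “short” registration term) are my own assumptions for illustration, and the dates in the example are hypothetical rather than read from the figure.

```python
from datetime import date


def date_signals(created, expires, today, max_age_days=60, max_term_days=366):
    """Derive simple age and term signals from WHOIS dates.

    The default thresholds are illustrative assumptions, not tuned values.
    """
    age_days = (today - created).days
    term_days = (expires - created).days
    return {
        "age_days": age_days,
        "term_days": term_days,
        "is_young": age_days <= max_age_days,
        "is_short_term": term_days <= max_term_days,
    }


# Hypothetical dates: registered six weeks ago, for exactly one year.
signals = date_signals(
    created=date(2018, 6, 1),
    expires=date(2019, 6, 1),
    today=date(2018, 7, 15),
)
```

Neither flag means much on its own; they become interesting once combined with the other fields that survive in the record.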

Of course, dates alone are not enough to begin discerning the suspiciousness of a destination. Luckily, we still see the registrar, and that tells us some more. For example, while it’s not correct to say any traffic to a domain registered using Namecheap is malicious, it’s certainly true to say that Namecheap is one of the more abused registrars used by attackers. So, seeing a domain registered recently, for a short amount of time, and from a “notable” registrar begins to paint a suspicious picture about the destination of this traffic. However, there is more.

We can actually determine a lot from the nameserver too. For example, if the nameserver was ns2.registrar-servers.com, we’d know they were most likely using Namecheap to host their site because that is one of the nameservers Namecheap uses for sites it hosts. If the nameserver was ns2.registrar-server.com (it’s hard to notice, but this second example contains a singular “server” while the first, legitimate example from Namecheap uses the plural “servers”), we’d have to wonder if this was an attempt to spoof a common nameserver to hide in the noise common on networks, or worse – to perform a type of typo-squatting attack.
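One way to catch that kind of near miss automatically is to compare a nameserver’s base domain against a short list of well-known ones using edit distance. The Python below is a minimal sketch under a few assumptions of mine: the known list contains only the two nameserver domains mentioned in this post, the base domain is taken naively as the last two labels (which breaks for suffixes like .co.uk), and a distance of 2 or less counts as a lookalike.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance using a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete a character from a
                cur[j - 1] + 1,            # insert a character into a
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


# Assumed allow-list: only the two nameserver domains named in this post.
KNOWN_NS_DOMAINS = {"registrar-servers.com", "domaincontrol.com"}


def base_domain(hostname):
    """Naive base domain: the last two labels of the hostname."""
    return ".".join(hostname.lower().rstrip(".").split(".")[-2:])


def classify_nameserver(hostname, max_distance=2):
    """Return 'known', 'lookalike', or 'unknown' for a nameserver hostname."""
    base = base_domain(hostname)
    if base in KNOWN_NS_DOMAINS:
        return "known"
    if any(edit_distance(base, good) <= max_distance for good in KNOWN_NS_DOMAINS):
        return "lookalike"
    return "unknown"
```

With this sketch, `classify_nameserver("ns2.registrar-server.com")` comes back as a lookalike because the base domain is one character away from the Namecheap one.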

From a slightly different angle, if you see a domain registered using GoDaddy and the nameserver for that domain is also given as nsXX.domaincontrol.com (where XX is a two-digit number), then you know they’re hosting the site using GoDaddy. If the nameserver was something different, they are likely hosting the site elsewhere. Depending on where that “elsewhere” is, things can get quite interesting in your investigation.
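The registrar-versus-nameserver comparison is easy to express in code as well. In this hedged Python sketch, the `EXPECTED_NS` table holds only the two pairings described in this post, the registrar is matched by a simple substring of the WHOIS registrar string (an assumption about how that field is written), and the function name is hypothetical.

```python
import re

# Assumed mapping from a registrar keyword to its usual nameserver pattern,
# built only from the two examples discussed in this post.
EXPECTED_NS = {
    "namecheap": re.compile(r"\.registrar-servers\.com$"),
    "godaddy": re.compile(r"^ns\d{2}\.domaincontrol\.com$"),
}


def hosted_with_registrar(registrar, nameservers):
    """True if every nameserver matches the registrar's usual pattern,
    False if any does not, None if the registrar is not in the table."""
    key = next((k for k in EXPECTED_NS if k in registrar.lower()), None)
    if key is None:
        return None
    pattern = EXPECTED_NS[key]
    return bool(nameservers) and all(
        pattern.search(ns.lower().rstrip(".")) for ns in nameservers
    )
```

A `False` result is not an alert by itself. It simply says the site is hosted “elsewhere,” which tells you where to look next.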

Nameservers, registrars, and dates – oh my!

So, what happens when you put all this together? Well, you can find very sophisticated attacks targeting only your enterprise, as well as catching general scammers/attackers with related techniques.

For an example identifying general scammers/attackers, let’s pick a very common domain/company like Amazon. AWS and domains managed by Amazon are registered using the same registrar (MarkMonitor) and those domains are managed using the same set of nameservers, as you might expect.

In Awake, we automatically find deviations from that norm by automating the steps a human expert would take if they were in this situation. Let’s walk through that step by step.

First, show me any traffic (using any port or protocol) going to an IP address that has been associated with a domain containing “Amazon” or “AWS” (clearly you could add more here). It is worth noting that queries like this one use “passive DNS” records by default, which are far more accurate than current standards for search in related technologies:

domain.name like r/amazon|aws/

We then add the next part: asking if the domain associated with that IP address has been registered by any registrar other than MarkMonitor (the exclamation point at the beginning of the statement means “not”):

!(domain.registrar.name like r/MarkMonitor/)

Or where the nameservers pointing to the domain associated with that IP are not Amazon’s nameservers.

|| !(recipes.destinations.amazon.nameservers))

For the sake of clarity, it’s worth mentioning that the last statement is something we call a “Query Building Block” in Awake. You can think of it as a “macro” representing more complicated logic, but in a format that is much easier to understand and reuse. The query underneath is along the lines of this:

(any domain.nameservers like r/\.awsdns-[0-9]{1,2}\.|\.AMAZONAWS\.COM$|^AWSNS[0-9]{1}\.|\.dynect\.net$|\.ultradns\./)

It’s nice to have the shorthand, while still retaining the ability to get under the hood and modify things, if you’d like.
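For readers without Awake in front of them, here is a rough Python translation of the three conditions above, applied to a single WHOIS-style record. It is a sketch of the logic only: the record layout and function name are hypothetical, and I am assuming the `like` matches are case-insensitive. The regular expressions are copied from the queries shown above.

```python
import re

# Patterns copied from the queries above; case-insensitivity is assumed.
NAME_RE = re.compile(r"amazon|aws", re.IGNORECASE)
REGISTRAR_RE = re.compile(r"MarkMonitor", re.IGNORECASE)
AMAZON_NS_RE = re.compile(
    r"\.awsdns-[0-9]{1,2}\.|\.AMAZONAWS\.COM$|^AWSNS[0-9]{1}\."
    r"|\.dynect\.net$|\.ultradns\.",
    re.IGNORECASE,
)


def deviates_from_amazon_norm(record):
    """True if the domain looks Amazon-related but its registrar or
    nameservers do not match what Amazon normally uses."""
    if not NAME_RE.search(record["domain"]):
        return False
    wrong_registrar = not REGISTRAR_RE.search(record["registrar"])
    has_amazon_ns = any(AMAZON_NS_RE.search(ns) for ns in record["nameservers"])
    return wrong_registrar or not has_amazon_ns


# Hypothetical record: an Amazon-looking name with the wrong registrar and DNS.
suspicious = {
    "domain": "amazon-login.example",
    "registrar": "Hypothetical Registrar",
    "nameservers": ["ns1.hosting.example"],
}
```

Note that a bare substring match on “aws” will also catch unrelated names (anything containing “laws,” for instance), so a real deployment would want a tighter first condition.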

Putting this type of logic together in Awake regularly identifies threatening activity for our customers, like the scam-related traffic shown below.

domain typosquatting attack
Figure 4: The type of traffic you don’t want to see from your enterprise assets, found using only the information that exists in WHOIS after GDPR has gone into effect.

The above example is simple and specific. To make it more generic, we built functions at Awake that automate this technique: typoSquat and domainFuzz are shorthand for more complicated logic that takes an algorithmic approach, similar to what we described above, to identifying traffic to suspicious permutations of a given domain. In the example below I will use Awake, but you can easily imagine doing the same for your own company.

domain.name in domainFuzz "awakesecurity.com"

If this alert catches anything, it should be the type of activity that is highly targeted to our company. And what do we see?

typosquatting attack
Figure 5: Sometimes the eye is faster than the machine. Sometimes, like this time, it’s the other way around.

It does indeed catch a successful red team phishing test. In this activity, the domain awakesceurity.com was used. Notice the typo – “security” is misspelled, with the “e” and “c” swapped.
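To give a feel for what a permutation-based check does, here is a tiny Python sketch that generates a few classes of typos for a domain. To be clear, this is not how domainFuzz is implemented; it is a simplified illustration covering only adjacent-character swaps, single-character omissions, and single-character repetitions, and the function name is hypothetical.

```python
def simple_permutations(domain):
    """Generate a small set of typo variants of a domain's first label."""
    name, _, tld = domain.partition(".")
    variants = set()

    # Adjacent transposition: swap each pair of neighboring characters.
    for i in range(len(name) - 1):
        variants.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])

    # Omission: drop one character at a time.
    for i in range(len(name)):
        variants.add(name[:i] + name[i + 1:])

    # Repetition: double one character at a time.
    for i in range(len(name)):
        variants.add(name[:i] + name[i] + name[i:])

    variants.discard(name)  # swapping identical neighbors yields the original
    variants.discard("")
    return {f"{v}.{tld}" for v in variants}


candidates = simple_permutations("awakesecurity.com")
```

The red team domain above falls into the first class, so it appears in `candidates`. Matching observed traffic against a set like this is cheap, and it needs nothing from the registrant contact fields.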

Conclusion

Make no mistake, from a security research and investigation perspective, GDPR changes to WHOIS do indeed suck. And yes, they will make some of the more mundane adversaries harder to find and block. But even slightly more sophisticated attackers were already smart enough to realize investigators would look at WHOIS records, so their contact info was rarely useful to begin with.

At the same time, as this commentary showed, it is still pretty easy to find threats targeting organizations using only the post-GDPR information in WHOIS. The catch, though, is that the vast majority of enterprises (including many of the most well-funded security teams) do not have the ability to perform the types of detection and hunting I just showed.

So, one interpretation is that GDPR is not the real problem here. Rather, this should be a wakeup call to the industry to evolve protection, detection, and investigation techniques to match the new realities of what data security practitioners can access and what types of data remain available to them. Far too many solutions on the market today are built on top of old commodity components, which were not built for the network and policy realities of today.