By Troy Kent
Threat Researcher
Abstract

It’s no secret that the SOC is understaffed, overburdened and inundated with new threats every day. Security tools try to shortcut some of their detection logic by whitelisting, in an attempt to lower the processing burden and generate fewer alerts. Many security teams use the same kind of shortcut to automate certain processes, which can save precious time and cut down the noise. One common shortcut used by security analysts and products alike is creating a domain whitelist by pulling from one that already exists, such as the Alexa Top 1 Million. The broad assumption is that if a website is heavily trafficked, it’s safe. As this research shows, this common practice could be opening you up to attack by allowing access to malicious domains. We examined the Alexa Top 1M list of domains and found that potentially malicious domains were making it into the top 0.02% of the list, that is, within the top 200! In the end, frequency lists like these are not intended for determining the legitimacy or suspiciousness of a domain, and using them that way can seriously handicap your security efforts.

Introduction

One of the most time-consuming and annoying things to deal with in the SOC is noise. No, I don’t mean the noise coming from your snoring neighbor on third shift; I mean noise created by your tools. Noise can mean a few different things in the context of a Security Operations Center. Your intrusion detection system may be generating a lot of noise if it’s not tuned effectively or if the signatures have a greedy regular expression in them somewhere.

Your web application firewall may generate a lot of alerts that you would consider noisy because everyone is scanning everyone, all of the time (seriously, spin up a honeypot on a VPS somewhere if you don’t believe me). You may have to filter out a lot of noise when you’re hunting. For example, if you’re a masochist like me and feel like you want to stare at 24 hours’ worth of DNS requests to see if anything pops out as suspicious (don’t knock it till you try it), a lot of what you’re looking at is going to be noise. There may be a lot of requests that look suspicious at first glance due to their length, but end up being nothing more than hash reputation checks…

[Image: hash reputation checks]

…or someone being silly…

[Image: Pi to one million decimals]

There may be suspicious looking requests for domains that appear to be using a domain generation algorithm, but it turns out the domain name actually means something in a language you don’t recognize. There will also likely be many requests for domains that are essentially ubiquitous (google.com, facebook.com, youtube.com, etc.) and generally benign that don’t look suspicious at all, but you still have to sift through them.

Hey Alexa, can you whitelist for me?

Wouldn’t it be nice if you didn’t have to deal with noise, or at least as much noise? Spoiler: the answer is yes. My colleagues and I sat down to talk about effective methods to whitelist noisy domains that end up being little more than a time-consuming distraction during investigations. One thing that was immediately apparent was that the solution would have to be as automatic as possible; otherwise you’re likely to spend as much time curating and tuning the whitelist as you would save during investigations.

One simple way to automate a whitelist is to pull from one that has already been compiled for you. The first whitelist that made sense to us was the Alexa Top 1m list of domains. It doesn’t appear that Alexa is really intended for this purpose; however, it seems logical that the most visited sites would be benign and could be considered noise during an investigation. We’re not the first group of security professionals to consider using the Alexa Top 1m for whitelisting, I assure you.
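To make the idea concrete, here is a minimal sketch of what "pulling from a list that already exists" might look like. It assumes the list is a CSV with one "rank,domain" pair per line, sorted by rank; the function names and the sample rows are made up for illustration and are not taken from any particular tool.

```python
import csv
import io


def load_top_domains(csv_file, top_n):
    """Return the set of domains ranked 1..top_n.

    Assumes each row is "rank,domain" and rows are sorted by rank.
    """
    whitelist = set()
    for row in csv.reader(csv_file):
        if len(row) != 2:
            continue  # skip blank or malformed rows
        rank, domain = int(row[0]), row[1].strip().lower()
        if rank > top_n:
            break  # sorted input, so nothing further can qualify
        whitelist.add(domain)
    return whitelist


def is_whitelisted(domain, whitelist):
    """Exact-match lookup; subdomains are deliberately not handled here."""
    return domain.strip().lower() in whitelist


# Illustrative sample only; the ranks below are placeholders.
SAMPLE = io.StringIO(
    "1,google.com\n"
    "2,youtube.com\n"
    "3,facebook.com\n"
    "5001,far-down-the-list.example\n"
)

whitelist = load_top_domains(SAMPLE, top_n=5000)
print(sorted(whitelist))
print(is_whitelisted("far-down-the-list.example", whitelist))
```

Note that nothing in this logic looks at what a domain actually does; rank is the only criterion, which is exactly the assumption the rest of this post pokes at.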

The first question we had was: “How many domains do we want to use?” Surely there are bound to be some domains in the Alexa Top 1m that are suspicious / malicious / compromised; 1 million is a lot (you can quote me on that). How many of the top domains can we grab from the Alexa Top 1m and still feel certain that we’re not going to whitelist something that we’re actually interested in looking at? 10,000? 5,000? 1,000?… I can’t remember what number we decided on initially; I think it was either 1,000 or 5,000, which is only 0.1% or 0.5% of the entire list of 1 million. To make sure that we weren’t going to inadvertently whitelist a nasty domain, we started scrolling through the domains (a very scientific process…). As we were scrolling through the list, we were like:

[Image: alexa top 500 malicious site]

Number 447 on the Alexa Top 1m, ranked just one less than glassdoor.com, and 5 less than dell.com, is a domain just sitting there staring back at us, all smug. What’s even more interesting than being number 447 is that it didn’t stop there. I kept monitoring the list for over a week and it continued to climb up to 432 like it owned the place. Since then, it has gradually fallen in rank. As of August 22, 2017, it had dropped down to 586. The domain looks suspicious because it seems like randomly generated nonsense, much like the DGA domains that some malware likes to use. Before we start jumping to conclusions, I suppose it would be prudent to gather some OSINT on the domain to make sure it’s not just some benign site with a weird looking name (maybe it means something in some language I don’t understand… apparently “jogi” means “legal” in Hungarian).

What is piz7ohhujogi[.]com?

The domain was created 2016-10-11, so it’s not that new and not that old. It was registered with WhoisGuard, so it doesn’t look like we’re going to get any more useful information from the WHOIS record.

Next stop, Google:

[Image: PUP domain google results]

Not that I have any idea what the integrity of some of those sites is, but there are pages of results like that, which leads me to believe that this domain is at the very least not entirely on the up and up. Of course, those results don’t include any technical explanation of what part the domain has to play in the infection or even what kind of infection it is. I believe some of them implied that it was traffic being generated by some kind of toolbar and/or PUP.

I also visited Twitter to see if there was any chatter concerning the domain in question. There was a somewhat entertaining exchange between two Twitter users, @geniuscpps and @CPReborn, about ads redirecting to or from the domain. Based on that exchange it still seems like the domain is either part of an ad redirect chain or a PUP toolbar that serves ads. Aside from that conversation, Twitter only had the same unspecific information about the domain that was available in the Google results. Some of the tweets did happen to have complete links though, for example:

hxxp://piz7ohhujogi[.]com/click?h=Ax722bagzrku2AvQbWraCRtffsSTOrzGGiE-1xM-Gcspc8y1IQ4j6fxvxXvS7vOirUTfttFeCdqN0zVPfI

I started entering those links into link checkers (like https://urlquery.net/) to see if I could get any interesting content back. A GET request for the root web page responds with a 404.

[Image: urlquery results]

However, a GET request for /click returns the following (using Rex Swain’s HTTP Viewer):

[Image: http traffic]

It just closes the window…

Most of the links I gathered from twitter led to the same response as above, however there was one that behaved differently. Aha!

HTTP/1.1 302 Found
Date: Wed, 02 Aug 2017 16:44:00 GMT
Location: hxxp://npmpecd[.]com/rcpvkey=S2UZYHUX1KRO&type=direct&url=http%3A%2F%2Frtbpopd.com&pt=NEWTABUNDER&subid=648527
Content-Length: 0
Connection: Close

For some reason, the HTTP viewer didn’t follow the redirect in this case. But if you enter the URL from the Location header yourself, you’ll see some more redirects. (Don’t be alarmed by the hex encoded string at the end of the JavaScript at the bottom of the page; it decodes to a redirect to “hxxp://target1.track-p958o4[.]link/?utm_term=6449720244884736312&clickverify=1” #notAllHexIsShellcode).
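If you would rather pick a captured response apart offline than paste URLs into a viewer, a few lines of Python will do. The sketch below parses the 302 response quoted above and pulls out the URL-encoded "url" parameter from the Location header. The helper name is mine, and because the Location value as captured contains no "?", treating everything after the first "&" as query-style pairs is an assumption about how the redirector encodes its parameters.

```python
from urllib.parse import parse_qsl

# The response as quoted above (still defanged; no request is made here).
RAW_RESPONSE = """HTTP/1.1 302 Found
Date: Wed, 02 Aug 2017 16:44:00 GMT
Location: hxxp://npmpecd[.]com/rcpvkey=S2UZYHUX1KRO&type=direct&url=http%3A%2F%2Frtbpopd.com&pt=NEWTABUNDER&subid=648527
Content-Length: 0
Connection: Close
"""


def parse_response(raw):
    """Split a raw HTTP response head into (status_code, headers)."""
    lines = raw.strip().splitlines()
    status_code = int(lines[0].split()[1])
    headers = {}
    for line in lines[1:]:
        if not line.strip():
            break  # blank line marks the end of the headers
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers


status, headers = parse_response(RAW_RESPONSE)
location = headers["location"]

# Assumption: everything after the first "&" is a list of key=value pairs.
_, _, tail = location.partition("&")
params = dict(parse_qsl(tail))  # parse_qsl also percent-decodes values

print(status)
print(params["url"])
print(params["pt"])
```

The decoded "url" and "pt" values are what hint at a pop-under style ad redirect rather than anything more exotic.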

Before I call it an ad redirect chain / click fraud, I wanted to find some more examples of redirect chains including the domain. It’s also important to find what domains are redirecting to the domain, rather than just from it. Luckily urlquery.net had a number of redirect chains that included our domain.

http://urlquery.net/report/42f577b5-231f-4aa2-8aee-cb7cf0675806
http://urlquery.net/report/3e930b2b-00bb-400c-9bc1-6c916e6f67f9
http://urlquery.net/report/def224f0-10bf-4744-88b2-980be09e2d62
http://urlquery.net/report/a620a08b-2ad0-4303-9999-0193d15ec8f6
http://urlquery.net/report/e3844903-a749-43e9-a893-151ed19e5be3
http://urlquery.net/report/4cd63cd1-6a62-405a-ab9b-9991588a8e9c
http://urlquery.net/report/076f2ba3-93f5-4e8e-b1ea-5040e6d0e161
http://urlquery.net/report/6bb4b5b0-21ee-4cea-b181-8bb0555347ba
http://urlquery.net/report/ace699ed-95ce-4f32-bf8c-79fcc551bbec
http://urlquery.net/report/5ab4bca0-0c50-4c1c-9619-ede9e973099f
http://urlquery.net/report/760e0f30-2a74-4f83-92ee-e3b45e2424fa

The starting URLs:

hxxp://fileking[.]space/
hxxp://es.tarjetarojaonline[.]tv/2013/12/univision-deportes-network-en-vivo-por-internet.html
hxxp://wizhdsports[.]is/p2p/stream17.html
hxxp://www.beinsport-streaming[.]com/streaming/22421/streaming-beinsport2.html
hxxp://craigslistt.my[.]vg/
hxxp://sporthd[.]me/wplayer.php?v=596e2320bbd641500390176
hxxp://webshit.ye[.]vc/niqqa.html
hxxp://fileuj[.]com/
hxxp://livestreamhd[.]me/live/france21.php
hxxp://jessicarahhal[.]com/
hxxp://video.finegourmets[.]com/

Most of the domains above appear to be related to serving ads and they don’t have any negative OSINT. Three of them (wizhdsports[.]is, sporthd[.]me and livestreamhd[.]me) are also on the Alexa Top 1m; however, they are much lower on the list (187327, 202856 and 352239 respectively). Based on the redirects and screenshots from the urlquery reports, this still seems to be ad redirects / click fraud. It’s always possible that the domain could also be involved in more malicious traffic, but even if it’s only involved in ad redirects / click fraud, why is it in the Alexa Top 500!?
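Checking a handful of starting URLs against the list by hand gets old quickly, so here is a rough sketch of how that lookup could be scripted. The helper names are hypothetical, the rank table is a tiny stand-in containing only the three ranks quoted above, and walking up the hostname label by label is a simplification (a proper implementation would consult the Public Suffix List).

```python
from urllib.parse import urlsplit


def refang(url):
    """Undo the hxxp / [.] defanging so the URL can be parsed (not fetched)."""
    return url.replace("hxxp", "http", 1).replace("[.]", ".")


def alexa_rank(hostname, ranks):
    """Try the full hostname, then each parent domain, against the rank table."""
    labels = hostname.split(".")
    for i in range(len(labels) - 1):  # stop before the bare TLD
        candidate = ".".join(labels[i:])
        if candidate in ranks:
            return candidate, ranks[candidate]
    return None


# Stand-in for the full list: only the three ranks mentioned in the text.
ALEXA_RANKS = {
    "wizhdsports.is": 187327,
    "sporthd.me": 202856,
    "livestreamhd.me": 352239,
}

STARTING_URLS = [
    "hxxp://wizhdsports[.]is/p2p/stream17.html",
    "hxxp://www.beinsport-streaming[.]com/streaming/22421/streaming-beinsport2.html",
    "hxxp://sporthd[.]me/wplayer.php?v=596e2320bbd641500390176",
    "hxxp://livestreamhd[.]me/live/france21.php",
]

results = {}
for url in STARTING_URLS:
    host = urlsplit(refang(url)).hostname
    results[host] = alexa_rank(host, ALEXA_RANKS)

for host, hit in results.items():
    print(host, hit)
```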

Who Wants to be a Millionaire?

I was previously under the assumption that Alexa’s Top 1m would be a list of domains that are too popular to be malicious, or even suspicious. Maybe the domain that we saw isn’t malicious, but best case scenario, it’s part of an ad redirection chain. How is a site like this in the same range of domains as glassdoor.com and dell.com? How is this domain that doesn’t appear to serve any original content beating sites that actively attempt to increase their Alexa rank?

According to Alexa’s website,

[Image: Alexa domains description]

It’s also worth noting that the sample population that Alexa uses only includes people who are willing to download and install toolbars like https://www.alexa.com/toolbar.

[Image: alexa toolbar download]

Based on their explanation it at least seems that we shouldn’t see any domains that are requested solely by standalone malware (most malware that I encounter sends its own network traffic rather than hijacking a browser). However, the domains that you might see on the Alexa Top 1m could potentially include domains redirected to by ads or PUP toolbars.

It would be interesting to explore the possibility of getting a newly created domain into the Alexa Top 1m by gaming the system in some way. However, assuming you’ve even stuck around this long, that would be too much to fit in just this one blog post.

Some of These Things Are Not Like the Others

If it’s possible that a domain like piz7ohhujogi[.]com made it into the Alexa Top 1m, what other nasties have snuck their way in? In order to get a general idea, I compared the Alexa Top 1m with six different malware blacklists.

Blacklist             Number of Domains in Alexa Top 1m   Highest Rank Achieved
Maltrail              36                                  24144
ZeusTracker           0                                   N/A
MalwareDomains.com    110                                 197
Malware Domain List   4                                   80666
Malware Bytes         1308                                8
Cybercrime            1                                   436183
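A comparison like the one summarized in the table comes down to a set intersection plus a minimum. The sketch below shows one way to compute the two columns; it is not the script used for this post, and the domains, ranks and list names in it are toy placeholders.

```python
def overlap_summary(alexa_ranks, blacklist):
    """Return (number of blacklisted domains on the list, best rank or None)."""
    hits = {d: alexa_ranks[d] for d in blacklist if d in alexa_ranks}
    best_rank = min(hits.values()) if hits else None  # rank 1 is the "highest"
    return len(hits), best_rank


# Toy data: placeholder domains, not real list contents.
alexa_ranks = {
    "popular.example": 1,
    "shady.example": 197,
    "obscure.example": 80666,
}
blacklists = {
    "list_a": {"shady.example", "obscure.example", "unranked.example"},
    "list_b": {"unranked.example"},
}

summary = {name: overlap_summary(alexa_ranks, bl) for name, bl in blacklists.items()}
for name, (count, best) in summary.items():
    print(name, count, best if best is not None else "N/A")
```

A blacklist with no overlap comes back as a count of 0 with no best rank, which corresponds to the "N/A" row in the table.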

The Malware Bytes list had the most domains that were also on the Alexa Top 1m (1308); however, the types of domains appeared to be very different from the other blacklists. For example, here are all the domains in the Malware Bytes list that are ranked higher than 1000 on the Alexa Top 1m:

['8', 'qq.com']
['17', 'sohu.com']
['93', 'thepiratebay.org']
['107', 'uptodown.com']
['139', 'clicksgear.com']
['408', 'iqiyi.com']
['648', 'utorrent.com']
['766', 'internetspeedtracker.com']
['888', 'cam4.com']
['974', 'turbobit.net']
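A listing like the one above can be produced by filtering the blacklist hits on a rank cutoff. This is only a sketch under assumptions: the function name is hypothetical, three of the ranks are copied from the listing, and the two ".example" entries are invented to show what gets filtered out.

```python
def hits_within(alexa_ranks, blacklist, max_rank):
    """Blacklisted domains ranked at or above max_rank, best rank first."""
    hits = [
        [rank, domain]
        for domain, rank in alexa_ranks.items()
        if domain in blacklist and rank <= max_rank
    ]
    return sorted(hits)


alexa_ranks = {
    "qq.com": 8,
    "sohu.com": 17,
    "turbobit.net": 974,
    "not-on-blacklist.example": 5,  # invented: ranked, but not blacklisted
    "too-low.example": 1500,        # invented: blacklisted, but below cutoff
}
blacklist = {"qq.com", "sohu.com", "turbobit.net", "too-low.example"}

top_hits = hits_within(alexa_ranks, blacklist, max_rank=1000)
for hit in top_hits:
    print(hit)
```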

Before you go out and grab your pitchforks and torches, take some time to realize that the domains are not inherently malicious. The first domain, qq.com, is a popular Chinese social website that offers a messaging app. The second domain appears to be a Chinese news site. However, depending on your acceptable use policy, you may want to block some of the other domains to prevent pirating software (thepiratebay[.]org, utorrent[.]com) or viewing of pornography (cam4[.]com).

MalwareDomains.com did have another suspicious domain in the top 500, this time at rank 197 (pipeschannels[.]com). The root web page redirects to google.com, oddly enough. Speaking of Google, the Google results for the domain look familiar:

[Image: osint on pipeschannels.com]

This domain also used a WhoisGuard-like service. Surprisingly, the domain was created on 2017-06-10, which is less than 3 months ago. If you remember, Alexa claims that the Alexa Top 1m rankings are calculated using data from the “last 3 months.” Not only is this domain in the Alexa Top 1m, but it achieved rank 197 in less than 2 months, while every other domain in the list potentially had a whole month head start.
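The head start argument is simple date arithmetic. In the sketch below the creation date comes from the WHOIS record mentioned above, but the observation date is hypothetical (the exact day rank 197 was recorded isn’t stated here), and treating “last 3 months” as 90 days is an approximation.

```python
from datetime import date


def window_coverage(created, observed, window_days=90):
    """Return (days the domain existed, days of the window it missed)."""
    age_days = (observed - created).days
    missed_days = max(0, window_days - age_days)
    return age_days, missed_days


created = date(2017, 6, 10)   # creation date from the WHOIS record
observed = date(2017, 8, 2)   # hypothetical observation date

age_days, missed_days = window_coverage(created, observed)
print(age_days, missed_days)
```

With these inputs the domain has existed for under two months and misses more than a month of the ranking window, which is the gap older domains had available to accumulate traffic.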

So What?

It’s worth reiterating that Alexa doesn’t advertise the Top 1m as a whitelist for information security purposes. But does your security vendor do so? It is also worth pointing out that I didn’t prove that any of the domains in the top 500 were inherently malicious, just that they’re not the type of domains you would probably expect to be that high on the list. If those domains can make it that high, what’s stopping a malicious domain from doing the same? And what’s stopping one of these domains from being appropriated for even more nefarious purposes than just serving a PUP?

Knowing that a variety of tools and SOCs use the Alexa Top 1m as a security whitelist would make it lucrative for a malware author to attempt to have their domain move as high as possible in Alexa rank so their C2 traffic could potentially be whitelisted and successful regardless of what security technology you are using.

So what? The point of this post is to highlight the potential dangers of the implicit trust that we security folks can put in the external sources we rely on to make our tools more efficient. Alexa does not curate their list to be used as a security whitelist; they do not guarantee that the domains on the list are not malicious domains; however, many of us trust it as a whitelist anyway. Aside from Alexa Top 1m, what other external sources are you relying on in your security stack that you might be trusting just a little too much?
