What is Ghost Referrer Spam?

I get a lot of questions from clients about odd traffic showing up in their Google Analytics accounts. Lately, most of the traffic has come from what many call "ghost referrer spam". I wanted to take a moment to explain:

  1. Where does ghost referrer spam show up in GA?
  2. How do the spammers do it?
  3. Why do they do it?
  4. How can you prevent it?

My hope is that I can help web site managers understand this, so I'll do my best to keep the technical jargon to a minimum.

Where Does Ghost Referrer Spam Show Up in Your Analytics?

There are a couple of places where you can see these jerks' handiwork. (The screen captures below are from two different sites, chosen based on how well they illustrate the point.)

Referral report

As you can probably guess from the name, the most prominent place you'll see this spam is in the referral reports. All the referrers below are from spammers. You'll notice how high the bounce rate is. (The bounce rate is the percentage of visits in which the visitor leaves your site after viewing just one page.)
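
To make the bounce rate concrete, here is a minimal sketch of the calculation. The `bounceRate` function and the session counts are made up for illustration; they are not taken from the reports shown here.

```javascript
// Hypothetical helper: bounce rate as a share of sessions.
// `sessions` is an array of pageview counts, one entry per session.
function bounceRate(sessions) {
  if (sessions.length === 0) return 0;
  const bounces = sessions.filter((pageviews) => pageviews === 1).length;
  return bounces / sessions.length;
}

// Illustrative numbers only: four sessions, three of which viewed a single page.
const rate = bounceRate([1, 1, 3, 1]);
console.log(`${(rate * 100).toFixed(0)}%`); // 75%
```

Ghost hits never view a second page, which is why their bounce rate sits at or near 100%.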

 

Real-time traffic

It's not easy, but you can sometimes catch the spammers in real time. The site below typically has between 20 and 50 real visitors at once. As you can see here, 29 of the apparent visitors appear to have been referred by traffic2cash.xyz. They are known referrer spammers, and not only are these not real referrals, it's not even real traffic. (I'll explain that below.) There are probably only 26 real visitors at the time we captured this.

(I blurred the true referrer to keep my clients' identities secure.)

What is the Net Effect on My Stats?

Although these are the most prominent reports where we can see their fingerprints, the fake traffic skews the data in many of the other GA reports as well. The first site shown above is a medium-sized business, with approximately 500-1000 legitimate sessions per month. The spam roughly doubles their apparent traffic. That's huge. The second site (the real-time report) has 50,000-60,000 sessions per month. There, the spam traffic is relatively minor, so their reports stay reasonably accurate even without any attempt at filtering the spam.
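
As a rough illustration of why the same amount of spam hurts a small site far more than a large one, here is a small sketch. The `spamShare` function and the session counts are assumptions picked to fall inside the ranges mentioned above; they are not the actual figures from either site.

```javascript
// Hypothetical helper: fraction of reported sessions that are spam.
function spamShare(legitSessions, spamSessions) {
  const total = legitSessions + spamSessions;
  return total === 0 ? 0 : spamSessions / total;
}

// Assumed numbers, for illustration only.
const smallSite = spamShare(750, 750);   // spam doubles the apparent traffic
const largeSite = spamShare(55000, 750); // same spam volume, much bigger site

console.log((smallSite * 100).toFixed(1) + "%"); // 50.0%
console.log((largeSite * 100).toFixed(1) + "%"); // 1.3%
```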

How Does Google Analytics Work?

In order to understand potential solutions, it's really important to understand how GA gathers and reports data about your site. Notice that I didn't say "from your site". That's because your web server doesn't interact directly with GA. Your server never sends its traffic logs to Google.

Let's look at how your web site traffic works and how Google gets its data. Here is the basic flow of data when a person visits your web site:

  1. The user visits your site via one of several methods:
    • by entering your URL directly in the address bar,
    • by using a bookmark,
    • by clicking on a link from another site, or
    • by clicking on a link in search engine results.
  2. The browser sends a request to the web server hosting your site.
  3. The web server responds by returning your site's HTML to the browser. That HTML contains a reference to the GA Javascript code. That code contains a GA tracking ID, something like UA-12345678-1. This ID is unique to your Analytics account and website.
  4. The browser reads the Javascript and sends a request to GA's server, sending information about the traffic.
  5. Google receives the request from the web browser and records the data.

Above, I've highlighted the actors in each step. The main thing to take away from this explanation is that the user's web browser initiates the contact with Google Analytics. At no point does your web server talk to Google.

Here is the Javascript that the browser reads and uses to interact with Google:

<script>
    (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
    (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
    m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
    })(window,document,'script','//www.google-analytics.com/analytics.js','ga');

    ga('create', 'UA-12345678-1', 'auto');
    ga('send', 'pageview');

</script>

This is GA's Javascript in its simplest form. The first block is just setup, getting ready to send data to Google's server. In lines 7 and 8, you'll see the heart of it. These lines assign the GA tracking ID and then the action to log (a pageview).

I won't go into detail about how GA works, but suffice it to say that, along with the pageview data above, numerous other pieces of data might be sent to Google, including:

  • browser identification (Firefox, Chrome, etc.)
  • geographic data about the browser
  • referrer (the site that linked to this page, if the user followed a link to get here)

It's the last piece, the referrer, that spammers hope to inject into your stats.
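
To make this concrete, here is a rough sketch of the kind of data a pageview hit carries. The parameter names (`v`, `tid`, `cid`, `t`, `dh`, `dp`, `dr`) come from Google's Measurement Protocol for Universal Analytics; the `buildPageviewHit` helper and all of the values are made up for illustration, and the sketch only builds the payload without sending anything.

```javascript
// Hypothetical helper: assemble the fields of a pageview hit as a query string.
function buildPageviewHit({ trackingId, clientId, hostname, path, referrer }) {
  const params = new URLSearchParams({
    v: "1",          // protocol version
    tid: trackingId, // GA tracking ID
    cid: clientId,   // anonymous ID for this browser
    t: "pageview",   // hit type
    dh: hostname,    // hostname of the page being viewed
    dp: path,        // path of the page being viewed
  });
  if (referrer) params.set("dr", referrer); // the site that linked to this page
  return params.toString();
}

// Placeholder values, for illustration only.
const hit = buildPageviewHit({
  trackingId: "UA-12345678-1",
  clientId: "555",
  hostname: "www.example.com",
  path: "/pricing",
  referrer: "https://blog.example.org/post",
});
console.log(hit);
```

Note that the hostname and the referrer are simply fields in the hit, reported by whoever sends it.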

How Does a Referrer Spammer Send Data to Your GA Account?

In the Javascript code above, the tracking ID tells GA on which website the pageview purportedly occurred.

There are at least three ways a spammer can send data to your GA account against your will:

  1. insert the Javascript above into a page on his own web site and generate hits to that page,
  2. set up an automated bot to crawl your site, faking the referrer to show it came from his site, or
  3. set up a bot that sends data directly to Google, without involving a web site at all.

From this point on, we'll focus on #3, because it's the most effective and pervasive.

Injecting Data into Your GA Without a Website

The Javascript above instructs a web browser to send data to Google. But what is a web browser? It's just a computer program—an application—that talks to web servers and displays the results to a user as a web page. After every page loads, if you're using GA, the browser also sends data to Google. But any application can send that data to Google; there's nothing difficult about it, if you know how the requests are formed. So spammers set up automated applications (bots) that run on their own server and send out requests directly to GA. There's no web server or website involved at all. And because Google doesn't discriminate by the request sources, they're all treated as traffic on your site.

Here are the important points to remember:

  • Because of the way websites and browsers work, all Javascript is viewable to the public, so your site's tracking ID is available to anyone who wants it.
  • Spammers can create automated scripts that send out thousands of hits to Google at very little cost (the cost of bandwidth only).
  • Spammers don't even need to take the time to research tracking IDs; they can just write a script that generates IDs randomly.
  • The automated requests don't send traffic to your server, so they don't affect your site's performance.
  • The only people who can see the results of this type of spamming are those who have access to your GA reports. So in this case, the spammers' target is you, not your site or your site's visitors.

Because spammers don't even need to send traffic to your site in order to inject referrals, the spam is called "ghost referrer spam".

Why Do They Do It?

If only those with access to your GA reports can see the hits, why do spammers go to all the trouble? It seems they want you, the site administrator, to see the link and click on it. They might be attempting to promote a site, or they might get revenue for each click to another site. If you're interested in how that works, there's an infographic from Wiyre that explains referrer revenue. (You'll notice that step 3 in the infographic mentions setting up a bot to crawl your website to generate the log data. That's really method #2 above, and it takes more work than is necessary. Replace that step with our siteless injection method.)

How Can You Prevent Ghost Referrer Spamming?

Since this particular type of spamming doesn't involve your site or server, there's really nothing you can do on that end to prevent it. The final solution will probably have to come from Google. They need to change Analytics so that it can distinguish between legitimate traffic from a browser and fake traffic sent from a spam server.

In the meantime, the best solution is to create a hostname filter for each Analytics property (each website) you monitor. You'll probably want to work with an experienced developer, because GA data filters are destructive. This means that, once you create a filter, it limits the data Google saves. If you set your filter too strictly, you might lose data forever. Since there isn't necessarily a one-size-fits-all solution, I'll just stick to basic recommendations for such a filter:

  • Set up the filter on a new view; leave the original view "unfiltered", in case you ever need to refer back to that data.
  • By default, a new view will begin collecting and saving data when you create it. So the filtered view will show reports beginning on the date you set it up.
  • The filter should be a combination of patterns that includes all relevant forms of your legitimate primary domains, subdomains, and alternate domains you might employ (e.g. example.com, www.example.com, alternate-domain.com).
  • In setting up your filter, first scan your unfiltered GA hostname report (Audience > Technology > Network > Primary Dimension: Hostname) to see what, if any, third-party domains you might want to consider as legitimate traffic.
  • Review your hostname reports from time to time, comparing filtered and unfiltered to see whether you need to adjust the filter.
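
As a starting point, here is a minimal sketch of how such a pattern could be assembled and checked before you paste it into an include filter on the hostname field. The `buildHostnamePattern` and `escapeRegex` helpers are hypothetical, and the hostnames are the placeholder examples from the list above; substitute the ones from your own hostname report.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build one anchored pattern from a list of valid hostnames.
function buildHostnamePattern(hostnames) {
  return "^(" + hostnames.map(escapeRegex).join("|") + ")$";
}

const pattern = buildHostnamePattern([
  "example.com",
  "www.example.com",
  "alternate-domain.com",
]);
console.log(pattern); // ^(example\.com|www\.example\.com|alternate-domain\.com)$

// Try the pattern against a few hostnames before trusting it.
const filter = new RegExp(pattern);
console.log(filter.test("www.example.com"));       // true
console.log(filter.test("traffic2cash.xyz"));      // false
console.log(filter.test("example.com.evil.test")); // false
```

The pattern is anchored with `^` and `$` so that lookalike hostnames containing your domain as a fragment don't slip through. Test it on the unfiltered view's hostname report first, since anything it fails to match will be discarded from the filtered view.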

Filtered GA Reports

Below, you'll see a referral report from the same site as above, with the filter applied.

Lines 2 and 6 are legitimate traffic from sites that send traffic to us.

Lines 3, 4, 5, and 7 are legitimate traffic from search engines and directories.

The top referrer is still a spammer, but they haven't injected the hits directly to Google; they actually went to the trouble of creating a web page that links to my client's site, and they generated real traffic on the site. So I haven't filtered it out. In fact, since the traffic truly did happen on our site, the hostname filter wouldn't work, anyway. (Remember that the hostname filter isn't filtering the "referrer", but rather the hostname reported with each hit, that is, the site on which the pageview claims to have happened. Yes, it's complicated.)
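
The distinction is easier to see with a small sketch. The rows, the `keepByHostname` helper, and the `.example` referrers are invented for illustration; `(not set)` is how GA reports a hit that arrived without a hostname.

```javascript
// Invented rows, loosely modelled on a referral report with a hostname column.
const hits = [
  { hostname: "www.example.com", referrer: "partner-site.example" }, // real visit
  { hostname: "www.example.com", referrer: "spammy-links.example" }, // real visit via a spammer's link
  { hostname: "(not set)", referrer: "traffic2cash.xyz" },           // ghost hit
];

// Hypothetical helper: keep only hits whose hostname matches.
// The referrer is never inspected.
function keepByHostname(rows, pattern) {
  return rows.filter((row) => pattern.test(row.hostname));
}

const kept = keepByHostname(hits, /^(www\.)?example\.com$/);
console.log(kept.map((row) => row.referrer).join(", "));
// partner-site.example, spammy-links.example
```

The ghost hit is dropped because its hostname isn't ours, while the spammer who generated real visits stays in the report.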

 

Summary

I hope I've managed to explain the issue at hand without getting overly technical. The main takeaways are:

  • The problem is pervasive,
  • it can skew nearly all of your Analytics reports in some way,
  • the best we can do now is filter most of the spammers, and
  • the final solution is going to have to come from Google.