Scam HTTP_REFERER problem

My wife has a small blog for her hobby. I am, for my sins, her IT guy for it. She posts to it about once a week, maybe twice, and of course, after every post there’s a small uptick in views. All nice and normal.

A couple of months ago, I began to notice something weird happening. One post in particular was getting 50-75% of all the views per day. There was nothing too striking about the post itself, but we laughed it off, imagining that it was obviously remarkable enough that it still attracted new views (the blog engine sets a cookie when you visit a post for the first time so that your subsequent views don’t get counted again). My reasoning after a little while was that there was some heavily frequented page out there (my guess: Pinterest) that had a link to this particular post and people were intrigued enough to click the link. In droves.

And the views kept on piling up for this post. At the time of writing, this post has 4 times as many views as the next most popular post on her blog and 10 times as many as the number of views for the “About This Site” page.

Finally, I decided to write a bit of code to log the HTTP_REFERER for visits to the blog. Oh my.

[Screenshot: HTTP_REFERER scam (caption: Scam alert)]

This is a quick screenshot from SQL Server Management Studio of the site’s log for this blog post, showing a very dubious set of HTTP_REFERER values over Thanksgiving. Every one of them hits the page three times, in very quick succession, obviously discarding the cookie. This is scripting at work; it’s not like there’s some dumbass clicking on a link. I visited a couple of these sites (in a very locked-down, incognito browser) and most of them look like article and post aggregators.
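To make that pattern concrete, here is a minimal Python sketch of how such bursts could be flagged from a log of (timestamp, referrer) pairs. It is an illustration only, not the code running on the site: the function name `find_bursts`, the 10-second window, and the idea of treating three or more hits as a burst threshold are all assumptions.

```python
from collections import defaultdict
from datetime import timedelta


def find_bursts(hits, min_hits=3, window=timedelta(seconds=10)):
    """Return the set of referrers with at least min_hits hits inside window.

    hits is an iterable of (timestamp, referrer) pairs; the 10-second
    default window is an assumed value, not something measured on the blog.
    """
    by_referrer = defaultdict(list)
    for timestamp, referrer in hits:
        by_referrer[referrer].append(timestamp)

    flagged = set()
    for referrer, times in by_referrer.items():
        times.sort()
        # Check every run of min_hits consecutive hits: if the first and
        # last of the run are close enough together, it counts as a burst.
        for i in range(len(times) - min_hits + 1):
            if times[i + min_hits - 1] - times[i] <= window:
                flagged.add(referrer)
                break
    return flagged
```

A real visitor, who keeps the cookie and clicks once, would not normally show up in the result; a script that requests the page three times in a couple of seconds would.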

Yes, agreed, HTTP_REFERER is pretty much useless these days. What with Google encrypting search terms for privacy reasons, and many sites/browsers not even using it, and it being way too easy to fake (like these sites), it’s ultimately discardable and dodgy information.

I had some fun trying to work out why the hell this was happening, and the closest I could come to a valid reason is … Google Analytics and dodgy SEO tricks. Imagine you have a site and you’re using GA. Through the JavaScript on the page, it will collect these referrer URLs for you and present them in its analysis. You look at the chart, wonder why you’re getting clicks from, say, hotenergy.ru, and click through from GA to find out. Google records the click: SEO in action! hotenergy.ru gets a tiny bit of Google love. Repeat over many other sites using GA and hotenergy.ru rises in Google rank.

At least that’s my thought on it. I wonder if my readers have any other insight. For now though, I’m coding that particular page to 404 if the HTTP_REFERER is a root page on whatever domain. I did think of just changing the URL of the page itself, but that would cause link rot on legitimate sites linking to hers. I have already removed the GA JavaScript on the page. I shall be checking my other sites too.
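As a rough sketch of that root-page check (in Python for illustration, and not the blog engine’s actual code), the test could look something like this. The helper names `is_root_referrer` and `status_for` are hypothetical, and treating an empty path or a bare `/` with no query string as "a root page" is an assumption about what the rule should mean.

```python
from urllib.parse import urlsplit


def is_root_referrer(referrer):
    """True if the referrer is just a site's root page, e.g. http://hotenergy.ru/."""
    if not referrer:
        # No referrer at all (direct visit, or the header was stripped).
        return False
    parts = urlsplit(referrer)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        # Not a well-formed web URL, so this rule does not apply.
        return False
    # A root page has no path beyond "/" and no query string.
    return parts.path in ("", "/") and not parts.query


def status_for(referrer):
    """Pick the HTTP status for a request to the targeted post."""
    return 404 if is_root_referrer(referrer) else 200
```

A deep link such as a Pinterest pin URL still gets a 200, while a bare domain gets the 404. It is a blunt instrument: a legitimate visitor whose browser sends only the site origin as the referrer would be caught too.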

Now playing:
ABC - Skyscraping
(from Skyscraping)


 


4 Responses

#1 buanzo said...
26-Nov-16 4:58 AM

It is called referer spam: https://en.m.wikipedia.org/wiki/Referer_spam

#2 julian m bucknall said...
29-Nov-16 6:24 PM

@buanzo: Yep, I'd read that Wikipedia article on referer spam before writing my post here. It reads a bit out of date:

"Sites that publish their access logs" -- nope

"The [Google Analytics] technique is used to have the spammers' URLs appear in the site statistics, inducing the site owner to visit the spam URLs" -- nope. Even when I wrote this, but especially now I've turned it off.

In essence, I'm going to 404 certain referrers. Or even better, 301 them back to themselves.

Cheers, Julian

#3 julian m bucknall said...
04-Dec-16 11:30 AM

...And the hits just keep on coming. I copied the changes described above to this site. For some unknown reason, my rather dry academic article on "Nasty ABA problem in array-based lock-free stack" is getting regular hits from very dodgy porn sites (as in, the URLs themselves just sound like ultra-dodgy porn and there's no way I'm actually visiting them).

Ha!

Cheers, Julian

#4 julian m bucknall said...
05-Dec-16 4:12 PM

Further logging shows that the other blog post that gets these scammy porn referrer hits is Kindle Fire HD with the Think Outside Stowaway Bluetooth Keyboard.

Wonders will never cease.

Cheers, Julian
