Scraping – How can I stop scrapers?

by Gary on July 9, 2007 · 15 comments

in Blogging

Joost de Valk noticed, and kindly let me know, that another blog site is scraping all of Phoenixrealms’ content.


Judging by the tens of thousands of posts they have accumulated in just 3 months, they are no doubt scraping a lot of other sites too.

I have never experienced/suffered from scraping before. If anyone has any ideas how I can stop this, any advice would be greatly appreciated.



PS. Thanks Joost


Joost de Valk July 10, 2007 at 6:41 am

Nofollow? 😛 but ehm seriously: email the owner a cease and desist to start with…

Gary July 10, 2007 at 10:25 am

You would think this would be a simple task, but no.

The whois does not reveal the site owner or their contact details. I therefore contacted the hosting company with a polite cease and desist. I have yet to hear back from them and, if I am honest, can I really expect to?

Still leaves me with my problem!!

SEO Carly July 10, 2007 at 10:49 am

Unfortunately it’s difficult. I mean, you can block known scraper referrers, but they can block it, etc.

A lot scrape your RSS feeds, so nofollow all your RSS links and block them with robots.txt.

Gary July 10, 2007 at 11:04 am

Hi Carly, I have nofollowed all the RSS links.
Can you give an example of how to block using a robots.txt file?
Thanks, Gary.
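
As a rough illustration of the robots.txt approach asked about here, the sketch below shows what such a rule might look like for a feed served from your own domain, and checks it with Python's standard `urllib.robotparser`. The `/feed/` path, the `example.com` URLs, the `ExampleBot` user agent and the `is_allowed` helper are all assumptions made for the example. Keep in mind that robots.txt is only a request: well-behaved crawlers honour it, but a scraper is free to ignore it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a blog whose feed lives at /feed/ (assumed path).
ROBOTS_TXT = """\
User-agent: *
Disallow: /feed/
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if a crawler that honours robots.txt may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The feed is disallowed, while ordinary post pages stay crawlable.
feed_blocked = not is_allowed(ROBOTS_TXT, "https://example.com/feed/")
post_allowed = is_allowed(ROBOTS_TXT, "https://example.com/2007/07/some-post/")

print("feed blocked:", feed_blocked)
print("post allowed:", post_allowed)
```

The two `ROBOTS_TXT` lines are the part that would go into the robots.txt file itself; the rest only verifies that the rule behaves as intended.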

SEO Carly July 10, 2007 at 11:28 am

Hello Gary,

I just took a look and noticed you’re using Feedburner. I’m not 100% familiar with using it, but being on an external domain I think you can only nofollow them.

Actually, come to think of it, I think there’s a “noindex” feature in Feedburner under the Publicize section. This would be akin to using robots.txt.

It’s good to do regardless, due to duplicate content issues and the possibility of the feed appearing in the SERPs above the actual article page, etc.

That’s how a lot of these scrapers work: they grab the RSS feeds on autopilot, then post them to their blogs. Not many actually spider your regular pages (too much “noise” text such as menu elements, advertising, latest post links, etc.).

I’d say the guy doesn’t even check the blog unless the AdSense earnings disappear. After having another look at their site, I’d bet on the fact he’s pulling your content via RSS.

Hope this helps.

Gary July 10, 2007 at 11:38 am

I have now set Feedburner to nofollow. Thanks for pointing that out.

I will monitor my posts over the next couple of weeks and see if that stops the scraping.

Cheers for your help, Gary.

Bespeckled SEO July 10, 2007 at 6:21 pm


How were you able to detect this? Are there any symptoms that one should look for?


Joost de Valk July 10, 2007 at 6:25 pm

David: running a Google Alert on your name and blog name is usually the easiest way to find stuff like this.

Gary July 10, 2007 at 6:27 pm

Hi David, Joost was the man who spotted that my content was being scraped, and SEO Carly subsequently helped me to, hopefully, cure the problem.
I am unsure how Joost originally detected the scraping.
Hopefully Joost may be able to let you know.
In fact there he is now 🙂

Joost de Valk July 10, 2007 at 6:29 pm

In this case it was a vanity Google Alert for my own name 🙂

Gary July 10, 2007 at 6:33 pm

Ahah, I understand now.

I am vain as well, as I also use Google in this way for both my name and my company name.

Thanks for originally picking this up for me Joost.

Bespeckled SEO July 10, 2007 at 6:34 pm

Thanks, I’ll check it out. Good eyes, Joost!

Gary July 29, 2007 at 12:12 pm

Scott at SEOmoz posted this helpful article on cease and desist

Bruce Simmons (BruSimm) July 12, 2010 at 9:20 pm

Something to ponder… not all your posts may show up in your automatic Google searches for your name or site name. They replace the “by Author” section with their AdSense ads. Has anyone contacted Google about the scrapers running AdSense ads on content other than their own?

Me, I do a random search on my opening phrases in quotes and that pulls up the scrapers pretty quickly. Then you can search their site for your other content. Heck, one site even made a tag out of my site name. dUh!

Great article and comments too. Says a lot for the readership.
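
The quoted opening-phrase search described above can be sketched in a few lines of Python. This is only an illustration: the `opening_phrase_query` helper, the choice of six words and the `example.com` domain are assumptions, and the `-site:` operator is used simply to leave your own site out of the results.

```python
def opening_phrase_query(post_text: str, n_words: int = 8, own_domain: str = "") -> str:
    """Build an exact-phrase search query from the first words of a post."""
    # split() with no arguments collapses any run of whitespace.
    words = post_text.split()
    # Drop double quotes so they cannot break the exact-phrase quoting.
    phrase = " ".join(words[:n_words]).replace('"', "")
    query = f'"{phrase}"'
    if own_domain:
        # Exclude your own site so only copies hosted elsewhere show up.
        query += f" -site:{own_domain}"
    return query


opening = "I have never experienced/suffered from scraping before. If anyone has any ideas"
query = opening_phrase_query(opening, n_words=6, own_domain="example.com")
print(query)
```

Pasting the printed query into a search engine should then surface pages that reproduce the opening of the post word for word.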