Scraping - How can I stop scrapers?
Joost de Valk noticed, and kindly let me know, that another blog site (colesearchreports.com) is scraping all of Phoenixrealms’ content.
Example: colesearchreports.com/?p=18267
Judging by the tens of thousands of posts they have accumulated in just 3 months, they are no doubt scraping a lot of other sites too.
I have never experienced/suffered from scraping before. If anyone has any ideas how I can stop this, any advice would be greatly appreciated.
Thanks,
Gary.
PS. Thanks Joost
Today is May 12, 2008
Nofollow?
but ehm seriously: email the owner a cease and desist to start with…
You would think this would be a simple task, but no.
The whois record reveals neither the site owner nor their contact details, so I contacted the hosting company with a polite cease and desist. I have yet to hear back from them and, if I am honest, can I really expect to?
Still leaves me with my problem!!
Unfortunately it’s difficult. I mean, you can block known scraper referrers, but they can work around the block, etc.
A lot of them scrape your RSS feeds, so nofollow all your RSS links and block them with robots.txt.
Hi Carly, I have nofollowed all the RSS links.
Can you give an example of how to block using a robots.txt file?
Thanks, Gary.
Hello Gary,
I just took a look and noticed you’re using Feedburner. I’m not 100% familiar with it, but since the feeds are on an external domain, I think you can only nofollow them.
Actually, come to think of it, I think there’s a “noindex” feature in Feedburner under the Publicize section. This would be akin to using robots.txt.
It’s good to do regardless, due to duplicate content issues and the possibility of the feed appearing in the SERPs above the actual article page, etc.
That’s how a lot of these scrapers work: they grab the RSS feeds on autopilot, then post the content to their blogs. Not many actually spider your regular pages (too much “noise” text such as menu elements, advertising, latest post links, etc.).
I’d say the guy doesn’t even check the blog unless the AdSense earnings disappear. After having another look at their site, I’d bet he’s pulling your content via RSS.
Hope this helps.
I have now set Feedburner to nofollow. Thanks for pointing that out.
I will monitor my posts over the next couple of weeks and see if that stops the scraping.
Cheers for your help, Gary.
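On Gary’s question about blocking with robots.txt: the following is only a rough sketch, and it applies to feeds served from your own domain rather than from Feedburner. The feed paths (/feed/ and /comments/feed/) and the example.com URLs are assumptions for illustration, not Gary’s actual setup. The rules are embedded in a short Python script so they can be checked with the standard library’s robots.txt parser. Keep in mind that robots.txt is only honoured by crawlers that choose to respect it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules; the feed paths are assumed, not taken from the post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /feed/
Disallow: /comments/feed/
"""


def is_allowed(url: str, user_agent: str = "*") -> bool:
    """Return True if a crawler that respects robots.txt may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The feed is disallowed, while a regular post page is still crawlable.
    print(is_allowed("http://example.com/feed/"))
    print(is_allowed("http://example.com/2007/07/some-post/"))
```

Only the three lines inside ROBOTS_TXT would go into the robots.txt file at the root of the site; the rest of the script just tests them.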
Gary,
How were you able to detect this? Are there any symptoms that one should look for?
David
David: running a Google Alert on your name and blog name is usually the easiest way to find stuff like this.
Hi David, Joost was the man who spotted that my content was being scraped. SEO Carly subsequently helped me to, hopefully, cure the problem.
I am unsure how Joost originally detected the scraping.
Hopefully Joost may be able to let you know.
In fact, there he is now.
In this case it was a vanity Google Alert for my own name.
Ahah, I understand now.
I am vain as well, as I also use Google in this way for both my name and my company name.
Thanks for originally picking this up for me Joost.
Thanks, I’ll check it out. Good eyes, Joost!
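On David’s question about symptoms to look for: besides a Google Alert, one could compare the text of a suspect page with one of your own posts. This is not what Joost did; it is a swapped-in technique (word n-gram “shingle” overlap), sketched under the assumption that you already have both texts as plain strings. The function names and the shingle size of 5 are my own choices for illustration.

```python
import re


def shingles(text: str, size: int = 5) -> set:
    """Return the set of overlapping word n-grams (shingles) found in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def copied_fraction(original: str, suspect: str, size: int = 5) -> float:
    """Return the fraction of the original's shingles that also occur in the suspect text."""
    own = shingles(original, size)
    if not own:
        # The original is shorter than one shingle, so nothing can be compared.
        return 0.0
    return len(own & shingles(suspect, size)) / len(own)
```

A value close to 1.0 suggests the suspect page reproduces most of the post verbatim, while a value near 0.0 suggests little or no word-for-word copying.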
Scott at SEOmoz posted this helpful article on cease and desist
http://www.seomoz.org/blog/whiteboard-friday-whos-house-moz-house
[…] in July 2007 I posted “Scraping - How can I stop scrapers?” Joost had noticed that content on Phoenixrealm was being scraped and kindly let me […]