Scraping - How can I stop scrapers?

Jul 9, 2007

published by Gary in Blogging with 14 Comments

Joost de Valk noticed, and kindly let me know, that another blog site (colesearchreports.com) is scraping all of Phoenixrealm's content.

Example: colesearchreports.com/?p=18267

Judging by the tens of thousands of posts they have accumulated in just 3 months, they are no doubt scraping a lot of other sites too.

I have never suffered from scraping before. If anyone has any ideas on how I can stop this, any advice would be greatly appreciated.

Thanks,

Gary.

PS. Thanks Joost

14 responses to Scraping - How can I stop scrapers?

Comment by Joost de Valk
2007-07-10 06:41:00

Nofollow? :P but ehm seriously: email the owner a cease and desist to start with…

Comment by Gary
2007-07-10 10:25:44

You would think this would be a simple task, but no.

The whois record reveals neither the site owner nor their contact details. I therefore contacted the hosting company with a polite cease and desist. I have yet to hear back from them and, if I am honest, can I really expect to?

Still leaves me with my problem!!

 
 
Comment by SEO Carly
2007-07-10 10:49:34

Unfortunately it’s difficult. I mean, you can block known scraper referrers, but they can get around that, etc.

A lot of them scrape your RSS feeds, so nofollow all your RSS links and block the feeds with robots.txt.
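
The blocking idea in this comment could be sketched roughly as follows. This is only an illustration, not something anyone in the thread actually ran: the network range, user-agent fragment, and referrer host are placeholder values, and `should_block` is a hypothetical helper name. As the comment says, a determined scraper can change any of these, so this only stops the lazy ones.

```python
import ipaddress
from urllib.parse import urlparse

# Placeholder block lists (assumptions for illustration only).
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]  # reserved documentation range
BLOCKED_AGENT_FRAGMENTS = ["examplescraper"]                  # hypothetical bot name
BLOCKED_REFERRER_HOSTS = {"scraper.example"}                  # hypothetical scraper domain


def should_block(remote_ip: str, user_agent: str, referrer: str = "") -> bool:
    """Return True if the request matches any of the block lists above."""
    address = ipaddress.ip_address(remote_ip)
    if any(address in network for network in BLOCKED_NETWORKS):
        return True

    agent = user_agent.lower()
    if any(fragment in agent for fragment in BLOCKED_AGENT_FRAGMENTS):
        return True

    # urlparse returns hostname=None for an empty or malformed referrer.
    host = urlparse(referrer).hostname
    return host in BLOCKED_REFERRER_HOSTS
```

A function like this would be called from whatever serves the pages or the feed, returning a 403 response when it yields True.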

Comment by Gary
2007-07-10 11:04:41

Hi Carly, I have nofollowed all the RSS links.
Can you give an example of how to block using a robots.txt file?
Thanks, Gary.
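
For readers with the same question, here is a minimal sketch of what such a robots.txt rule might look like, checked with Python's standard `urllib.robotparser`. It assumes a self-hosted feed under `/feed/` (an assumed path, not taken from the post), and `is_allowed` and `ExampleBot` are hypothetical names. Keep in mind that robots.txt is only honoured by well-behaved crawlers; a scraper is free to ignore it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking all crawlers to stay out of the feed directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /feed/
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = "ExampleBot") -> bool:
    """Report whether a crawler that obeys robots.txt may fetch the given URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these rules, a compliant crawler would skip `https://example.com/feed/` but could still fetch the regular article pages.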

 
 
Comment by SEO Carly
2007-07-10 11:28:40

Hello Gary,

I just took a look and noticed you’re using Feedburner. I’m not 100% familiar with it, but since the feeds are on an external domain, I think you can only nofollow them.

Actually, come to think of it, I think there’s a “noindex” feature in Feedburner under the Publicize section. This would be akin to using robots.txt.

It’s good to do regardless, because of duplicate content issues and the possibility of the feed appearing in the SERPs above the actual article page.

That’s how a lot of these scrapers work: they grab the RSS feeds on autopilot, then post the content to their blogs. Not many actually spider your regular pages (too much “noise” text such as menu elements, advertising, latest post links, etc.).

I’d say the guy doesn’t even check the blog unless the AdSense earnings disappear. After having another look at their site, I’d bet he’s pulling your content via RSS.

Hope this helps.

Comment by Gary
2007-07-10 11:38:37

I have now set feedburner to nofollow. Thanks for pointing that out.

I will monitor my posts over the next couple of weeks and see if that stops the scraping.

Cheers for your help, Gary.

 
 
Comment by Bespeckled SEO
2007-07-10 18:21:56

Gary,

How were you able to detect this? Are there any symptoms that one should look for?

David

Comment by Joost de Valk
2007-07-10 18:25:56

David: running a Google Alert on your name and blog name is usually the easiest way to find stuff like this.
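
Once an alert turns up a suspicious page, one rough way to confirm that it really reproduces your post is to compare runs of consecutive words. The sketch below is an illustration under stated assumptions, not how Joost detected the scraping: `shingles` and `copied_fraction` are hypothetical helper names, and the 8-word window is an arbitrary choice.

```python
import re


def shingles(text: str, size: int = 8) -> set:
    """Return the set of all runs of `size` consecutive words in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def copied_fraction(original: str, suspect: str, size: int = 8) -> float:
    """Fraction of the original's word runs that also appear in the suspect text."""
    source = shingles(original, size)
    if not source:
        return 0.0  # original is shorter than one window, nothing to compare
    return len(source & shingles(suspect, size)) / len(source)
```

A value close to 1.0 suggests the suspect page contains the post more or less verbatim, while unrelated pages should score at or near 0.0.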

 
Comment by Gary
2007-07-10 18:27:27

Hi David, Joost was the man who spotted that my content was being scraped, and SEO Carly subsequently helped me to (hopefully) cure the problem.
I am unsure how Joost originally detected the scraping.
Hopefully Joost will be able to let you know.
In fact, there he is now :-)

Comment by Joost de Valk
2007-07-10 18:29:46

In this case it was a vanity Google Alert for my own name :)

Comment by Gary
2007-07-10 18:33:33

Ahah, I understand now.

I am vain as well: I also use Google in this way for both my name and my company name.

Thanks for originally picking this up for me Joost.

 
 
 
 
Comment by Bespeckled SEO
2007-07-10 18:34:25

Thanks, I’ll check it out. Good eyes, Joost!

 
Comment by Gary
2007-07-29 12:12:38

Scott at SEOmoz posted this helpful article on cease and desist letters:

http://www.seomoz.org/blog/whiteboard-friday-whos-house-moz-house

 
2008-01-11 15:30:13

[…] in July 2007 I posted “Scraping - How can I stop scrapers?” Joost had noticed that content on Phoenixrealm was being scraped and kindly let me […]

 
