Scraping - How can I stop scrapers?

Jul 9, 2007

published by Gary in Blogging with 14 Comments

Joost de Valk noticed, and kindly let me know, that another blog site (colesearchreports.com) is scraping all of Phoenixrealms' content.

Example: colesearchreports.com/?p=18267

Judging by the tens of thousands of posts they have accumulated in just 3 months, no doubt they are scraping a lot of other sites too.

I have never experienced/suffered from scraping before. If anyone has any ideas how I can stop this, any advice would be greatly appreciated.

Thanks,

Gary.

PS. Thanks Joost


14 responses to Scraping - How can I stop scrapers?

Comment by Joost de Valk
2007-07-10 06:41:00

Nofollow? :P but ehm seriously: email the owner a cease and desist to start with…

Comment by Gary
2007-07-10 10:25:44

You would think this would be a simple task, but no.

The whois record reveals neither the site owner nor their contact details. I therefore contacted the hosting company with a polite cease and desist. I have yet to hear back from them and, if I am honest, can I really expect to?

Still leaves me with my problem!!

 
 
Comment by SEO Carly
2007-07-10 10:49:34

Unfortunately it’s difficult. I mean, you can block known scraper referrers, but they can get around the block, etc.

A lot of them scrape your RSS feeds, so nofollow all your RSS links and block them with robots.txt.

Comment by Gary
2007-07-10 11:04:41

Hi Carly, I have nofollowed all the RSS links.
Can you give an example of how to block using a robots.txt file?
Thanks, Gary.

 
 
Comment by SEO Carly
2007-07-10 11:28:40

Hello Gary,

I just took a look and noticed you’re using Feedburner. I’m not 100% familiar with it, but since the feed is on an external domain, I think you can only nofollow the links to it.

Actually, come to think of it, I think there’s a “noindex” feature in Feedburner under the Publicize section. This would be akin to using robots.txt.

It’s good to do regardless, due to duplicate content issues and the possibility of the feed appearing in the SERPs above the actual article page, etc.

That’s how a lot of these scrapers work: they grab the RSS feeds on auto-pilot, then post the content to their blogs. Not many actually spider your regular pages (too much “noise” text such as menu elements, advertising, latest post links, etc.).

I’d say the guy doesn’t even check the blog unless the AdSense earnings disappear. After having another look at their site, I’d bet he’s pulling your content via RSS.

Hope this helps.
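
For a feed hosted on your own domain, the robots.txt idea mentioned above can be sketched as follows. This is only a minimal sketch, assuming the feed lives under /feed/ (an assumption; it does not apply to a feed hosted by Feedburner). The names example.com and ExampleBot are placeholders. It uses Python's standard urllib.robotparser module to check which URLs the rule would block. Keep in mind that robots.txt is advisory: well-behaved crawlers honor it, but a scraper is free to ignore it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. The /feed/ path is an assumption
# about where a self-hosted blog feed lives.
ROBOTS_TXT = """\
User-agent: *
Disallow: /feed/
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if a crawler that obeys robots.txt may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The feed is disallowed, ordinary posts are still allowed.
    print(is_allowed(ROBOTS_TXT, "http://example.com/feed/"))
    print(is_allowed(ROBOTS_TXT, "http://example.com/2007/07/some-post/"))
```

The two lines of ROBOTS_TXT are what would go into the robots.txt file at the root of the site; the rest only verifies how a compliant crawler would interpret them.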

Comment by Gary
2007-07-10 11:38:37

I have now set feedburner to nofollow. Thanks for pointing that out.

I will monitor my posts over the next couple of weeks and see if that stops the scraping.

Cheers for your help, Gary.

 
 
Comment by Bespeckled SEO
2007-07-10 18:21:56

Gary,

How were you able to detect this? Are there any symptoms that one should look for?

David

Comment by Joost de Valk
2007-07-10 18:25:56

David: running a Google Alert on your name and blog name is usually the easiest way to find stuff like this.

 
Comment by Gary
2007-07-10 18:27:27

Hi David, Joost was the man that spotted my content was being scraped. And, SEO Carly subsequently helped me to hopefully cure the problem.
I am unsure how Joost originally detected the scraping.
Hopefully Joost may be able to let you know.
In fact there he is now :-)

Comment by Joost de Valk
2007-07-10 18:29:46

In this case it was a vanity Google Alert for my own name :)

Comment by Gary
2007-07-10 18:33:33

Ahah, I understand now.

I am vain as well, as I also use Google in this way for both my name and my company name.

Thanks for originally picking this up for me Joost.

 
 
 
 
Comment by Bespeckled SEO
2007-07-10 18:34:25

Thanks, I’ll check it out. Good eyes, Joost!

 
Comment by Gary
2007-07-29 12:12:38

Scott at SEOmoz posted this helpful article on cease and desist

http://www.seomoz.org/blog/whiteboard-friday-whos-house-moz-house

 
2008-01-11 15:30:13

[…] in July 2007 I posted “Scraping - How can I stop scrapers?” Joost had noticed that content on Phoenixrealm was being scraped and kindly let me […]

 
