Scraping – How can I stop scrapers?

by Gary on July 9, 2007 · 15 comments

in Blogging

Joost de Valk noticed, and kindly let me know, that another blog site (colesearchreports.com) is scraping all of Phoenixrealms’ content.

Example: colesearchreports.com/?p=18267

Judging by the tens of thousands of posts they have accumulated in just 3 months, they are no doubt scraping a lot of other sites too.

I have never experienced/suffered from scraping before. If anyone has any ideas how I can stop this, any advice would be greatly appreciated.

Thanks,

Gary.

PS. Thanks Joost

  • Joost de Valk

    Nofollow? :P but ehm seriously: email the owner a cease and desist to start with…

    • Gary

      You would think this would be a simple task, but no.

      The whois does not reveal the site owner or their contact details. I therefore contacted the hosting company with a polite cease and desist. I have yet to hear back from them and, if I am honest, can I really expect to?

      Still leaves me with my problem!!

  • SEO Carly

    Unfortunately it’s difficult. I mean, you can block known scraper referers, but they can work around the block, etc.

    A lot of them scrape your RSS feeds, so nofollow all your RSS links and block them with robots.txt.

    • Gary

      Hi Carly, I have nofollowed all the RSS links.
      Can you give an example of how to block using a robots.txt file?
      Thanks, Gary.

  • SEO Carly

    Hello Gary,

    I just took a look and noticed you’re using Feedburner. I’m not 100% familiar with it, but as it is on an external domain I think you can only nofollow the links.

    Actually, come to think of it, I think there’s a “noindex” feature in Feedburner under the Publicize section, which would be akin to using robots.txt.

    It’s good to do regardless, because of duplicate content issues and the possibility of the feed appearing in the SERPs above the actual article page, etc.

    That’s how a lot of these scrapers work: they grab the RSS feeds on auto-pilot, then post them to their blogs. Not many actually spider your regular pages (too much “noise” text such as menu elements, advertising, latest post links, etc.).

    I’d say the guy doesn’t even check the blog unless the AdSense earnings disappear. After having another look at their site, I’d bet he’s pulling your content via RSS.

    Hope this helps.

    • Gary

      I have now set Feedburner to nofollow. Thanks for pointing that out.

      I will monitor my posts over the next couple of weeks and see if that stops the scraping.

      Cheers for your help, Gary.
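For readers with the same question about robots.txt: the sketch below assumes a self-hosted feed (not Feedburner, which sits on an external domain as noted above). The `/feed/` paths and the `example.com` URLs are hypothetical, and robots.txt only affects crawlers that choose to honour it, so a determined scraper can simply ignore it. The Python part uses the standard library's `urllib.robotparser` to check what the rules would allow.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a self-hosted blog.
# The feed paths are assumptions; adjust them to wherever your feeds live.
ROBOTS_TXT = """\
User-agent: *
Disallow: /feed/
Disallow: /comments/feed/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a robots.txt-respecting crawler may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The feed is blocked, while a regular post stays crawlable.
    print(is_allowed(ROBOTS_TXT, "AnyBot", "https://example.com/feed/"))
    print(is_allowed(ROBOTS_TXT, "AnyBot", "https://example.com/2007/07/some-post/"))
```

Checking the rules this way before publishing them helps avoid accidentally blocking the article pages themselves.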

  • Bespeckled SEO

    Gary,

    How were you able to detect this? Are there any symptoms that one should look for?

    David

    • Joost de Valk

      David: running a Google Alert on your name and blog name is usually the easiest way to find stuff like this.

    • Gary

      Hi David, Joost was the one who spotted that my content was being scraped, and SEO Carly subsequently helped me to, hopefully, cure the problem.
      I am unsure how Joost originally detected the scraping.
      Hopefully Joost may be able to let you know.
      In fact there he is now :-)

      • Joost de Valk

        In this case it was a vanity Google Alert for my own name :)

        • Gary

          Ahah, I understand now.

          I am vain as well, as I also use Google in this way for both my name and my company name.

          Thanks for originally picking this up for me Joost.

  • Bespeckled SEO

    Thanks, I’ll check it out. Good eyes, Joost!

  • Gary

    Scott at SEOmoz posted this helpful article on cease and desist letters:

    http://www.seomoz.org/blog/whiteboard-friday-whos-house-moz-house

  • Pingback: Beating the scrapers » SEO Blog

  • Bruce Simmons (BruSimm)

    Something to ponder… not all your posts may show up in your automatic Google searches for your name or site name. They replace the “by Author” section with their AdSense ads. Has anyone contacted Google about the scrapers running AdSense ads on content other than their own?

    Me, I do a random search on my opening phrases in quotes and that pulls up the scrapers pretty quickly. Then you can search their site for your other content. Heck, one site even made a tag out of my site name. dUh!

    Great article and comments too. Says a lot for the readership.
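The quoted-phrase search described in this comment is easy to prepare in bulk. The following is only a rough sketch: the function names and the eight-word default are assumptions made for illustration, and it builds the query strings to paste into a search engine rather than running any search itself. The second helper uses the `site:` operator to look for more of your content on a suspect domain.

```python
def opening_phrase_query(post_text: str, max_words: int = 8) -> str:
    """Build an exact-match query from the first few words of a post."""
    words = post_text.split()[:max_words]
    # Remove straight double quotes so they cannot break the quoted phrase.
    phrase = " ".join(words).replace('"', "")
    return f'"{phrase}"'


def site_restricted_query(phrase_query: str, domain: str) -> str:
    """Limit a phrase query to one suspected scraper domain."""
    return f"{phrase_query} site:{domain}"


if __name__ == "__main__":
    opening = "Joost de Valk noticed, and kindly let me know, that another blog"
    query = opening_phrase_query(opening, max_words=5)
    print(query)
    # example.com stands in for the suspected scraper's domain.
    print(site_restricted_query(query, "example.com"))
```

Searching on opening phrases rather than the author name still works when the scraper has stripped the byline, which is the gap this comment points out.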