Scrapers and the Duplicate Issue

by Gary on August 7, 2008 · 11 comments

in SEO

We’ve all been warned again and again about the problems that duplicate content can bring. However, the truth is that nowadays it is not really such a big deal anymore. This doesn’t mean that you should start plagiarising other people’s stuff and simply offer duplicate content on your site. What it does mean is that you shouldn’t stress out whenever scrapers decide to “syndicate” your content for you. The reason I say this is that Google said so!

According to Sven Naumann, “you shouldn’t be very concerned about seeing negative effects on your site’s presence on Google if you notice someone scraping your content” because Google is pretty good at determining which site is the original publisher of the content. Your site will NOT be penalised for the scraper’s work, and duplicate content WILL NOT lead to lower rankings for your website. The worst that can happen is that the section containing the duplicate content gets filtered out (meaning it would affect your rank neither negatively nor positively). If you are recognised as the original publisher of the content, though, even that wouldn’t happen. To make sure you get credit for the original work, only syndicate to websites that will not only give you a byline but also link to your site. As for the scrapers, don’t even think about them; again, Google says it’s good at figuring out which is which.


John Sylvester August 20, 2008 at 5:37 am

I can confirm this, as one of my clients recently had an issue with a rival firm publishing duplicate content, and, conversely, a real estate project that was duplicated on their own site.

In the first instance, Google took an age to act on the DMCA notice, but it eventually did for four of the stolen pages. Yahoo did nothing and refused to enter into dialogue.

The websites that duplicated this content are six keyworded domains with exactly the same content, all linked together. My client complained about this as well, but Google took no action.

The project, on the other hand, admittedly had only the home page copy altered (and this duplication was sanctioned by their client), but Google de-indexed the page anyway.

With the rise of RSS and syndication in general, duplicate content abounds. That said, we are now developing a free spinning tool to get round this for a lot of the press releases and articles we write.

Lagerhall maskinhall August 21, 2008 at 5:07 am

Happy that Google has solved that issue. It can be a pain in the a** when someone copies your site.

Stu August 28, 2008 at 11:16 am

Actually, in my experience, Google has no idea who published the content first.

The only way it can judge is by which page it indexes first.

Time and time again I’ve seen instances where the page with the most/best backlinks wins, regardless of who actually published the content first.

Goran September 2, 2008 at 2:50 pm

I’ll get myself this book and it will be my SEO bible.

Sindy September 6, 2008 at 11:37 am

It is awful when someone duplicates your content. I’ve heard there is a site that checks whether content is plagiarised or not, but I forgot its name (: Does anybody know it? I used it some months ago but unfortunately didn’t copy its link.

Gary September 6, 2008 at 2:54 pm

Vaibhav October 2, 2008 at 8:18 pm

We recently carried out some experiments and came to the conclusion that around 35% uniqueness in an article is good enough for Google to consider it unique.

Goran October 9, 2008 at 9:03 am

I don’t see how Google can track which is the original content. If I have a small site that I don’t update regularly and it only gets indexed every three weeks, and someone whose website gets updated daily publishes a copy of mine, the search engine will not know that mine came first.

It’s the same as me writing a paper and not handing it in, and then someone copies it and I hand mine in later; obviously mine is not seen as the first one. Granted, someone could look at my work, but on the web we could have ten different writers who work for us and for other clients.

I don’t get how they do it, and as far as I am concerned I will protect my content as best as possible.

Would love to hear what you think of my thoughts.

Gary October 10, 2008 at 8:39 am

Hi Goran, like you I protect my content as best as possible. This site, for instance, is protected by Copyscape, and I regularly issue DMCAs.

Most scrapers seem to have very low-quality sites with little authority, often without contact details, and even when they do have them they rarely comply with a DMCA.

Sites that duplicate my content without credit but do have authority usually acknowledge and respond to a DMCA.

maleos December 4, 2008 at 12:41 pm

Can you describe how exactly Google does this? Is it by date or something similar, maybe?

Gideon van Oudtshoorn March 26, 2010 at 11:26 am

Duplicate content is a myth. How is it that the same article can be posted on 100 article sites and none of them is compromised by its use?