There was recently a frantic post on Google Groups by a gentleman who was sure that Google had de-indexed his entire website because another domain had a cached copy of it indexed. After seeing what had happened, he researched the matter himself, concluded that a proxy cache had hijacked his site, and decided he needed to act to prevent further problems. His response was to block all robots from his site with noindex and nofollow meta tags, which only made matters worse. As a result, all 4,000+ pages of his site were removed from every search engine’s index, and his business was destroyed.
Of course, this example is a bit extreme, but would your response have been any better? It’s time we educated ourselves about the mystery behind the dreaded duplicate content issue and learned how to deal with it properly.
By the most basic definition, duplicate content is an exact copy of a webpage, or of the content on a page, that is listed under a different URL. In other words, the pages look exactly the same, but the URL in the address bar is different. The duplication can be internal (within your site) or external (on another website). Today we are going to stick with external duplicate content, since that is what the example above describes.
Before we begin, though, we should look at why we care what other people do online with our content. What caused this whole duplicate content beast to appear in the first place? The real source of the fear was Google’s supplemental index (which is now gone). Google wanted a way to limit the number of results from a single site for a single keyword. For example, if you had a page about green tea on your site, plus ten copies of that page under different categories on the same site, Google had to pick one of them so that your site did not take up multiple spots in the rankings. The duplicates were placed in the supplemental index to show site owners that Google knew the pages existed but didn’t want to show them in the search results, because the page itself, or something very similar, was already there.
Many site owners ran into trouble because they did not have enough unique pages. Simply replacing “green tea” with “white tea” did not make a page unique enough to be listed separately. To count as unique, pages needed to be clearly different, with substantially different text, but few people knew that. And so the dreaded “missing page” duplicate content problem began. The beast had been born.
So how does external duplicate content actually affect your site? The truth is that it doesn’t affect it at all. The stories we hear about cached copies of pages replacing the real site all involve underlying, unrelated problems that we never hear about. If, for example, you were caught taking part in a link farm and de-indexed, it’s only natural that a copy of your site would take its place. It’s still your site and your content; it’s just listed somewhere else on the internet that isn’t in trouble with the search engine.
If we take the time to think this issue of external duplicate content through before we panic and make matters worse, we can see just how unfounded the fear really is. Could destroying your competitors really be as simple as making a copy of their site? Heck, multiple copies of a website could be made for just a few dollars. The internet would descend into anarchy as sites competed to see who could copy the others the most. Major sites like WhiteHouse.gov could be knocked out of Google by any middle school kid with internet access and fifty bucks. Do we really believe that is how the internet works?
In the end, we should actually consider these external duplicate sites and caches a good thing. If by some chance a user finds a cached version of your site in the farthest reaches of the internet, it will still carry your content and your contact information. That copy could reach a user who, for one reason or another, would never have found your real site. Right now your articles and products could be viewed by people you never even thought of targeting. That is good for your business and your website. Some of these random cached pages might even count as backlinks. Admittedly this is a far-fetched notion, but it is possible.
I hope this has cleared the air around external duplicate content and that you will feel more at ease the next time you see a copy of your page somewhere online. It won’t hurt you or your SEO efforts; all it can do is help spread your content. Remember, copying is the sincerest form of flattery.