There was recently a frantic post on Google Groups by a gentleman who was sure that his entire website had been de-indexed by Google because another domain had a cached version of it indexed. After he saw what had happened, he researched the matter himself and assumed that he had been hijacked by this proxy cache and that he needed to take action to block any further problems. His response was to block all robots from his site with nofollow and noindex meta tags, which only made matters worse. His actions caused his entire site of 4000+ pages to be removed from all search engine indexes and destroyed his business.
Of course this example is a bit extreme, but would your response have been any better? It’s time we educated ourselves about the mystery behind the dreaded duplicate content matter and learned how to really deal with it.
By basic definition, duplicate content refers to an exact copy of a webpage, or of content on a page, that is listed under a different URL. In other words, the pages look exactly the same but the URL in the address bar is different. The copy could be internal (within your site) or external (on another website). For today, we are going to stick with external duplicate content since this is what is described in the example.
Before we begin, we should look at why we are concerned about what other people do online with our content. What caused this whole duplicate content beast to appear anyway? The true cause of the fear of duplicate content was Google’s supplemental index (which is now gone). The problem was that Google wanted to find a way to limit the number of results from a single site for a single keyword. For example, if you had a page about green tea on your site and you also had ten copies of the page under different categories on the same site, Google had to pick one of them so your single site did not take up multiple spots in the rankings. The duplicate pages were placed in the supplemental index to show the owners that Google knew the pages were there, but didn’t want to put them in the search results because either the page itself or something very similar was already there.
Many site owners had problems with this because they did not have enough unique pages. Simply replacing green tea with white tea did not make a page unique enough to be listed as a different page. Pages needed clearly different text to count as unique, but no one knew that. And so the dreaded issue of pages going missing over duplicate content began. The beast had been born.
So how does external duplicate content actually affect your site? The truth of the matter is that it doesn’t affect it at all. The stories we hear of cached versions of pages replacing the real site all have underlying, unrelated problems that we never hear of. If, for example, you were caught and de-indexed for taking part in a link farm, it’s only natural that a copy of your site takes its place. It’s still your site and still your content; it’s just listed somewhere else on the internet that’s not in trouble with the search engine.
If we really take the time to think about this whole issue of external duplicate content before we panic and make matters worse, we can see just how unfounded it really is. Could it really be so simple to destroy your competitors that all you needed to do was make a copy of their site? Heck, even multiple copies of a website could be made for just a few dollars. The internet would be in total anarchy as site after site competed over who could copy the others the most. Major sites like WhiteHouse.gov could be removed from Google because of the actions of the average middle school child with internet access and fifty bucks. Do we really want to think this is how the internet works?
In the end, we should actually consider these duplicate external websites and caches to be a good thing. If by some off-chance a user finds a cached version of your site in the farthest reaches of the internet, it will still have your content on it and your contact information. This copy could reach a user who, for one reason or another, would never in a million years have found your real site. Right now your articles and products could be being viewed by people you never even thought of targeting. This is a good thing for your business and your website. Some of these random cached pages might even be counted as backlinks. That may sound far-fetched, but it is quite possible.
I hope this has cleared the air around the notion of external duplicate content and that you may feel more at ease when you see a copy of your page somewhere online. It won’t hurt you or your SEO practices; all it can do is help spread your content. Remember, copying is considered the most sincere form of flattery.
Many web site owners would really love to have a mobile version of their web site available. Not only is it cool and convenient to browse for the latest information while on the go, it is also becoming very popular. Soon most web sites will have a mobile version. If you can’t offer this service you may lose visitors in the future. Now is a good time to start learning about the technologies involved and the issues you may run into while deploying a mobile web site.
Most mobile-enabled sites are using syndicated versions of their standard site. This could create some duplicate content issues, which should be avoided. I’m in the process of learning new ways to fix this issue right now. One way is to make sure that your site is being crawled by the engines correctly. The standard SERPs should only list your standard content and the mobile SERPs should only list your mobile content.
You have probably noticed that the big players in search are all starting to offer mobile versions of their search engines, which are designed to index mobile content. This means that there will probably be a mobile SERP for each engine.
With a bit of research on Google’s site, I found a list of their user-agents which can be used in your site’s robots.txt file. If I had my mobile content serving out of www.mysite.com/mobile and wanted the site to get indexed correctly, I would add one rule for Google’s standard crawler and two rules for its mobile crawler to my robots.txt.
The first rule would tell the standard Google SERP to not index the content in “/mobile” or any of its sub-directories. Then the next two rules would tell the mobile Google SERP to index everything in the “/mobile” directory and ignore everything else.
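Here is a minimal sketch of such rules, wrapped in a few lines of Python so they can be checked with the standard library's urllib.robotparser. The agent names Googlebot and Googlebot-Mobile and the exact Allow/Disallow lines are my assumptions based on the description above, so check Google's current user-agent list before copying them. Keep in mind that robots.txt controls what a crawler may fetch, which is only an indirect way of steering what ends up indexed. The mobile group is listed first here only because Python's parser applies the first group whose name appears in the agent string; as far as I know, Google's crawlers pick the most specific matching group regardless of order.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a site whose mobile pages live under /mobile/.
# Agent names and rule lines are assumptions, not an official listing.
ROBOTS_TXT = """\
User-agent: Googlebot-Mobile
Allow: /mobile/
Disallow: /

User-agent: Googlebot
Disallow: /mobile/
"""


def build_parser(text):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Show which crawler may fetch which kind of page under these rules.
for agent in ("Googlebot", "Googlebot-Mobile"):
    for path in ("/index.html", "/mobile/index.html"):
        print(agent, path, parser.can_fetch(agent, path))
```

If the rules do what I intend, the standard crawler is allowed on /index.html but not on /mobile/index.html, and the mobile crawler gets the reverse.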
This is just one way to avoid the issue. There is also the .mobi domain, which is to be used only for mobile content. I will discuss this more on another day. For further reading on mobile site development and the DotMobi domain, I would recommend this awesome guide which is in PDF format.
Duplicate content is a hot topic and has been for quite a while. It is also one of the most misunderstood issues in search engine optimization. Many webmasters and even some search marketers spend an extraordinary amount of time and resources trying to avoid the dreaded “Duplicate Content Penalty”, when in fact a penalty derived from duplicate content is fairly rare and reserved specifically for sites which have been observed trying to manipulate search engine rankings directly; i.e. search engine spammers.
The more common issue associated with duplicate content found by search engines is the “Duplicate Content Filter”. When a search engine finds two or more pages with identical or even nearly identical content, it applies a filter which allows only one instance of the content to be returned in search results. This is in no way a penalty and does not affect the site as a whole, just the specific page as it relates to the specific search query. The goal of the search engines is to provide their users with “unique” content and this filter helps to ensure each page returned in the search results is unique.
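To make the filter idea concrete, here is a minimal Python sketch. It is only an illustration and not how Google or any other engine actually detects duplicates: it compares pages by their overlapping three-word sequences (often called shingles) and keeps the first page of each near-identical group. The function names, the 0.6 threshold, and the sample pages are all hypothetical.

```python
def shingles(text, k=3):
    """Return the set of k-word sequences that appear in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def jaccard(a, b):
    """Share of shingles two pages have in common (0.0 to 1.0)."""
    return len(a & b) / len(a | b)


def filter_near_duplicates(pages, threshold=0.6):
    """Keep only the first page of each group of near-identical pages.

    pages is a list of (url, text) pairs; the threshold is an arbitrary
    illustrative cut-off, not a value any search engine publishes.
    """
    kept = []
    for url, text in pages:
        signature = shingles(text)
        # Keep the page only if it is unlike every page kept so far.
        if all(jaccard(signature, shingles(other)) < threshold for _, other in kept):
            kept.append((url, text))
    return [url for url, _ in kept]


# Hypothetical sample pages: the second swaps a single word of the first.
pages = [
    ("/green-tea", "green tea is rich in antioxidants and has a mild grassy flavor"),
    ("/white-tea", "white tea is rich in antioxidants and has a mild grassy flavor"),
    ("/shipping", "our shipping policy covers returns within thirty days of purchase"),
]

print(filter_near_duplicates(pages))  # ['/green-tea', '/shipping']
```

Note that nothing is punished here. The near-copy is simply left out of the returned list, which is the difference between a filter and a penalty.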
In the past couple of weeks Google has published an article with some very specific information on how it sees and handles duplicate content, as well as some bullet points on issues to watch for concerning duplicate content. Additionally, another new US Patent relating to identifying and handling duplicate content has been granted to Google.