Duplicate content can leave search engines unsure which page to return to a user. Proper canonicalization makes their job easier by pointing them clearly at the right pages. By applying canonical tags to your web pages correctly, you can keep a robust, feature-rich site while still following search engine optimization best practices.
Last week, in a rare unified move, all three major search engines announced support for a new “canonical URL tag” designed to help them understand a website where multiple URLs display the same content. Basically, all a site owner needs to do is add this tag to the head section of every version of a duplicated page. So, for example, this tag:

<link rel="canonical" href="http://www.example.com/index.aspx" />

would be added to the head section of all the versions of the same page shown below:
http://www.example.com/index.aspx
http://www.example.com/index.aspx?sortby=alpha
http://www.example.com/index.aspx?sid=1234567890
http://www.example.com/index.aspx?ref=joesbookstore
Adding the canonical tag to all of these potential versions of the page tells search engines that the URLs are essentially the same page and should be treated as such. This lets them easily determine which URL should be listed, while ensuring that the linking value of all these pages is preserved and consolidated under one URL.
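For instance, when http://www.example.com/index.aspx?sortby=alpha is requested, the page served at that URL would still carry the same tag in its head, pointing back to the clean URL (the title below is just a placeholder):

<head>
  <title>Example Page</title>
  <link rel="canonical" href="http://www.example.com/index.aspx" />
</head>

No matter which of the four URLs a crawler lands on, the canonical tag always points to http://www.example.com/index.aspx.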
The introduction of this new tag gives site owners an alternative way to address duplicate content issues created by the way their site is designed. Until now, the only solution that worked for all three search engines was to restrict robots’ access to duplicate pages using instructions in the robots.txt file, robots meta tags or both. Website owners who have been using either method and who decide to switch to the canonical URL tag will need to remove the instructions that block access to the duplicated pages, whether in their robots.txt files or in their robots meta tags, so that search engines can actually reach those pages and find the new canonical URL tags.
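For illustration only (these exact rules are hypothetical, written to match the example URLs above), the kinds of instructions that would need to come out before the canonical tag can do its job are robots.txt rules such as:

User-agent: *
Disallow: /*?sortby=
Disallow: /*?sid=
Disallow: /*?ref=

or a robots meta tag in the head of the duplicated pages such as:

<meta name="robots" content="noindex, follow" />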
Unfortunately, for some websites, robots meta tags and the robots.txt file may remain the only viable solution to duplicate content, because while this new tag addresses which page should be indexed, it does not resolve the crawling problem associated with duplicate URLs. Search engine robots do not discover that these pages are all the same until after the pages have been crawled, so they may still waste valuable crawl time fetching the same content and potentially delay the indexing of unique content. Furthermore, all three search engines have indicated that they will treat the canonical URL tag as a “suggestion” and will continue to use other signals to determine which URL should be displayed in duplicate content situations. This is why the best course of action is not to give search engines duplicate URLs in the first place; robots.txt, robots meta tags or the canonical URL tag should only be used when there is no way to program the site to be search engine friendly.
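As a rough sketch of that last point, and not anything prescribed by the search engines’ announcements, a site could normalize incoming URLs on the server side and answer requests for tracking-parameter variants with a 301 redirect to the clean version, so duplicate URLs never get crawled at all. The example below is Python (the example site happens to be ASP.NET, so treat this as framework-neutral illustration); the parameter names sid and ref come from the example URLs above, and the choice to strip only those two is an assumption, since sortby actually changes what the page shows.

import urllib.parse

# Parameters assumed (for this sketch) to be tracking-only and safe to drop.
TRACKING_PARAMS = {"sid", "ref"}

def canonical_url(url):
    """Return the canonical form of a URL by stripping tracking parameters."""
    parts = urllib.parse.urlsplit(url)
    query = urllib.parse.parse_qsl(parts.query, keep_blank_values=True)
    cleaned = [(k, v) for k, v in query if k not in TRACKING_PARAMS]
    new_query = urllib.parse.urlencode(cleaned)
    return urllib.parse.urlunsplit(
        (parts.scheme, parts.netloc, parts.path, new_query, parts.fragment)
    )

# If the requested URL differs from its canonical form, the application would
# respond with a 301 redirect to canonical_url(requested_url) instead of
# serving the duplicate page.
print(canonical_url("http://www.example.com/index.aspx?sid=1234567890"))
# -> http://www.example.com/index.aspx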
More details about this new tag can be found here:
http://ysearchblog.com/2009/02/12/fighting-duplication-adding-more-arrows-to-your-quiver/
http://blogs.msdn.com/webmaster/archive/2009/02/12/partnering-to-help-solve-duplicate-content-issues.aspx
http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html