The Canonical URL Tag: A New Way to Resolve the Duplicate Content Issue

Marjory Meechan - February 20, 2009

Last week, in a rare unified move, all three major search engines announced support for a new “canonical URL tag” designed to help them understand websites where multiple URLs display the same content. Basically, all a site owner needs to do is add this tag to the head section of every version of a duplicated page. So, for example, this tag:

<link rel="canonical" href="http://www.example.com/index.aspx" />

would be added to the head section of all the versions of the same page shown below:
http://www.example.com/index.aspx
http://www.example.com/index.aspx?sortby=alpha
http://www.example.com/index.aspx?sid=1234567890
http://www.example.com/index.aspx?ref=joesbookstore

Adding the canonical tag to all of these potential versions of the page tells search engines that the URLs are essentially the same page and should be treated as such. This allows the engines to easily determine which URL should be listed and, at the same time, ensures that the linking value of all the versions is preserved and combined under one URL.
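For illustration, here is what the head section of one of the duplicate versions might look like with the tag in place (the page title and surrounding markup here are hypothetical):

<html>
<head>
  <title>Joe's Bookstore - Product Listing</title>
  <!-- Tells search engines that the plain URL is the preferred version of this page -->
  <link rel="canonical" href="http://www.example.com/index.aspx" />
</head>
<body>
  ...
</body>
</html>

Every duplicate version carries the same tag, and every tag points to the single preferred URL.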

The introduction of this new tag gives site owners an alternate way to address duplicate content issues created by the way their sites are designed. Until now, the only solution that worked for all three search engines was to restrict robot access to the duplicate pages using instructions in the robots.txt file, robots meta tags, or both. Any website owners who have been using that approach and who decide to switch to the new tag will need to remove the access restrictions from their robots.txt files and/or remove the robots meta tags so that the search engines can find the new canonical URL tags.
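For example, a site that had been blocking its duplicate URLs might have had entries like these in its robots.txt file (a hypothetical sketch; the exact patterns depend on the site and on each engine's wildcard support):

User-agent: *
Disallow: /*?sortby=
Disallow: /*?sid=
Disallow: /*?ref=

or a robots meta tag like this in the head section of each duplicate page:

<meta name="robots" content="noindex, follow" />

Entries and tags like these would have to be removed, because a robot that is blocked from a page can never see the canonical URL tag inside it.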

Unfortunately, for some websites, the robots meta tag and robots.txt file may remain the only viable solution to duplicate content, because although the new tag addresses which page should be indexed, it does not resolve the crawling problem associated with duplicate URLs. Search engine robots do not discover that the pages are all the same until after they have been crawled and indexed, so they may still waste valuable crawling time fetching the same content, potentially delaying the indexing of unique content. Furthermore, all three search engines have indicated that they will treat the canonical URL tag as a “suggestion” and will still use other signals to determine which URL is displayed in duplicate content situations. For these reasons, the best course of action is not to give search engines duplicate URLs in the first place; robots.txt, robots meta tags, and the canonical URL tag should be fallbacks for when there is no way to program the site to be search engine friendly.
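Where the extra parameters are not actually needed to serve the page, one common way to be search engine friendly is to have the server answer requests for parameterized URLs with a 301 (permanent) redirect to the preferred URL. Sketched as an HTTP exchange (hypothetical, using the example URLs above):

GET /index.aspx?ref=joesbookstore HTTP/1.1
Host: www.example.com

HTTP/1.1 301 Moved Permanently
Location: http://www.example.com/index.aspx

Unlike the canonical URL tag, a redirect resolves the duplication before the page is ever crawled and indexed, so no crawling time is wasted on the duplicate URL.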

More details about this new tag can be found here:
http://ysearchblog.com/2009/02/12/fighting-duplication-adding-more-arrows-to-your-quiver/
http://blogs.msdn.com/webmaster/archive/2009/02/12/partnering-to-help-solve-duplicate-content-issues.aspx
http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html
