Fixing un-canonical URLs. Oh joy! Part 5

Jordan Sandford - October 16, 2008

Welcome back to my series on fixing un-canonical URLs. To date, we’ve looked at a variety of areas that could potentially cause the same content to be accessible from multiple URLs on your site, which is very problematic from a search engine perspective:

  1. Protocols (http and https)
  2. Domain and subdomain names (sometimes referred to as host names)
  3. URL paths
  4. File names
  5. Case sensitivity (when myPage.html is handled differently than MYPage.HTML)
  6. Query strings

Let’s talk about the last item in my list:

  7. Combinations of any or all of the above

It is possible, for example, to have the same content accessible from both protocols (http and https) as well as both the www- version and the non-www- version. This scenario provides four URLs that display the same exact content:

http://www.example.com
https://www.example.com
http://example.com
https://example.com
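
To make the four-URL scenario concrete, here is a minimal Python sketch that maps all four variants onto a single preferred URL. It assumes, purely for illustration, that the https and www versions are the ones you want to keep; the names CANONICAL_SCHEME, CANONICAL_HOST, KNOWN_HOSTS and canonicalize are my own and not part of any tool.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumptions for illustration only: https and the www host are preferred.
CANONICAL_SCHEME = "https"
CANONICAL_HOST = "www.example.com"
KNOWN_HOSTS = {"example.com", "www.example.com"}

VARIANTS = [
    "http://www.example.com",
    "https://www.example.com",
    "http://example.com",
    "https://example.com",
]


def canonicalize(url):
    """Return the preferred form of url; leave URLs on other hosts untouched."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in KNOWN_HOSTS:
        return url
    # An empty path and "/" name the same home page, so settle on "/".
    # Any explicit port number is dropped in this sketch.
    path = parts.path or "/"
    return urlunsplit(
        (CANONICAL_SCHEME, CANONICAL_HOST, path, parts.query, parts.fragment)
    )


for variant in VARIANTS:
    print(variant, "->", canonicalize(variant))
```

On a live site the same mapping would normally be enforced with a permanent (301) redirect on the server, so that visitors and search engines both end up on the one preferred URL.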

Any combination of the issues (numbers 1 – 6) may be lurking on your site. In addition, one section of your site may be suffering from one combination of these issues while other sections may be suffering from another combination of issues. I usually find that the simplest way to fix a combination of issues is to first test for one issue and fix it and then move on to the next issue.
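
The one-issue-at-a-time approach can be sketched as a chain of small checks, each of which fixes a single problem before the next one runs. This is only a sketch under stated assumptions: that https and www are preferred, that your server treats paths case-insensitively, that index.html is the default file name, and that the order of query parameters does not change the page. All function names here are hypothetical.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode


def fix_protocol(url):
    # Assumption: https is the preferred protocol.
    return urlsplit(url)._replace(scheme="https").geturl()


def fix_host(url):
    # Assumption: the www version is the preferred host name.
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if not host.startswith("www."):
        host = "www." + host
    return parts._replace(netloc=host).geturl()


def fix_case(url):
    # Assumption: the server handles paths case-insensitively.
    parts = urlsplit(url)
    return parts._replace(path=parts.path.lower()).geturl()


def fix_default_file(url):
    # Assumption: index.html is the server's default file name.
    parts = urlsplit(url)
    if parts.path.endswith("/index.html"):
        return parts._replace(path=parts.path[: -len("index.html")]).geturl()
    return url


def fix_query(url):
    # Assumption: parameter order does not affect the content served.
    parts = urlsplit(url)
    pairs = sorted(parse_qsl(parts.query, keep_blank_values=True))
    return parts._replace(query=urlencode(pairs)).geturl()


FIXES = [
    ("protocol", fix_protocol),
    ("host name", fix_host),
    ("case", fix_case),
    ("default file name", fix_default_file),
    ("query string order", fix_query),
]


def check(url):
    """Apply each fix in turn and report which issues were found."""
    issues = []
    for name, fix in FIXES:
        fixed = fix(url)
        if fixed != url:
            issues.append(name)
            url = fixed
    return url, issues


final, found = check("http://Example.com/Products/index.html?b=2&a=1")
print(final)
print(found)
```

Because each check does exactly one job, you can run them against a list of URLs from one section of your site, see which issues turn up there, and fix them in order.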

All six of the potential problem areas I discussed were caused by efforts to make the internet easier and more forgiving to use, and to reduce the amount of work that web visitors and web site administrators have to do. The underlying design of web servers was created before search engines like Google or Live existed and before duplicate content was a problem. Since web servers weren’t really designed with search engines in mind, keep the above list in mind as you comb your site for canonical URL issues.

Perhaps the easiest way to avoid un-canonical URLs is to build your site (or section of your site) from the ground up with these potential problem areas in mind. Granted, that may be easier said than done.

I also suggest that any new pages/files/URLs you create on your site have a file extension appropriate to the scripting language on your server (.php, .asp, .aspx, .cfm, etc.) as opposed to .html or .htm (which are normally assumed to be file extensions of a “static” page). The reason is that if you later need to redirect, for instance, from example.com/mypage.html to www.example.com/my-new-page.html, and your web server limits or doesn’t support tools like URL rewriting, you may have to take an SEO hit after renaming the file. This is because an .html file normally cannot run server-side scripts, and redirecting to another page is a script function. So essentially, creating new files on your website with “dynamic file extensions” gives you much more flexibility in the future.
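
To show what “redirecting is a script function” means in practice, here is a minimal sketch of a script that answers a request for the old page with a permanent (301) redirect to the new one. It is written as a small Python WSGI application purely for illustration; on your own server the equivalent would be written in whatever scripting language it supports. The REDIRECTS table and the app function are hypothetical names, and the http:// prefix on the target is my assumption.

```python
# Old path -> new URL. The paths come from the example above; the
# http:// scheme on the target is an assumption for illustration.
REDIRECTS = {
    "/mypage.html": "http://www.example.com/my-new-page.html",
}


def app(environ, start_response):
    """Minimal WSGI application: send a 301 for any path in REDIRECTS."""
    path = environ.get("PATH_INFO", "")
    target = REDIRECTS.get(path)
    if target is not None:
        # 301 tells browsers and search engines that the move is permanent.
        start_response(
            "301 Moved Permanently",
            [("Location", target), ("Content-Length", "0")],
        )
        return [b""]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not found"]
```

A static .html file has no place to put logic like this, which is why the old URL cannot hand its visitors over to the new one without help from the server.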

What’s even better is to build your site or section using a CMS that was designed to be SEO friendly.

Remember to watch out for misspellings in your URLs. That includes the path name (the part of the URL that starts with the first forward slash after the host name and runs up to, but does not include, the question mark or fragment) and the query string. Also keep in mind that everything in the URL except the fragment can affect canonical URL/duplicate content issues. Finally, when you can’t seem to resolve a particular canonical URL issue yourself, a phone call or email to your hosting company may be able to resolve it.
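
If you are unsure which part of a URL is which, Python’s standard urllib.parse module can split one apart for you. The sketch below uses a made-up example URL and two hypothetical helper functions to show the path name, the query string and the fragment, and to compare two URLs while ignoring the fragment.

```python
from urllib.parse import urlsplit, urldefrag

# A made-up URL for illustration.
url = "http://www.example.com/products/widgets.html?color=blue#reviews"

parts = urlsplit(url)
print("path name:   ", parts.path)      # /products/widgets.html
print("query string:", parts.query)     # color=blue
print("fragment:    ", parts.fragment)  # reviews


def without_fragment(url):
    """Drop the fragment, the one part that does not affect canonical issues."""
    return urldefrag(url).url


def same_apart_from_fragment(first, second):
    """True when two URLs differ only in their fragment."""
    return without_fragment(first) == without_fragment(second)


print(without_fragment(url))
```

Two URLs that differ only after the # sign point at the same page, while a difference anywhere else (including a single misspelled letter in the path or query string) makes them two separate URLs.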

Also, be on the lookout for the new anything-goes top level domains, which could offer a few more URL canonicalization challenges in the near future. (The “police” in traffic.police would be an anything-goes top level domain, for example, while edu, com and org are traditional top level domains.)

I hope this series was helpful and saved you some time. Best wishes to you and yours on all your URL canonicalization efforts!
