Welcome back to my series on fixing canonical URL issues. Here again are the areas of canonical URL issues:
1. Protocols (http and https)
2. Domain and subdomain names (sometimes referred to as host names)
3. URL paths
4. File names
5. Case sensitivity (when myPage.html is handled differently than MYPage.HTML)
6. Query strings
7. Combinations of any or all of the above
In my last post, I discussed how protocols (http and https) can present non-canonical URLs to the search engines and how that can create duplicate content. Let’s pick up where we left off.
Suppose you have two domain names, example.com and example.biz, and you want traffic to example.biz to see the content at example.com. Your hosting company has also set up your web hosting account so that visitors to www.example.com and visitors to example.com are served the same files (that way you don’t have to maintain two versions of, say, the Home Page). This is the default way most hosting companies create new accounts.
To fix canonical URL issues related to different top-level domains (e.g. edu, com, org, us; look out for anything-goes top-level domains as well), different domains, and different (or no) subdomains, you have two options. You can set up your server to show content from the non-canonical domain(s) to visitors while banning that content from being indexed by the engines (using a robots.txt file or the robots meta tag). Or you can properly redirect both visitors and search engines to the canonical domain.
To redirect, first choose which subdomain/domain/top-level domain combination you want to be canonical. Then set up the web servers or hosting accounts that host the non-canonical domains to ‘301 redirect’ to the canonical domain, using the same rewrite rules or the ‘include’ method I previously discussed. (In a future post, I will discuss URL rewriting on Apache servers and compare it to URL rewriting on Windows servers.)
Be aware that if you bought multiple domain names from a registrar, only your canonical domain may actually be hosted, while the other domains rely on the registrar’s ‘forwarding’ service to redirect to your canonical domain. If you use their forwarding service, or even their ‘301 redirect’ feature, they may not implement the 301 redirect consistently or properly. I am speaking from first-hand experience with well-known hosting companies.
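
To make the redirect logic concrete, here is a minimal Python sketch of the decision a server has to make for each request. It is not the rewrite-rule or ‘include’ method from the earlier post, only an illustration. The choice of example.com as the canonical host, the list of alias hosts, and the function name `canonical_redirect` are all assumptions made for this example.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: example.com is the canonical host and these are its aliases.
CANONICAL_HOST = "example.com"
NON_CANONICAL_HOSTS = {"www.example.com", "example.biz", "www.example.biz"}


def canonical_redirect(url):
    """Return (301, target_url) if the URL's host is non-canonical, else None."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()  # host names are case-insensitive
    if host not in NON_CANONICAL_HOSTS:
        return None
    # Keep the scheme, path, and query string; only the host changes.
    # (Any port or fragment in the original URL is dropped in this sketch.)
    target = urlunsplit((parts.scheme, CANONICAL_HOST, parts.path, parts.query, ""))
    return 301, target


print(canonical_redirect("http://www.example.com/blog/shapes/?page=2"))
# (301, 'http://example.com/blog/shapes/?page=2')
print(canonical_redirect("http://example.com/"))
# None
```

The point to notice is that the path and query string are preserved, so every non-canonical URL maps to its exact canonical counterpart rather than to the Home Page. That is one of the things a registrar’s forwarding service may get wrong.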
You were categorizing your pages and realized that you accidentally placed a page in both the /blog/Colors directory and the /blog/shapes directory. This could happen from physically copying a file to another directory, or perhaps you are using a blogging application (or any web application, for that matter) and assigned a post to two categories. In the latter case, if the blogging application does not handle cross-posts in an SEO-friendly way, you might have duplicate content issues.
As far as the URL path goes, it is a good idea to know which URLs on your site duplicate the content of other URLs. If you don’t know, try the tools offered by Google’s Webmaster Tools. Different web applications, and different types of web applications (such as blog software from vendor A vs. vendor B, or CMS software from vendor C), handle canonical URL paths differently, so check which web application is powering the pages at those URLs. Also see whether there are add-ons or plugins for your software that handle the duplicate content created by assigning multiple tags or categories to content. Such a plugin could, for example, add a robots meta tag to one of the duplicate pages.
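
If you already have the page bodies on hand (from a crawl or an export, for instance), a rough way to spot exact duplicates yourself is to group URLs by a hash of their content. The sketch below is only an illustration and is no substitute for the tools mentioned above. The function name `find_duplicate_urls` and the sample URLs are made up for this example, and it only catches pages that are identical apart from leading and trailing whitespace.

```python
import hashlib
from collections import defaultdict


def find_duplicate_urls(pages):
    """Group URLs whose bodies are identical (ignoring outer whitespace).

    `pages` maps URL -> page body (str). Returns a sorted list of URL groups,
    one group for each piece of content that appears at more than one URL.
    """
    by_digest = defaultdict(list)
    for url, body in pages.items():
        digest = hashlib.sha256(body.strip().encode("utf-8")).hexdigest()
        by_digest[digest].append(url)
    return sorted(sorted(urls) for urls in by_digest.values() if len(urls) > 1)


# Hypothetical pages: the same post filed under two category directories.
pages = {
    "http://example.com/blog/Colors/my-post.html": "<h1>My post</h1>",
    "http://example.com/blog/shapes/my-post.html": "<h1>My post</h1>\n",
    "http://example.com/blog/shapes/other-post.html": "<h1>Other post</h1>",
}
print(find_duplicate_urls(pages))
# [['http://example.com/blog/Colors/my-post.html', 'http://example.com/blog/shapes/my-post.html']]
```

Real pages often differ in small ways (navigation, timestamps, category breadcrumbs), so near-duplicates would need a fuzzier comparison than an exact hash.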
Another way to handle this is to 301 redirect one of the duplicate pages to the other. In a pinch, and depending on the URL structure of your site, you may be able to use the robots.txt file to exclude a certain section, or an exact URL, of your site from the indexes, thereby removing the duplicate content while making sure that every other page in that section stays in the index. Be careful which URL you redirect from or block with the robots.txt file, because one of those URLs might be better optimized than the other.
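
Before publishing a robots.txt change like this, it is worth checking that the rule blocks only what you intend. Here is a small sketch using Python’s standard `urllib.robotparser` module. The robots.txt content, the URLs, and the helper name `is_crawlable` are hypothetical, and the module’s matching may differ in details from how a particular search engine reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one duplicate URL, leave the rest of
# /blog/shapes/ open to crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /blog/shapes/my-post.html
"""


def is_crawlable(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_crawlable(ROBOTS_TXT, "http://example.com/blog/shapes/my-post.html"))
# False
print(is_crawlable(ROBOTS_TXT, "http://example.com/blog/shapes/other-post.html"))
# True
print(is_crawlable(ROBOTS_TXT, "http://example.com/blog/Colors/my-post.html"))
# True
```

Note that a Disallow rule matches by prefix, so a rule for /blog/shapes/ would block every page in that directory, not just the duplicate.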
Please remember to read my next post in this series about file names, the mysterious forward slash at the end of URLs and case sensitivity in URLs. Check back soon!