Sometimes the same page on a website gets indexed more than once under different URLs, which can create duplicate content issues and, potentially, a penalty. The most common example is a website's default (home) page:
http://www.domain.com/
http://www.domain.com/index.htm
Although both of these URLs resolve to the same page, search engines could index both of them, or only one and not the other. Situations like this are not limited to the home page, however. Most websites also serve default pages in subdirectories, for example:
http://www.domain.com/about/
http://www.domain.com/about/index.htm
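This creates the same problem as on the home page. In addition, the site's internal links may point to a page with or without the index.htm filename, so both versions get crawled. To prevent the default page from being reached by its filename, we can add the following rules to our .htaccess file (this assumes the file already contains a RewriteEngine On line). The condition tests %{THE_REQUEST}, the original request line sent by the browser, so that Apache's own internal lookup of index.htm for a directory request does not trigger a redirect loop. Note that mod_rewrite flags must be separated by commas with no spaces:
RewriteCond %{THE_REQUEST} \s/+([^?\s]*/)?(index|home|default)\.(html?|aspx?|php)[\s?] [NC]
RewriteRule ^ /%1 [R=301,L]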
With this in place, any request that includes a default filename (for example, /about/index.htm) receives a 301 (permanent) redirect to the same URL without the filename (/about/). The default filename then never appears in the URL, so search engines should index only one version of the page.
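To check by hand which paths the rule is meant to redirect, here is a minimal Python sketch that reimplements the intended matching logic outside Apache: redirect any path whose last segment is index, home or default with one of the listed extensions, keeping the directory part. It is an illustration only, not how mod_rewrite works internally; the function name default_page_redirect is hypothetical, and it assumes it receives the URL path without the query string.

```python
import re
from typing import Optional

# Hypothetical helper: mirrors the intended pattern, applied to the URL path
# only (the .htaccess condition matches it inside the full request line).
DEFAULT_PAGE = re.compile(
    r"^/+([^?\s]*/)?(index|home|default)\.(html?|aspx?|php)$",
    re.IGNORECASE,  # same effect as the [NC] flag
)


def default_page_redirect(path: str) -> Optional[str]:
    """Return the 301 target for a path, or None if no redirect applies."""
    match = DEFAULT_PAGE.match(path)
    if match is None:
        return None
    # Keep the directory part (group 1, possibly absent) and drop the filename.
    return "/" + (match.group(1) or "")


if __name__ == "__main__":
    for p in ["/index.htm", "/about/index.htm", "/about/", "/reindex.html"]:
        print(p, "->", default_page_redirect(p))
```

Anchoring the filename at a path segment matters: an unanchored pattern such as (.*)index would also match /reindex.html and redirect it to a non-existent /re.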
The .htaccess file is a per-directory configuration file from which URL rewriting software, such as Apache's mod_rewrite and Helicon's ISAPI_Rewrite for IIS, reads its rules. An .htaccess file can be used to perform many different SEO-related tasks, so whether or not your web host allows its use can make a big difference when planning an SEO strategy for your website. In all of our client projects, we use the .htaccess file to perform some SEO-critical functions. One of the most important of these is domain name canonicalization.
Domain Name Canonicalization
If a domain name is not canonicalized, the same site is served under more than one form of the domain name. For example, consider the following URLs:
http://www.domain.com
http://domain.com
Although the two URLs look almost identical, search engines may regard them as different URLs altogether. As a result, some pages may get indexed under the www version, while others get indexed under the non-www version. One way to ensure that search engines index only one version is to add the following rule to your .htaccess file (again assuming RewriteEngine On is already present). The R=301 flag makes the redirect permanent; without it, the redirect is a temporary 302:
RewriteCond %{HTTP_HOST} ^domain\.com$ [NC]
RewriteRule ^(.*)$ http://www.domain.com/$1 [R=301,L]
With this rule in place, a request for the non-www version of the site is redirected to the canonical www version. The rule applies not only to the home page but to every page on the domain. For example:
http://domain.com/page1.htm
… will redirect to this…
http://www.domain.com/page1.htm
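The host check above can likewise be sketched in Python to show which requests should be redirected and where to. This is a minimal illustration under stated assumptions, not part of the .htaccess setup: the function name canonical_host_redirect is hypothetical, domain.com is the article's placeholder domain, and, like the rule, it hard-codes the http scheme and keeps the path and query string unchanged.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Placeholder domain from the article; replace with your own.
BARE_HOST = "domain.com"
CANONICAL_HOST = "www.domain.com"


def canonical_host_redirect(url: str) -> Optional[str]:
    """Return the www URL to 301-redirect to, or None if url is already canonical."""
    parts = urlsplit(url)
    # Compare case-insensitively, like the [NC] flag on the RewriteCond.
    # netloc includes any port, as the Host header does.
    if parts.netloc.lower() != BARE_HOST:
        return None
    # Only the host changes; path and query string are carried over.
    return urlunsplit(("http", CANONICAL_HOST, parts.path or "/", parts.query, ""))


if __name__ == "__main__":
    print(canonical_host_redirect("http://domain.com/page1.htm"))
    print(canonical_host_redirect("http://www.domain.com/page1.htm"))
```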
This one rule therefore covers every URL on the site. In a future post, I will demonstrate how the .htaccess file can be used for page-level redirects.