A few weeks ago we noticed that one of our clients had a single web page showing up multiple times in Google. In the SEO world, this is known as duplicate content and is generally frowned upon. The URLs for the two entries in question were identical except for the case of some of their letters.
The difference in case between them is what makes the search engines treat them as two separate pages, even though both in fact point to the same physical page. On a Linux server this would not matter so much, because file paths there are case sensitive: one of the two URLs would typically return a 404 (Page Not Found) error, so in effect only one page would get indexed. On a Windows server, however, file paths are not case sensitive, so both URLs serve up the same webpage. When we examined the internal linking structure, we found that every internal URL was referenced consistently in lower case. The problem came from the outside world linking to the same page in various case combinations. This meant that people linking to the website from outside could unintentionally expose it to duplicate content penalties from search engines, which is not good.
The solution to this problem is quite simple and elegant. We added one rewrite rule to the website’s .htaccess file, which permanently redirects (301) any URL containing upper case characters to its all lower case equivalent. Since you cannot predict or enforce how people link to your website, we strongly suggest you use this simple solution to prevent this duplicate content penalty.
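To make the behaviour of such a redirect concrete, here is a minimal Python sketch of the decision it makes. This is not the rewrite rule we added to .htaccess, only an illustration of the same logic. The function name `lowercase_redirect` and the `www.example.com` URLs are hypothetical, and the sketch assumes that only the path should be lower-cased, leaving the query string alone.

```python
from urllib.parse import urlsplit, urlunsplit


def lowercase_redirect(url):
    """Decide whether a requested URL needs a lower-case redirect.

    Returns (301, target_url) when the path contains upper case
    characters, or None when the request can be served as-is.
    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    lowered_path = parts.path.lower()

    if lowered_path == parts.path:
        # Path is already all lower case: no redirect needed.
        return None

    # Assumption: only the path is changed. The query string is kept
    # untouched because parameter values may be case sensitive.
    target = urlunsplit(
        (parts.scheme, parts.netloc, lowered_path, parts.query, parts.fragment)
    )
    return 301, target


if __name__ == "__main__":
    # Hypothetical example URLs.
    print(lowercase_redirect("http://www.example.com/About-Us.html"))
    print(lowercase_redirect("http://www.example.com/about-us.html"))
```

Because the redirect is permanent (301), search engines that follow an inbound link in any case combination are told that the lower-case URL is the one to index.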