Welcome back to my series on fixing canonical URL issues. In my last post, Fixing un-canonical URLs. Oh joy! Part 3, I discussed how case-insensitivity and having a default index file could negatively affect your URL canonicalization efforts. Today, we’ll talk about query strings and how they can affect those same efforts. A query string is a grouping of parameters and values at the end of a URL that looks like “?podID=249&catID=31”. Let’s review the areas where your site can have un-canonical URLs:
Let’s say you have a web page that is used as a landing page and you want to track which of your affiliates referred visitors to your site. You give each of your affiliates the URL to that page, but add a customized query string to the URL that contains the name of that particular affiliate. The landing page, like many web pages, will show the same generic content regardless of the query string in its URL, but may have another section of the page that does change (even if the change is unnoticeable to a visitor or a search engine) depending on the query string. Your landing page tallies (in a database) the number of visits from each affiliate. Though the treatment of URLs with query strings may vary from search engine to search engine, query strings can cause different content to show under the same domain and URL path. You should consider query strings in your efforts to rid your site of duplicate content.
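To make the affiliate scenario concrete, here is a minimal sketch in Python of how a landing page might read the affiliate name from the query string and tally visits. The parameter name "affiliate", the function names, and the sample URLs in the usage note are assumptions made for illustration, and an in-memory counter stands in for the database mentioned above.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qs


def affiliate_from_url(url, param="affiliate"):
    """Return the affiliate name carried in the query string, or None."""
    query = urlsplit(url).query
    values = parse_qs(query).get(param)
    return values[0] if values else None


def tally_visits(urls, param="affiliate"):
    """Count visits per affiliate; a Counter stands in for a real database."""
    counts = Counter()
    for url in urls:
        name = affiliate_from_url(url, param)
        if name is not None:
            counts[name] += 1
    return counts
```

For example, calling tally_visits with two URLs ending in "?affiliate=acme" and one ending in "?affiliate=zenith" would count two visits for "acme" and one for "zenith", even though all three URLs serve the same landing page.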
In addition to tracking affiliates and referrer sources, query strings are also used by web applications to display dynamic content from different sections of a site using the same page template or layout. In fact, this is the most common use of query strings in URLs. With this scenario, you can see that query strings allow the web page creator to show a variety of content dynamically while maintaining just one file instead of maintaining one static web page file for each set of content he or she wants to show. A good example of this is an ecommerce site.
If you happen to have a web page that shows one set of contents using one URL plus query string and the same set of contents using the same URL but a different query string, you may have an un-canonical URL issue.
To fix this, you probably don’t want to blindly 301-redirect every URL that has a query string to the same URL with the query string chopped off, for the simple reason that someone or something probably intended those query strings to do something meaningful.
One fairly rare exception occurs on a site that must remember something about you during your browsing session. Usually, cookies are used to keep track of which visitor has which products in their shopping cart by storing each visitor’s unique session number. If the visitor’s web browser is set up to not accept cookies, or the ecommerce application is not set up to use cookies, the session number may be appended to every URL the visitor uses to browse the store. That URL may look like: http://www.eample.com/store/privacy_policy.aspx?SESSID=233493JJG37272HB. If cookies were being used, the exact same content would show for any page you visit on that site, but the URL would simply not have the “?SESSID=233493JJG37272HB” part at the end.
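One way to reason about which URLs are duplicates of each other is to strip only the parameters you know do not change the content, such as a session number, and leave everything else alone. The Python sketch below does that. The set of ignored parameter names and the function name are assumptions for illustration; you would have to decide which parameters are safe to ignore on your own site.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical list of parameters that never change the page content.
IGNORED_PARAMS = {"sessid", "affiliateid"}


def canonical_url(url, ignored=IGNORED_PARAMS):
    """Drop only the ignored parameters and keep the rest in their original order."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in ignored
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )
```

Unlike chopping off the whole query string, this keeps parameters such as podID and catID intact, so two URLs can be treated as the same page only when their stripped forms match.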
Because, under normal circumstances, you want to use the query string appropriately in your web pages, one safe method to canonicalize URLs with query strings (that show the same content) is to add a robots meta tag that instructs the search engines to not index the page. This will work as long as the same URL without the query string does not have the robots meta tag. Many times, however, there is a low chance that the URL with a query string is even known to the search engines because you would not publish that URL anywhere.
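Since the URL with the query string and the URL without it are usually served by the same file, the robots meta tag has to be added conditionally. Here is a rough Python sketch of that decision. The parameter names and the function name are assumptions for illustration; the tag itself is the standard robots meta tag with a "noindex" value.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical parameters that do not change the page content.
NON_CONTENT_PARAMS = {"sessid", "affiliateid"}
NOINDEX_TAG = '<meta name="robots" content="noindex">'


def robots_meta_for(url, non_content=NON_CONTENT_PARAMS):
    """Return a noindex tag when the URL carries a non-content parameter, else ''."""
    query = urlsplit(url).query
    names = {name.lower() for name, _ in parse_qsl(query, keep_blank_values=True)}
    return NOINDEX_TAG if names & non_content else ""
```

The page template would print the returned string inside the head section. The plain URL gets an empty string, so it stays indexable, which is the condition described above.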
Another way to canonicalize URLs with query strings is to create SEO-friendly versions of these URLs. Creating SEO-friendly URLs would normally take a URL like http://www.eample.com/store/podID=249&catID=31 and turn it into something like http://www.eample.com/store/cellphones/nokia-8851-clam-shell-camera.html. It doesn’t make sense to create SEO-friendly versions of URLs such as http://www.eample.com/?affiliateID=1446545 because changing the affiliate ID in the query string would probably not change the content of the page substantially. There are various ways to create SEO-friendly URLs. Your CMS may already offer this. If you’re not using a CMS that supports this, you may have to resort to creating URL rewrite rules for your website. (The first part of this series has several links to resources about URL rewriting for both Microsoft IIS and Apache web servers.)
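If you end up building SEO-friendly URLs yourself, the core step is turning a category and product name into a URL-safe path. Below is a small Python sketch of that step. The function names are made up for illustration, and the category and product names in the usage note are guesses based on the example URL above, not values taken from a real catalog.

```python
import re
import unicodedata


def slugify(text):
    """Lowercase the text and replace runs of non-alphanumerics with single hyphens."""
    ascii_text = (
        unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def friendly_path(category, product_name):
    """Build a path like /store/<category>/<product>.html from display names."""
    return f"/store/{slugify(category)}/{slugify(product_name)}.html"
```

Calling friendly_path("Cellphones", "Nokia 8851 Clam Shell Camera") gives "/store/cellphones/nokia-8851-clam-shell-camera.html". Your rewrite rules or CMS would still need to map that path back to the right podID and catID values.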
As the topic of query strings is the last item in the 7-item list above, my next post will wrap up this series and provide some tips to help you in your efforts to rid your site of un-canonical URLs.