Articles in the SEO News Category

Filenames, host names and canonicalization, oh my!

July 2nd, 2008 by Jordan Sandford

It’s been a while since we’ve talked about URL canonicalization on our blog, so I’ll quickly review what it is and then talk about filename canonicalization and how it can affect your SEO endeavors.

Canonicalization is work done on your site to help ensure that content available at one specific URL does not also show up under another URL. Content that appears under more than one URL is a type of duplicate content issue. If the engines index more than one copy of the same content from separate, full URLs (those containing all necessary parts), they will be forced to divide the “strength” of that content between the URLs. That reduces the “strength” of every URL involved and lowers the chance that any one of your URLs will show up in a search results page for a given search term. Most of the time, the issue arises when www.example.com and example.com show exactly the same content, both on the home page and on every other page of the site.

Having un-canonical host names is one way that duplicate content can become a problem for your site: a search engine indexes content from your home page at www.example.com and indexes the same exact content at example.com. While on the subject of un-canonical host names, you should also know that it is possible to have un-canonical protocols in your URLs. In a complete URL like http://www.example.com/about-us.html, the “http” is known as the protocol. If search engines happen to index the same content from https://www.example.com/about-us.html, you may have just stepped into a duplicate content issue due to un-canonical protocols.
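To make the host name and protocol cases concrete, here is a minimal Python sketch that works out where a request should be 301 redirected so that only one host name and one protocol remain. The preferred values (http and www.example.com), the list of alias host names and the function name redirect_target are assumptions chosen for illustration; your site may well prefer the other form.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed preferences for this illustration; use whichever form your site prefers.
CANONICAL_SCHEME = "http"
CANONICAL_HOST = "www.example.com"
# Host names assumed to serve the same site as CANONICAL_HOST.
ALIAS_HOSTS = {"example.com", "www.example.com"}


def redirect_target(url):
    """Return the URL a 301 redirect should point to, or None if no redirect is needed."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in ALIAS_HOSTS:
        return None  # not one of our host names, so leave it alone
    if parts.scheme == CANONICAL_SCHEME and host == CANONICAL_HOST:
        return None  # already in the canonical form
    # Keep path, query and fragment; swap in the preferred protocol and host.
    # Any port number in the original URL is dropped in this sketch.
    return urlunsplit(
        (CANONICAL_SCHEME, CANONICAL_HOST, parts.path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    print(redirect_target("https://www.example.com/about-us.html"))
    print(redirect_target("http://example.com/"))
    print(redirect_target("http://www.example.com/about-us.html"))
```

The point of the sketch is only that every variant maps to a single address. How the redirect is actually issued depends on your server.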

Now let’s say you have a store (or any other web-based application) on your site that is accessed from http://www.example.com/store. Generally, your ecommerce program would be located in a folder on your server called ‘store’ (and usually, the URLs of your store’s products are based on that URL: http://www.example.com/store/blue-widgets.html). It’s quite possible that a search engine could index content at http://www.example.com/store/index.php and at http://www.example.com/store/ and count them as duplicate content. This could happen if another site links directly to one of those URLs and your site links to the other. This is an example of what I call a filename canonicalization issue.

Those are basically all the canonicalization issues you could encounter. Often, though, you’ll find a site that has a combination of them. For example, all of the following URLs are different, but they can all return exactly the same content:
http://www.example.com/kitchen-sink
http://www.example.com/kitchen-sink/default.html
http://example.com/kitchen-sink/
https://example.com/kitchen-sink
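One way to spot combinations like these in a list of crawled URLs is to reduce each URL to a comparison key and group the URLs that share a key. The Python sketch below does that under stated assumptions: the default document names, the set of known folders and the function names are hypothetical, and query strings are ignored for simplicity.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical settings for this sketch; adjust them to match your own server.
DEFAULT_DOCUMENTS = ("index.php", "index.html", "default.html")
KNOWN_FOLDERS = {"/kitchen-sink"}  # paths assumed to be folders on the server


def canonical_key(url):
    """Reduce a URL to a key that ignores protocol, 'www.' and default filenames."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path or "/"
    # Drop a default document name such as /kitchen-sink/default.html.
    head, _, last = path.rpartition("/")
    if last in DEFAULT_DOCUMENTS:
        path = head + "/"
    # Treat a known folder requested without its trailing slash as the folder itself.
    if path in KNOWN_FOLDERS:
        path += "/"
    return host + path


def group_duplicates(urls):
    """Return only the groups in which more than one URL shares a key."""
    groups = defaultdict(list)
    for url in urls:
        groups[canonical_key(url)].append(url)
    return {key: found for key, found in groups.items() if len(found) > 1}


if __name__ == "__main__":
    crawled = [
        "http://www.example.com/kitchen-sink",
        "http://www.example.com/kitchen-sink/default.html",
        "http://example.com/kitchen-sink/",
        "https://example.com/kitchen-sink",
    ]
    for key, found in group_duplicates(crawled).items():
        print(key, found)
```

With the settings above, all four example URLs fall into a single group, which is exactly the situation you want to find and fix.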

Perhaps you’re wondering if there would be a duplicate content issue between the following two URLs:
http://www.example.com/blog
http://www.example.com/blog/

The answer will most likely depend on whether ‘blog’ is an actual folder on your server or a file. (Your web server, blog, or other web-based application may be set up in a way that does not require file extensions such as .html.) Most web servers (the software on the hosting company’s computer that waits for requests and responds when someone asks for a page on your site) work the same way in this situation. If ‘blog’ is an actual folder on the web server, then whenever someone asks for http://www.example.com/blog (without the trailing forward slash), the web server automatically issues a 301 redirect to http://www.example.com/blog/. (This is the correct redirect type for this situation, by the way, and it is the reason this should not cause a duplicate content issue.) If someone asks for http://www.example.com/blog/ (with the trailing forward slash), the web server doesn’t redirect at all. I’ll talk more about this in my next post.
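The trailing slash behavior can be modeled in a few lines of Python. This is a toy model of the decision described above, not the implementation of any particular web server; the FOLDERS set and the respond function are made up for illustration.

```python
# Hypothetical set of paths that are real folders on the server.
FOLDERS = {"/blog"}


def respond(path, folders=FOLDERS):
    """Return (status, location) for a requested path, mimicking the folder redirect."""
    if not path.endswith("/") and path in folders:
        # A folder requested without its trailing slash gets a permanent redirect.
        return 301, path + "/"
    # A file, or a folder requested with its slash, is served without a redirect.
    return 200, None


if __name__ == "__main__":
    print(respond("/blog"))
    print(respond("/blog/"))
    print(respond("/blog", folders=set()))  # here 'blog' is treated as a file
```

Because the slashless folder URL answers with a 301 rather than with content, only the version with the trailing slash is left to be indexed.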

Marjory Meechan, in her post How to Resolve the Canonicalization Issue without Access to your Server, discussed un-canonical host name issues and one way to fix them. Please read my next post, in which I discuss the causes of these canonicalization issues and suggest more dependable methods to eliminate them, provided that you have some level of server access.

Posted in SEO News

Google Sitelinks Explained

July 1st, 2008 by Marjory Meechan

Ever since Google introduced sitelinks, website owners and webmasters have had lots of questions about how these work. At Google’s recent June Tune Live Chat, the question arose again as to why some sites have these links and how they can control them. Here is an excerpt from the transcript of the Q and A section of the chat with Google’s answer to the question:

Q: Will we ever have control over sitelinks to bring searchers to the better pages on our site, vs the ones Google thinks are the best pages?
A: Sitelinks are automatically generated by our algorithm and are meant to help users navigate your site. While you cannot opt into having sitelinks, you are able to block sitelinks using Google Webmaster Tools.

Later on during the audible portion of the chat, in answer to this question:
Does the design of my site affect how these links are compiled? What can I do to help Google compile better sitelinks?

Bergy Berghausen replied that site architecture can affect your chances of getting sitelinks:

“Having a very simple html based navigation is the best way that we can tell how your site is organized and we try to guess based on where we think people are going on your site and how your site is structured.”

While we can’t tell you how to get the sitelinks, we can tell you how to get rid of them. If Google is linking to a page of your site that you would rather not have in the results pages, you can use the sitelink tool in Google’s Webmaster Tools to tell Google to remove the sitelink.

Posted in SEO News

Duplicate Content Tips

June 30th, 2008 by Emily MacNair

There can be numerous reasons why pages may not appear in the search results or why rankings can drop. One reason is duplicate content. There are many ways in which content can be duplicated and it usually happens unintentionally. It can occur for very valid reasons, often through actions that have been taken to boost rankings. It is not the worst thing to have happen, but if it can be fixed, it is probably in your best interest to do so. I have come across this issue a few times lately and thought I would offer a few helpful tips.

In general, a search engine’s mission is to provide unique and relevant content to the searcher. When an engine comes across duplicate content, the question arises: “Which pages are the most appropriate pages to index?” To display the most useful pages in the search engine results pages (SERPs), a duplicate content filter evaluates, sorts through, and removes the duplicate content pages (and spam). The search engines may do a fairly good job of determining what to index, but by taking proactive steps, it is possible to help guide them to the pages you want indexed (or at least keep them from weeding out certain pages from your site). Keep in mind that if you provide no guidance, the engines will make the choice themselves, and the result may disappoint you.

Below are just a few ways to avoid duplicate content:

  • Resolve canonicalization issues by redirecting to the preferred domain (e.g., redirecting the non-www version to the www version).
  • Submit a sitemap with the canonical version of each URL.
  • Implement a robots.txt file to tell the search engine spiders not to crawl certain pages (such as printer friendly pages).
  • Redirect all additional domains with a 301 redirect. This will also transfer any authority those domains have built.
  • Keep dynamic parameters in URLs to a minimum.
  • Have an internal linking strategy to build relevancy.
  • Make each page on your site unique, including unique titles, meta descriptions, headings, and navigation.
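As a small illustration of the robots.txt tip, the Python sketch below uses the standard library’s urllib.robotparser to check which URLs a well-behaved spider would be allowed to crawl. The robots.txt contents and the /print/ folder for printer friendly pages are assumptions made up for this example, as is the may_crawl helper.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; /print/ is an assumed folder of printer friendly pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(url, user_agent="*"):
    """Return True if the rules above allow the given user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(may_crawl("http://www.example.com/print/blue-widgets.html"))
    print(may_crawl("http://www.example.com/store/blue-widgets.html"))
```

A check like this is a handy way to confirm that a rule blocks the duplicate pages you intended and nothing else before you publish the file.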

These ideas barely scratch the surface of ways to reduce duplicate content, but hopefully they will get you headed in the right direction.

Posted in SEO News, SEO & Content
