As part of their Pop Picks program for answering webmaster questions, Google has provided webmasters with some clarification on the popular SEO myth that Google cannot crawl dynamic URLs. In fact, they have even gone so far as to suggest that dynamic URLs should not be rewritten at all, because Google “can use the information provided through the parameters to better understand what your site is doing with those parameters” and “perhaps let us attempt other keywords that might lead us to content that we haven’t seen for your site”. While we appreciate and even support Google’s efforts to crawl the “deep web”, their advice runs counter to what we have heard from Google in the past, which makes this new clarification somewhat murky.
Of course, anyone who has looked at search engine results any time over the last two years is aware that search engines are capable of crawling and indexing pages with many more than three parameters. These two listings from the USDA Agricultural Marketing Service site illustrate this:
The full URLs are shortened in the search engine results page. This is the full URL for one of these listings, with each “=” sign marking a separate parameter:
http://www.ams.usda.gov/AMSv1.0/ams.fetchTemplateData.do?template=TemplateN&navID=ANSIReportNOPCertifiers&rightNav1=ANSIReportNOPCertifiers&topNav=&leftNav=NationalOrganicProgram&page=NOPANSI&resultType=&acct=nopgeninfo
Furthermore, Google has a point about the difficulties of implementing URL rewriting on dynamic websites. It is not uncommon for automatic URL rewriting modules on these sites to create more issues than they resolve, usually because they have been improperly implemented, not because there is anything inherently wrong with presenting keyword-rich, user-friendly URLs to visitors and search engines. Some examples of problems that can occur are:
1. The links in the navigation contain URLs with parameters, like this one: www.example.com/?id=4&id=5, which then redirect to SEO-friendly URLs like www.example.com/stuff. The problem with this is not the format of the URL; it is that there is a redirect built into the navigation of the site. To resolve this issue, ensure that any URL rewriting program does not involve on-site redirects. The link in the main navigation should lead directly to www.example.com/stuff – no redirects.
2. The links in the main navigation lead directly to SEO-friendly static URLs like www.example.com/stuff, but the dynamic URLs are still accessible to search engines. If anyone links to the dynamic URL, then both versions are available to search engines and both contain the same content. This makes it difficult for search engines to figure out which of these pages they are supposed to index. As noted above, in Google’s case they filter out duplicate content, and if they filter out the wrong page, it could have negative effects on your search engine rankings.
3. Finally, even sites whose main navigation links lead directly to the SEO-friendly URLs, with accompanying redirects in place for the dynamic pages, may run afoul of search engines if those redirects do not cause the server to issue the correct header response. All redirects designed to implement SEO-friendly URLs should issue a header response of 301 (Permanent). This tells search engines that the true URL is the SEO-friendly one and not the dynamic one, should they happen across it. Many URL rewriting programs use redirects that issue a header response of 302 (Temporary), which causes both URLs to be listed and does not solve the issue at all. The sketch below illustrates both the internal rewrite from point 1 and the 301 redirect described here.
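To make the difference between an internal rewrite and a redirect concrete, here is a minimal .htaccess sketch for Apache’s mod_rewrite. The script name (product.php) and the id parameter are hypothetical stand-ins for whatever actually serves the dynamic page; it is the pattern, not the specific names, that matters.

RewriteEngine On

# Point 1: map the friendly URL to the real script internally.
# No redirect is issued; visitors and crawlers only ever see /stuff.
RewriteRule ^stuff/?$ /product.php?id=5 [L]

# Point 3: if the old dynamic URL is requested directly (for example via an
# external link), issue a 301 (Permanent) redirect to the friendly URL so that
# only one version is indexed. Checking %{THE_REQUEST} (the original request
# line) keeps the internal rewrite above from triggering this rule in a loop.
RewriteCond %{THE_REQUEST} \?id=5(&|\s)
RewriteRule ^product\.php$ /stuff? [R=301,L]

Using R=302 in the last rule (or omitting the code, which defaults to 302) is exactly the mistake described in point 3: both URLs remain eligible for listing.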
Although Google claims that they can filter out the duplicate content that multiple parameters might create on your site (and in truth, they are doing much better at this today than even a year ago), too many pages targeting similar content as a result of the dynamic nature of the pages can create issues. Small product differences can cause ten or more pages that feature essentially the same content to be generated. In those cases, only one of those pages is indexed, and without any restrictions on search engines, which page is listed is entirely up to Google.
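For site owners who do want to impose such restrictions, one option is a robots.txt pattern that keeps Googlebot out of the redundant permutations; Googlebot honours wildcards in Disallow rules. The sort and print parameters below are hypothetical examples of parameters that merely reshuffle or reformat the same content.

User-agent: Googlebot
# Keep the crawler out of sort-order and printer-friendly permutations
Disallow: /*?sort=
Disallow: /*&sort=
Disallow: /*?print=
Disallow: /*&print=

This does not tell Google which version to index, but it does stop the crawler from spending its visit on pages the site owner never wanted indexed in the first place.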
In addition, these dynamic URLs do not contain keywords relevant to the page, they are long and unfriendly to users, and when parameters are not handled correctly, they can result in the same content on the site being presented under more than one URL: the infamous duplicate content issue.
One common example of this as it pertains to dynamic URLs is a page that receives a different URL depending on how and where the page is generated. For example, these two links to the 2006 Australian Census Data display the same basic content:
http://www.censusdata.abs.gov.au/ABSNavigation/prenav/PopularAreas?collection=census&period=2006
This is not a big problem because Google will filter out the duplicate content. However, for other search engines this could cause more difficulties, and even for Google, it is not clear which of these pages they would choose to index. If a site owner has a clear preference as to which page should be indexed, leaving the choice to Google may not be desirable. Furthermore, even though Google’s new information suggests that they can handle the duplication of pages with different URLs, webmasters who have received Google’s duplicate content warnings in their Webmaster Tools accounts remain skeptical.
Finally, even Google has limits on how many pages of a site it can reasonably crawl in one session. Naturally, a site owner would prefer Google to come in, find any new or updated pages and leave, rather than crawling the site indefinitely in search of every possible permutation of the content available. If keeping the number of parameters in dynamic URLs to a minimum helps that happen, then it is in the best interest of both the site owner and Google for the site to be constructed with that end in mind.
While we continue to recommend, purely from a user experience perspective, that site owners take control over what content they present to search engines and visitors by creating short, keyword-rich URLs that limit the number of parameters, we can appreciate the difficulties that improper URL rewriting has created for search engines. Perhaps a better solution from Google would be more of a partnership with site owners, allowing them more control over how a site is indexed with respect to its parameters. This is the step that has been taken by Yahoo: Yahoo’s Site Explorer allows you to specify which parameters can be ignored for the purposes of determining what is unique content. Even so, it is great to see that Google is so responsive to the needs of its content providers and continues to be forthcoming with innovations to their crawling and indexing technology.