How to Handle Internal Site Search Results for SEO (Block, Noindex, Canonical?)

Matt Crowley - September 12, 2023

A question we run into often, both internally and from clients, is: how should I handle internal site search result pages?

For those who might not know, the search functionality on your website generates internal search result pages. For example, if you are planning a hiking trip and search for “hiking backpacks” on rei.com, you will land on an internal site search result page: https://www.rei.com/search?q=hiking+backpacks.

Technically speaking, each variation of this search result page is a unique URL and page to search engines. This is because searches for “backpacks” and “hiking backpacks” each return a unique URL: https://www.rei.com/search?q=backpacks vs. https://www.rei.com/search?q=hiking+backpacks. Therefore, both URLs are uniquely crawlable and indexable.
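
To make that concrete, here is a minimal sketch of how a typical GET-based search form turns a query into a URL. The endpoint is taken from the REI example above; the third query ("hiking backpack") and the helper name search_url are ours, added only for illustration.

```python
from urllib.parse import urlencode

# Endpoint taken from the example URLs above.
SEARCH_ENDPOINT = "https://www.rei.com/search"


def search_url(query: str) -> str:
    """Build the results URL the way a typical GET search form would."""
    return f"{SEARCH_ENDPOINT}?{urlencode({'q': query})}"


# Three slightly different searches produce three different URLs.
urls = {search_url(q) for q in ["backpacks", "hiking backpacks", "hiking backpack"]}
for url in sorted(urls):
    print(url)
print(len(urls), "distinct URLs")
```

Every small change to the query text yields a new URL, which is why the set of possible search result pages has no natural upper bound.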

Why Might This be an Issue?

Given that these are just search results, they are typically not pages you want search engines like Google to spend their time crawling and/or indexing. These pages can also present potential issues including:

  • Competing against core pages: For example, REI has a hiking backpacks category, https://www.rei.com/c/hiking-backpacks, and you wouldn’t want the internal site search result page for “hiking backpacks” to compete with the category page.
  • Infinite Number of Crawlable Pages: Since internal site search result pages aren’t manually created, technically speaking, there is no limit to the number of different results that can be generated. Any slightly unique search can generate a completely new internal site search page. This can become very problematic if search engines crawl five, ten, or fifteen times as many internal site search results as core pages. If you have a site that is 100,000 pages and search engines are crawling 1 million internal site search results, ten times as much crawling is going to search results as to the pages you actually want found.

How Can Search Engines Find these Pages?

Typically, search engines will only crawl through links they find, and won’t try to input values into text boxes like a site search (although we have seen a small number of cases of search engines entering text in text boxes over the years).

So, if they don’t typically crawl through text boxes, how is this an issue? Well, that’s because we’ll often see links to internal site searches across the web, where users effectively open pathways to these pages themselves. For example, if you searched for hiking backpacks, and then wrote a blog post about your hiking adventures and linked to that search result page, now search engines have a crawlable link to get into internal site search pages.

But wait, that’s just one page, I thought you said they could find an infinite number of pages? That’s because once a search engine has reached an internal site search result page, websites often provide other pathways to reach additional variations. For example:

  1. Pagination: Results will often be spread across a large number of pages and each of these is often a unique URL as well, such as https://www.rei.com/c/hiking-backpacks?page=2.
  2. Other Sorting, Filtering, and Link Options: Though this isn’t the case with all of REI’s functionality, on many sites each sorting, filtering, and link option generates its own unique variation of the hiking backpacks page in our example.
  3. Related Searches: Sites often will provide a related searches module that will generate links to other similar searches. Typically, these are auto-generated from real user searches, so the list can be limitless.
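
A rough sketch of how quickly these pathways multiply is below. Everything in it is hypothetical: the example.com domain, the page and sort parameter names, and the counts are assumptions chosen for illustration, not values from REI or any real site.

```python
from itertools import product
from urllib.parse import urlencode

# Hypothetical values, for illustration only.
QUERIES = ["backpacks", "hiking backpacks", "daypacks"]
PAGES = range(1, 11)  # assume 10 pages of results per query
SORTS = ["relevance", "price-asc", "price-desc", "rating"]


def variant_urls(base: str = "https://www.example.com/search"):
    """Yield one URL per combination of query, page number, and sort order."""
    for query, page, sort in product(QUERIES, PAGES, SORTS):
        yield f"{base}?{urlencode({'q': query, 'page': page, 'sort': sort})}"


urls = list(variant_urls())
print(len(urls))  # 3 * 10 * 4 = 120
```

Three searches already produce 120 crawlable URLs here. Add filters and related searches fed by real user queries, and the count grows multiplicatively with each new option.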

What Should I Do?

Ok, so hopefully we’ve convinced you this is a possible issue. The great news is that there are several possible solutions. As always, the solution that’s best for you depends on your unique circumstances. We recommend having someone who is knowledgeable about technical SEO choose the right solution after thinking critically about your situation and analyzing the current and future impacts of both the problem and the fix.

While there are many theoretical and possible solutions to unique situations, we’re going to cover the most recommended solution and commonly cited alternatives.

If possible, block access to internal site search results using your robots.txt file. This path is typically ideal for several reasons:

  1. It is a directive that major search engines follow, not a “hint”
  2. It is very simple and quick to implement
  3. It stops the crawling issues where search engines could be crawling a limitless number of possible pages
  4. While it doesn’t guarantee that links to internal site search results won’t be indexed, that’s typically not a problem here. The likelihood of search engines indexing links to your site search results if they are blocked using the robots.txt file is quite low in our opinion. Additionally, if they do, what’s the harm? The pages won’t be crawled so they won’t compete with other pages on your site and won’t cause a duplicate content issue. It can add noise to your data if you track indexed vs. non-indexed pages, but we believe that is typically worth it compared to having a limitless number of pages crawlable.
  5. However, it’s important to note that if you already have many internal site search result pages crawled, indexed, ranking, and generating traffic, this may not be the best path for you and instead may require a more sophisticated approach.

Implementing this method is very simple: add one line to your robots.txt file, as REI has done in its robots.txt file https://www.rei.com/robots.txt. Please note, however, that the specific format needs to match the URL structure of your internal site search result URLs.
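
Before deploying a rule, it can help to check it against a few known URLs. The sketch below uses Python’s standard-library robots.txt parser. The Disallow: /search rule is our assumption, based on the /search?q= URL pattern in the example above; it is not copied from REI’s file, and your own rule must match your own URL structure.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule for a site whose search result URLs start with /search.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url: str, agent: str = "Googlebot") -> bool:
    """Return True if the rules above allow `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)


for url in [
    "https://www.rei.com/search?q=hiking+backpacks",  # search result page
    "https://www.rei.com/c/hiking-backpacks",         # category page
]:
    print(url, "->", "crawlable" if is_crawlable(url) else "blocked")
```

Keep in mind that this parser does simple prefix matching and is only an approximation of how each search engine interprets robots.txt, so treat it as a sanity check and confirm the result with the testing tools the search engines provide.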

Alternative solutions that we often see cited, but that typically aren’t advisable, include using noindex meta tags and rel="canonical" link elements.

  • Noindex internal site search results: You could place a noindex meta tag on all internal site search results, but we typically don’t recommend this. While it stops search engines from indexing links to internal site search results, it can still lead to these issues:
    • You’re not stopping the pages from being crawled. So, you could still end up with an issue where five, ten, fifteen times the number of pages on your site are being crawled from internal site search results compared to your core pages.
    • Anytime noindex tags are placed on pages on the website, you run the risk of those tags being carried into other pages that are not intended to have them. While this is typically not a major issue with internal site search results, it’s possible someone without knowledge of this specific tag could re-purpose code for another section of the site and carry over a noindex tag mistakenly. While that’s more common with core page templates, it is still a possibility to watch out for and prevent.
  • Using rel="canonical" link elements: We sometimes see people recommend or use rel="canonical" link elements to canonicalize every search result page back to the main search result “homepage” (this is not what REI uses).

The idea behind the canonical approach is to consolidate any link value from the internal site search results into the main search page, whereas blocking the pages in the robots.txt file prevents that value from being transferred. However, this typically isn’t worth it. While people will link to your search results, it’s often not enough to provide any real link value. Additionally, you still have the crawling issue. So, is having the link value worth having the crawling issue? For most cases, we don’t believe so.
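
For reference, here is a minimal sketch of what these two alternatives look like in a page’s HTML, along with a small audit for the noindex risk described above. The sample pages, the example.com URLs, and the helper names are all hypothetical; a real audit would fetch your own templates or crawl data.

```python
from html.parser import HTMLParser


class IndexingSignals(HTMLParser):
    """Collect the robots meta directive and canonical URL from a page."""

    def __init__(self):
        super().__init__()
        self.robots = None
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.robots = (attrs.get("content") or "").lower()
        elif tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.canonical = attrs.get("href")


def indexing_signals(html: str):
    """Return (robots directive, canonical URL) found in the HTML, or None for each."""
    parser = IndexingSignals()
    parser.feed(html)
    return parser.robots, parser.canonical


# Hypothetical pages keyed by URL path.
PAGES = {
    # Alternative 1: noindex on a search result page.
    "/search?q=backpacks": '<head><meta name="robots" content="noindex"></head>',
    # Alternative 2: canonical pointing back to the main search page.
    "/search?q=daypacks": '<head><link rel="canonical" href="https://www.example.com/search"></head>',
    # A core page that picked up a noindex tag by mistake.
    "/c/hiking-backpacks": '<head><meta name="robots" content="noindex"></head>',
}


def leaked_noindex(pages: dict) -> list:
    """List non-search paths whose HTML carries a noindex directive."""
    return [
        path
        for path, html in pages.items()
        if not path.startswith("/search")
        and "noindex" in (indexing_signals(html)[0] or "")
    ]


print(leaked_noindex(PAGES))
```

A check like this, run against your core templates, is one way to catch a noindex tag that was carried over from search result code into a section where it doesn’t belong.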

As stated, there are many other possible methods to use for this situation. In some cases they are warranted, but most of the time they are not ideal or recommended. Blocking access via the robots.txt file is the simplest and most effective solution most of the time.

If you have any other questions about the technical SEO health of your website, don’t hesitate to reach out to info@morevisibility.com.
