Duplicate Content in Search Engine Indexes: Too much of a good thing?

Marjory Meechan - March 7, 2008

Having duplicate content on your site may not seem like a problem for search engine indexes. After all, the more keyword-relevant pages a site has in the index, the more likely it is that one of them will appear in the search engine results pages (SERPs) for that keyword, right? It’s true that duplicate content in search engine indexes is not the worst problem a site can have; it is infinitely better than no content, for example. However, serving duplicate content to search engines can cause problems. Although the major search engines are dedicated to crawling the entire web and indexing every page, they are also constantly striving to present as many unique and relevant results to their users as possible. To do this, they have to filter out duplicate content, particularly when it occurs on the same site.

How do duplicate content filters work? Every search engine is different, and this is an aspect of search that is changing all the time. In fact, Google recently made major changes to the way they handle duplicate content. Prior to the fall of 2007, they maintained two indexes: a main index, from which most search results were drawn, and a supplemental index. Pages in the supplemental index were much less likely to appear in the SERPs. Google has now eliminated the distinction between the two indexes and started using other methods to ensure that pages from a single site do not overwhelm the search results. Some of these include:

  • Grouping duplicate URLs into a “cluster” and consolidating their properties, including inbound links, onto a single URL, which is then displayed in the SERPs (see the sketch after this list).
  • Displaying a maximum of two results from any one domain (including sub-domains) in the results pages, with a link to display more results if the searcher wishes.
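
To make the clustering idea concrete, here is a minimal sketch in Python of how such a filter could work: group URLs whose content matches into clusters, then consolidate inbound links onto one representative URL. The example.com URLs, the hash-based fingerprint, and the “most-linked URL wins” rule are illustrative assumptions, not a description of Google’s actual algorithm.

```python
import hashlib

# Hypothetical crawl data: URL -> (page text, inbound link count).
# The URLs, fingerprinting shortcut, and selection rule below are
# illustrative assumptions, not Google's actual clustering method.
pages = {
    "http://example.com/widgets": ("Blue widgets on sale today.", 40),
    "http://example.com/widgets?sessionid=7": ("Blue widgets on sale today.", 2),
    "http://www.example.com/widgets": ("Blue widgets on sale today.", 15),
    "http://example.com/gadgets": ("All of our gadgets, in one place.", 9),
}

def fingerprint(text: str) -> str:
    """Crude content fingerprint: hash of the whitespace-normalized text."""
    return hashlib.md5(" ".join(text.lower().split()).encode()).hexdigest()

# 1. Group URLs whose content fingerprints match into clusters.
clusters = {}
for url, (text, _links) in pages.items():
    clusters.setdefault(fingerprint(text), []).append(url)

# 2. For each cluster, pick one representative URL to show in the SERPs and
#    consolidate the inbound links of every duplicate onto it.
for dupes in clusters.values():
    representative = max(dupes, key=lambda u: pages[u][1])  # most-linked URL wins
    total_links = sum(pages[u][1] for u in dupes)
    print(f"show {representative}  "
          f"(consolidated links: {total_links}, duplicates filtered: {len(dupes) - 1})")
```

In this sketch the three widget URLs collapse into one cluster, only one of them is shown, and the gadgets page stands alone.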

So, if Google is taking care of this issue, why should we care? There are two main reasons:

  1. Search engines do not crawl all the pages of a site on every visit. How often a site receives a “deep crawl” depends on how important the search engines consider the site to be, but even very important sites are not fully crawled every time. How many pages are crawled can also depend on how much time the search engines have allocated to crawling your site. If they waste that time collecting the same content over and over rather than crawling and indexing your unique pages, some of your content may not be included in search engine results as quickly as you would like.
  2. When Google chooses which URL to display, they may not consider factors such as which page has the best title, meta tags, or URL filename. If you have gone to the trouble of optimizing a specific page for search engines, your work is all for naught if they choose to display a non-optimized page in the SERPs instead.

The bottom line is that a well-designed site that takes care to serve only one version of a page to both search engines and visitors will be crawled more efficiently and will be less confusing for visitors to navigate. Furthermore, you, as the site owner, will choose which pages are displayed, not some anonymous algorithm. Google has provided some tips on how to streamline your site and avoid duplicate content issues. How important this is to your site depends on many factors, but taking any advantage you can when competing for those all-important first-page positions is just good sense.
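
As one practical way to serve only one version of a page, here is a minimal sketch of the kind of URL canonicalization a site could apply, issuing a 301 redirect whenever a requested URL differs from its canonical form. The specific rules (stripping “www.”, dropping session and tracking parameters, trimming trailing slashes) are assumptions chosen for illustration; which rules make sense depends on how your own site generates duplicate URLs.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Illustrative rules only: which query parameters are noise, and whether the
# "www." or bare host is preferred, depends entirely on your own site.
IGNORED_PARAMS = {"sessionid", "utm_source", "utm_medium", "utm_campaign"}

def canonical_url(url: str) -> str:
    """Map every duplicate variant of a page to one preferred URL."""
    scheme, host, path, query, _fragment = urlsplit(url)
    host = host.lower().removeprefix("www.")
    path = path.rstrip("/") or "/"
    kept = [(k, v) for k, v in parse_qsl(query) if k.lower() not in IGNORED_PARAMS]
    return urlunsplit((scheme, host, path, urlencode(kept), ""))

# A request handler would issue a 301 redirect whenever the requested URL
# differs from its canonical form, so crawlers and visitors see one version.
for variant in ("http://www.example.com/widgets/",
                "http://example.com/widgets?sessionid=7",
                "http://example.com/widgets"):
    print(variant, "->", canonical_url(variant))
```

All three variants resolve to the same preferred URL, so inbound links and crawl time are spent on one page instead of being split across duplicates.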
