A common misconception within the online community is that there are “penalties” for having duplicate content on your website. Many webmasters tend to get very antsy if they think the folks at Google et al are going to put them in “search engine jail” for having duplicates of this and duplicates of that on their website. In actuality, omission or de-ranking is reserved for only the most blatant offenders.
If your intention is to deliberately steal content from another website or to stuff your pages with keywords in the hope of ranking higher, then you should probably fear Google’s wrath. However, if some pages simply look very similar, or are duplicated because of a stubborn CMS, the worst that will happen is that one of them will be filtered out and demoted to the supplemental index. The best way around this is either to employ the proper redirects (a topic I discussed in my last blog post) or to make all pages on the site as distinct as possible.
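As an illustration of the redirect approach, a common fix is a 301 (permanent) redirect that collapses the www and non-www versions of your URLs into a single version, so engines only ever see one copy of each page. Here is a minimal sketch for an Apache .htaccess file; example.com is a placeholder for your own domain, and this assumes mod_rewrite is enabled on your server:

```apache
# Permanently redirect www.example.com/* to example.com/*
# (example.com is a placeholder; substitute your own domain)
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]
```

The R=301 flag is what matters here: it tells engines the move is permanent, so ranking signals consolidate on the surviving URL instead of being split between duplicates.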
Barring the iron fist of the search engines, it is still good practice to avoid duplicate content for the sake of your users. The more unique content on a website, the wider the reach you will have in the search engines and the better experience you will provide for your users.
There can be numerous reasons why pages may not appear in the search results or why rankings drop, and duplicate content is one of them. Content can be duplicated in many ways, and it usually happens unintentionally; it can even arise for perfectly valid reasons, often as a side effect of actions taken to boost rankings. It is not the worst thing that can happen to a site, but if it can be fixed, it is probably in your best interest to do so. I have come across this issue a few times lately and thought I would offer a few helpful tips.
In general, a search engine’s mission is to provide unique and relevant content to the searcher. When an engine comes across duplicate content, the question arises: “Which pages are the most appropriate to index?” To display the most useful pages in the search engine results pages (SERPs), a duplicate content filter evaluates, sorts through, and removes the duplicate pages (and spam). The search engines do a fairly good job of determining what to index, but by taking proactive steps you can help guide them to the pages you want indexed (or at least keep them from weeding certain pages out of your site). Keep in mind that if you provide no guidance, they will make the choice themselves, and the result may disappoint you.
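One concrete way to guide the engines is the canonical link element, which the major search engines support as a hint about which URL should stand in for a set of near-duplicate pages. A minimal sketch, assuming example.com/widgets is the version you want indexed (the URL is a placeholder):

```html
<!-- Placed in the <head> of each duplicate or variant page. -->
<!-- href points at the one version you want the engines to index. -->
<link rel="canonical" href="http://example.com/widgets" />
```

Unlike a redirect, this leaves all the variant pages accessible to visitors; it only tells the engines which copy should receive the indexing credit.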
Below are just a few ways to avoid duplicate content:
These ideas barely scratch the surface of ways to reduce duplicate content, but hopefully they will get you headed in the right direction.
Having duplicate content on your site may not seem like it could cause a problem in search engine indexes. After all, the more keyword-relevant pages a site has in the indexes, the more likely it is that a page from the site will appear in the search engine results pages (SERPs) for that keyword, right? It’s true that duplicate content in search engine indexes is not the worst problem a site can have; it’s infinitely better than no content, for example. However, serving up duplicate content to search engines can cause problems. Although the major search engines are dedicated to crawling the entire web and indexing every single page, they are also constantly striving to present as many unique and relevant results to their users as possible. To do this, they have to filter out duplicate content, particularly when it occurs on the same site.
How do duplicate content filters work? Every search engine is different, and this is an aspect of search engines that is changing all the time. In fact, Google recently made major changes to the way it handles duplicate content. Prior to the fall of 2007, Google maintained two indexes: a main index, from which most search results were drawn, and a supplemental index. Pages in the supplemental index were much less likely to appear in the SERPs. Google has since eliminated the distinction between the two and started using other methods to ensure that pages from a single site do not overwhelm the search results. Some of these include:
So, if Google is taking care of this issue, why should we care? There are two main reasons:
The bottom line is that a well-designed site that takes care to serve only one version of a page to both search engines and visitors will be crawled more efficiently and will be less confusing for visitors to navigate. Furthermore, as the site owner, you, and not some anonymous algorithm, will choose which pages get displayed. Google has provided some tips on how to streamline your site and avoid duplicate content issues. How important this is to your site can depend on many factors, but taking any advantage you can when competing for those all-important first-page positions is just good sense.