Duplicate content is a common issue on many sites. A question I hear frequently is, “What makes content duplicate to Google?” According to Google, “Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar.”
You may have heard of duplicate content before; however, many site owners are not aware of the many ways it can occur. In most cases we see it created unintentionally, though we have also seen it created deliberately. When duplicate content is not created to manipulate rankings, search engines rarely penalize it. Instead, they apply what is often called a duplicate content filter: they filter duplicate pages out of the results so that searchers see a diverse set of results.
When search engines filter out duplicate pages, you as the publisher have little control over which URL or domain is displayed in the search results. With that in mind, here are some of the most common ways we see duplicate content occur:
1) Both the www and non-www versions of the site are indexable by search engines. This is probably the most common source of duplicate content.
2) Inconsistent internal links throughout the site (for example, some links pointing to /page and others to /page/index.html).
3) Different navigation paths that lead to the same page under different URLs.
4) Different sort orders of the same listing, each with its own URL.
5) Printable versions of pages that search engines can access.
6) Additional marketing domains that do not properly redirect to the main website.
7) Different URLs used to display different elements of the same page.
8) Renaming URLs without removing the old ones or setting up proper redirects.
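To see how several of the cases above turn one page into many URLs, it can help to model how a site might collapse those variants back into a single preferred form. The sketch below is a minimal Python illustration under stated assumptions, not a description of how Google processes URLs: the host names `www.example.com`/`example.com` and the `sort`/`print` parameter names are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical site settings, for illustration only.
PREFERRED_HOST = "www.example.com"
BARE_HOST = "example.com"
# Hypothetical query parameters that change presentation (sort order,
# printable view) but not the content itself.
PRESENTATION_PARAMS = {"sort", "print"}


def canonical_url(url: str) -> str:
    """Map a URL variant to one preferred form of the same page (sketch)."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname is lower-cased; any port is dropped
    # Item 1: treat the bare domain as the preferred www host.
    if host == BARE_HOST:
        host = PREFERRED_HOST
    # Items 4 and 5: drop presentation-only parameters, and sort the rest
    # so that parameter order alone does not produce a "new" URL.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in PRESENTATION_PARAMS
    )
    return urlunsplit((parts.scheme, host, parts.path or "/", urlencode(query), ""))


variants = [
    "http://example.com/shoes?sort=price",
    "http://www.example.com/shoes?print=1",
    "http://www.example.com/shoes",
]
print({canonical_url(u) for u in variants})  # {'http://www.example.com/shoes'}
```

Three URLs a crawler would treat as distinct collapse to one. Which parameters are safe to ignore is a site-specific decision; dropping a parameter that actually changes the content would hide a real page.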
Ways to address duplicate content include 301-redirecting additional domains (and the non-preferred www or non-www host) to the preferred, or “canonical,” version; adding a canonical link tag (rel="canonical") to pages; and blocking duplicate pages, such as printable versions, in your robots.txt file. Of course, the best situation is a site that doesn’t create duplicate content in the first place. If an existing site already does, though, these workarounds are worth putting in place.
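As a rough illustration of those three fixes working together, here is a minimal sketch written as a plain Python WSGI app. It assumes a hypothetical preferred site served over HTTPS at `www.example.com` and hypothetical printable pages under `/print/`; in practice these rules usually live in web server or CMS configuration rather than application code.

```python
from html import escape
from wsgiref.util import setup_testing_defaults

# Hypothetical settings, for illustration only.
PREFERRED_HOST = "www.example.com"
ROBOTS_TXT = "User-agent: *\nDisallow: /print/\n"  # assumes printable pages live under /print/


def app(environ, start_response):
    """Minimal WSGI app sketching three duplicate-content fixes."""
    host = environ.get("HTTP_HOST", "").split(":")[0].lower()
    path = environ.get("PATH_INFO") or "/"
    # Fix 1: permanently (301) redirect every other host, such as the bare
    # domain or a marketing domain, to the same path on the preferred host.
    if host != PREFERRED_HOST:
        location = f"https://{PREFERRED_HOST}{path}"
        query = environ.get("QUERY_STRING", "")
        if query:
            location += "?" + query
        start_response("301 Moved Permanently", [("Location", location)])
        return [b""]
    # Fix 2: serve a robots.txt that keeps crawlers out of printable pages.
    if path == "/robots.txt":
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [ROBOTS_TXT.encode()]
    # Fix 3: every page names its preferred URL in a canonical link tag,
    # so variants reached via extra query strings point back to one URL.
    canonical = escape(f"https://{PREFERRED_HOST}{path}")
    page = (
        f'<html><head><link rel="canonical" href="{canonical}"></head>'
        "<body>...</body></html>"
    )
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [page.encode()]


def request(host, path):
    """Call the app directly, without a server; return (status, headers, body)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["HTTP_HOST"] = host
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"], captured["headers"] = status, dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body


print(request("example.com", "/shoes")[:2])
# ('301 Moved Permanently', {'Location': 'https://www.example.com/shoes'})
```

One design note: a crawler that obeys robots.txt never fetches a blocked page, so it also never sees that page's canonical tag. For any given set of duplicate URLs it is usually better to pick one mechanism rather than stack all three.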