When developing a website, it is critical to have unique, relevant copy that informs potential customers about your product or service. Well-written copy also helps the search engines better understand your subject matter. One of the more common ways I have seen business owners display their content is through Adobe’s PDF format. A PDF gives the user the ability to download the information in a clear, structured format. While this can be a great way for your visitors to find and read the copy you develop, you could be inadvertently causing yourself harm.
For example, let’s say you have five PDF files that house the majority of your content. The search engines will most likely index the PDF files much like a normal page on your website. This means that when a searcher runs a query on a key phrase you’re targeting, the five PDF files could surface within the natural results. You’re probably asking yourself: how could it possibly be a bad thing to have my PDF files indexed and displayed within the search result pages? While the search engines crawl the copy of a PDF file and index its content, critical functionality such as the primary navigation is absent, so the PDF acts as a dead end for search engine spiders. The same can be said for searchers who land on the PDF rather than the actual website. If a searcher clicks a natural listing that happens to be one of the PDF files used to display content, they lose the ability to navigate to other areas of the website. This could ultimately result in a lost sale as well as a diminished branding experience.
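If you decide the PDFs should not compete with your HTML pages in the search results at all, one option is to ask the engines not to index them. A minimal sketch, assuming an Apache server with mod_headers enabled (your server setup may differ), is to send an X-Robots-Tag noindex header for every PDF:

```apache
# Hypothetical .htaccess fragment: tell crawlers not to index PDF files.
# Requires Apache's mod_headers module to be enabled.
<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex"
</FilesMatch>
```

This keeps the PDFs available for visitors who reach them through your site's own links, while steering searchers toward the HTML pages that carry your navigation and branding.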
Duplicate content is a hot topic and has been for quite a while. It is also one of the most misunderstood issues in search engine optimization. Many webmasters, and even some search marketers, spend an extraordinary amount of time and resources trying to avoid the dreaded “Duplicate Content Penalty,” when in fact a penalty derived from duplicate content is fairly rare and reserved for sites that have been observed trying to manipulate search engine rankings directly, i.e., search engine spammers.
The more common issue associated with duplicate content is the “Duplicate Content Filter.” When a search engine finds two or more pages with identical or nearly identical content, it applies a filter that allows only one instance of the content to be returned in the search results. This is in no way a penalty and does not affect the site as a whole, just the specific page as it relates to the specific search query. The goal of the search engines is to provide their users with unique content, and this filter helps ensure each page returned in the results is unique.
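The filtering idea can be sketched in a few lines of Python. This is a toy illustration, not how any search engine actually works: real systems use far more sophisticated near-duplicate detection, but the principle of keeping only one instance per distinct body of content is the same.

```python
import hashlib

def filter_duplicates(results):
    """Keep only the first result for each distinct body of content.

    `results` is a list of (url, content) pairs. The fingerprint here is
    a simple hash of whitespace- and case-normalized text, so identical
    or trivially reformatted copies collide and are filtered out.
    """
    seen = set()
    unique = []
    for url, content in results:
        normalized = " ".join(content.lower().split())
        fingerprint = hashlib.md5(normalized.encode("utf-8")).hexdigest()
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(url)
    return unique

# Hypothetical URLs for illustration: the HTML page and its PDF copy
# carry the same text, so only the first instance survives the filter.
results = [
    ("http://example.com/page.html", "Our widget guide."),
    ("http://example.com/page.pdf",  "Our  widget GUIDE."),
    ("http://example.com/other.html", "A different article."),
]
print(filter_duplicates(results))
# → ['http://example.com/page.html', 'http://example.com/other.html']
```

Note that the filter only suppresses the later copies for this result set; nothing about the site as a whole is penalized, which mirrors the point above.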
In the past couple of weeks, Google has published an article with some very specific information on how it identifies and handles duplicate content, along with some bullet points on issues to watch for. Additionally, another new US patent relating to identifying and handling duplicate content has been granted to Google.