Q.: For the most part, Search Engine Optimization is all about being seen. I want my content, my product, my brand to reach as many people using a search engine as possible. Because of this, it may seem counterintuitive to block sections of a website from the search engines’ indexation. Many understand the need to block admin sections and other sensitive areas, which is also SUPER important, but miss the need to block sections that have decent content. The question arises: why would I ever block good content that I want to rank?
A. Duplication is one of the primary reasons to block a page or section from search engines. Many sites have content that is duplicated from a different section of their own site, from another domain, or even in different file types (e.g. an HTML page also available as a PDF), but search engines want to present users with a variety of content. A particular domain will not show numerous results for a single piece of content. If the duplication occurs across domains, the engines will choose which page is the “originator” or “authority” for the search, and filter out the other domains that house the same content.
We therefore recommend consolidating the power of the content onto a canonical page with a 301 redirect, so that only one page is available to users, or placing a canonical link element on each duplicate that points back to the one canonical page. The canonical link element suggests to the search engines which page should receive “credit” for the content, leaving the duplicates out of the indexes but still available to users navigating the site internally. Both of these options work within a domain and across different domains. This consolidation gives you the most “bang for your buck,” so to speak, for the content, but may not be possible or feasible within time constraints.
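To make those two options concrete, here is a minimal sketch; the URLs, paths, and the Apache Redirect directive are illustrative assumptions about your setup, not a prescription. A server-level 301 sends users and robots from the duplicate to the canonical URL, while a canonical link element in the head of each duplicate leaves the duplicates live but tells the engines which URL gets the credit:

    # Assumed Apache configuration: permanently redirect a hypothetical duplicate URL
    Redirect 301 /whitepaper.pdf https://www.example.com/whitepaper/

    <!-- Or, placed in the <head> of each duplicate page (example.com is illustrative) -->
    <link rel="canonical" href="https://www.example.com/whitepaper/" />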
Competing against yourself across domains for the same content, or letting the search engines choose which page is the canonical version, is definitely not recommended. If none of the previous options are possible and the “power” of the content cannot be consolidated, block all of the duplicates from indexation altogether, either in the robots.txt file at the root of your domain or in the metadata of the page itself. You can block entire sections or individual pages in the robots.txt file. Blocking individual pages can create a long and confusing robots.txt, so some decide instead to block each page individually with a robots meta tag in that page’s header section of the source code, as sketched below.
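As a rough sketch (the paths here are hypothetical), a robots.txt file at the root of the domain can block a whole duplicate section or a single file, while a robots meta tag in an individual page’s head keeps just that page out of the indexes without growing the robots.txt:

    # robots.txt at the root of the domain; paths are illustrative
    User-agent: *
    Disallow: /print-versions/
    Disallow: /whitepaper.pdf

    <!-- Or, in the <head> of an individual duplicate page -->
    <meta name="robots" content="noindex, follow">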
When you block a page from the search engine robots, the page will still be live and fully functional, but the search engines will not index it, and the “power” of your content will reside in the URL that you do not block from indexation. You lose any link credit that the blocked duplicate page(s) may have acquired, but that loss can be offset by performing well with the page you want visitors to enter through from the search engines. This is not technically consolidation; it is more a matter of presenting the page you choose in the search results, removing any ambiguity for the search engines about which URL to present. In other words, you may lose some “credit” that the blocked pages have accumulated, but at least you clearly indicate to the search engines the specific page you intend to rank.