Sometimes you may have pages on your website that you do not want indexed by search engines. Just recently, we developed an online order form for a client that should not show up in search results. To keep a page from being crawled or indexed by search engines, do these three things:
1. Add a rule to your website’s robots.txt file. Assuming the page you don’t want indexed is order.htm:
User-agent: *
Disallow: /order.htm
2. Add a “noindex, nofollow” robots meta tag in the head section of the page that you don’t want indexed or crawled:
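<meta name="robots" content="noindex, nofollow" />

This tag tells compliant crawlers not to index the page and not to follow any of the links on it.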
3. For each link leading to the page that you don’t want indexed or crawled, add a “rel” attribute with a value of “nofollow” to the anchor tag. Continuing with the order.htm example (the link text here is just a placeholder):
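<a href="/order.htm" rel="nofollow">Order Form</a>

The “nofollow” value tells search engines not to follow that particular link to the page.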
That’s pretty much it. It should be noted that, until recently, it was thought that you only had to add the rule to your robots.txt file to prevent pages from being indexed and crawled. However, Matt Cutts from Google explained otherwise in a video posted on his blog (http://www.mattcutts.com/blog/robots-txt-remove-url/): a robots.txt rule stops compliant crawlers from fetching a page, but the URL can still show up in search results if other sites link to it.