On Wikipedia and on any number of message boards and forums across the internet, it is common to find external links marked with a “nofollow” attribute, like this one here: Marjory Meechan
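In page markup, the attribute is written as rel="nofollow" on the anchor tag. The following is a minimal sketch in Python of how such links can be told apart from ordinary ones. The comment text and the example.com / example.org URLs are made up for illustration, and the parser class is a hypothetical helper, not something any search engine publishes.

```python
from html.parser import HTMLParser

# Hypothetical forum comment containing one nofollow link and one ordinary link.
COMMENT_HTML = (
    '<p>Nice post! See <a href="http://www.example.com/" rel="nofollow">my site</a> '
    'and <a href="http://www.example.org/">this reference</a>.</p>'
)


class LinkCollector(HTMLParser):
    """Collects (href, is_nofollow) pairs for every anchor tag with an href."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if href is None:
            return
        # rel may hold several space-separated tokens, e.g. rel="external nofollow".
        rel_tokens = (attr.get("rel") or "").lower().split()
        self.links.append((href, "nofollow" in rel_tokens))


def collect_links(html):
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


for href, is_nofollow in collect_links(COMMENT_HTML):
    print(href, "nofollow" if is_nofollow else "followed")
```

Only the first link carries the attribute, so only that link would be excluded from ranking credit under the scheme described in this article.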
The “nofollow” attribute for links was adopted a few years ago by the major search engines to combat the spam links that were showing up on beleaguered forum pages all over the internet. In an attempt to build ranking for pages on their own sites, spammers would insert links to those pages, using their chosen keywords as anchor text, in the comment sections of message boards, forums and blog posts. In many cases, the comments were completely irrelevant to the discussion and were a big nuisance for both the sites and the search engines.
To help discourage this practice, forum owners were encouraged to place the “nofollow” attribute on links posted to their sites, and all the major search engines jointly announced that they would not credit such inbound links to the target sites when calculating search result rankings. This turned out to be an excellent solution to the problem and is now the standard for blog comments and message board comments.
One question that often arises about this attribute is whether the page that a “nofollow” link points to will be included in the search engine’s index. In fact, whether a spiderable page is included in any search engine’s index is, for the most part, entirely up to the search engine. However, the major search engines differ in how they interpret the “nofollow” attribute. Google reads it as an instruction not to follow the link or index the page it leads to, but other search engines do not. Even Google will index the page if it is linked to from another source. This is why it is not a good idea to depend on this attribute to restrict robot access to the pages of a site.
For clarification, here are the best ways to restrict content from the major search engines:
To remove an entire directory of pages from robot access, place this line in the robots.txt file:
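A directory-wide rule of this kind might look like the sketch below. The directory name /private/ and the crawler name ExampleBot are assumptions chosen for illustration. Python's standard urllib.robotparser module is used here only to check how a compliant robot would read the rule.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; "/private/" is an illustrative directory name.
DIRECTORY_RULES = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, url, agent="ExampleBot"):
    """Return True if a compliant robot may fetch url under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Everything under /private/ is off limits; other paths are unaffected.
print(is_allowed(DIRECTORY_RULES, "http://www.example.com/private/report.html"))  # False
print(is_allowed(DIRECTORY_RULES, "http://www.example.com/public/index.html"))    # True
```

The trailing slash matters: it limits the rule to the contents of that directory rather than to every path that merely starts with the same characters.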
To remove all files that begin with the same string of characters, place this line in the robots.txt file:
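Because robots.txt rules match on path prefixes, a single Disallow line without a trailing slash covers every file whose path starts with that string. The sketch below assumes a hypothetical prefix /print and a made-up crawler name ExampleBot, and again uses Python's urllib.robotparser only to show how a compliant robot would interpret the rule.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; "/print" is an illustrative prefix.
PREFIX_RULES = """\
User-agent: *
Disallow: /print
"""


def is_allowed(robots_txt, url, agent="ExampleBot"):
    """Return True if a compliant robot may fetch url under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Any path beginning with "/print" is blocked, whether it is a file or a directory.
print(is_allowed(PREFIX_RULES, "http://www.example.com/print.html"))            # False
print(is_allowed(PREFIX_RULES, "http://www.example.com/printable/page1.html"))  # False
print(is_allowed(PREFIX_RULES, "http://www.example.com/products/index.html"))   # True
```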
To restrict access to one page, place this meta tag in the head section of the page:
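A robots meta tag of this kind is commonly written as a meta element with name="robots" and a content value such as "noindex". The sketch below assumes that form. The page content is invented for illustration, and the parser class is a hypothetical helper that simply reports which directives a page declares.

```python
from html.parser import HTMLParser

# Hypothetical page whose head section asks robots not to index it.
PAGE_HTML = """\
<html>
  <head>
    <title>Members only</title>
    <meta name="robots" content="noindex">
  </head>
  <body>...</body>
</html>
"""


class RobotsMetaParser(HTMLParser):
    """Collects the directives declared in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # Directives are comma-separated, e.g. content="noindex, nofollow".
        for directive in content.split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html):
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


print("noindex" in robots_directives(PAGE_HTML))  # True
```

Unlike a robots.txt rule, the meta tag can only be seen if the robot is allowed to fetch the page, so the two methods should not be combined on the same URL.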
For additional information on standards for robots exclusion, visit www.robotstxt.org. Here are some other resources on Google, Yahoo and MSN support for robot exclusion standards.
Additional Resources and Information:
Restricting content from Google’s index:
How Google interprets the nofollow attribute:
Restricting content from Yahoo’s index:
How Yahoo interprets the nofollow attribute:
MSN Guidelines for robot exclusion:
https://account.live.com/HelpCentral.aspx – choose “Live Search Site Owner” from the drop-down menu and choose the option: “How do I control which pages of my website are indexed?”.
How MSN interprets the nofollow attribute — this is not explicitly mentioned in MSN HelpCentral but this was their original announcement on the topic: