Last week, all three major search engines announced on their respective Webmaster Help blogs that they were unifying their interpretation of the Robots Exclusion Protocol. In the past, robots.txt instructions that excluded Google did not necessarily work the same way for Yahoo and MSN, so this was welcome news. All three blog posts are listed here:
Webmasters no longer need to keep track of different methods of controlling robots, so keeping your robots.txt file simple is easier than ever. Keeping robots.txt simple is a good strategy for several reasons:
The more complicated the robots.txt file is, the more likely it is to contain a mistake that accidentally blocks access to important parts of your site.
Listing too many directories or files can create security issues or, at the very least, tip off your competitors to future site plans by revealing all of your “secret” directories and files.
Robots.txt is a great tool for handling duplicate content issues related to the architecture or technology of your site, but remember that not all robots will respect the robots.txt file. If you really want to restrict access to pages of your site, don’t link to them in the first place, and remember that the only reliable way to keep those pages from being accessible to everyone is to put them behind a login page.
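To make the "keep it simple" advice concrete, here is a minimal sketch of a short robots.txt that hides a duplicate-content directory, checked with Python's standard `urllib.robotparser`. The paths (`/print/`, `/search/`), the domain and the `ExampleBot` user agent are all made up for illustration, and the check only shows how a well-behaved robot would read the file, not how any particular search engine behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a single rule set for all robots that blocks
# printer-friendly duplicates and internal search result pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
Disallow: /search/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Check one normal page and one duplicate (printer-friendly) page.
for path in ("/products/widget.html", "/print/widget.html"):
    url = "https://www.example.com" + path
    allowed = parser.can_fetch("ExampleBot", url)
    print(path, "allowed" if allowed else "blocked")
```

A file this short is easy to audit by eye, and a quick check like this one can catch a rule that blocks more than intended before it goes live.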
On Wikipedia and any number of message boards and forums across the internet, it is common to find external links marked with a “nofollow” attribute like this one here: Marjory Meechan
The “nofollow” attribute for links was adopted a few years ago by major search engines to combat the use of spam links that were showing up in beleaguered forum pages all over the internet. In an attempt to build ranking for pages on their sites, spammers would insert link references to those pages using their chosen keywords as anchor text in the comments section of message boards, forums and blog posts. In many cases, the comments were completely irrelevant to the content of the discussion and were a big nuisance for these sites and the search engines.
To help discourage this practice, forum owners were encouraged to place the “nofollow” attribute on their links, and all the major search engines got together and announced that they would not credit these inbound links to sites when calculating search result rankings. This turned out to be an excellent solution to the problem and is now the standard for blog comments and message board comments.
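As a rough illustration of both sides of this arrangement, the sketch below shows how a forum might render user-submitted links with `rel="nofollow"`, and how a crawler could separate those links from ordinary ones. The helper `comment_link`, the class `LinkCollector` and the example URLs are hypothetical names invented here; how a real search engine actually weighs links is not something this code claims to reproduce.

```python
from html import escape
from html.parser import HTMLParser


def comment_link(url: str, text: str) -> str:
    """Render a user-submitted link with rel="nofollow" (hypothetical helper)."""
    return f'<a href="{escape(url, quote=True)}" rel="nofollow">{escape(text)}</a>'


class LinkCollector(HTMLParser):
    """Sort the links on a page into credited and nofollow lists."""

    def __init__(self):
        super().__init__()
        self.credited = []
        self.nofollow = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if href is None:
            return
        # rel holds space-separated tokens, e.g. rel="external nofollow".
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollow.append(href)
        else:
            self.credited.append(href)


# One editorial link written by the site owner, one link left in a comment.
page = (
    '<p>Editorial link: <a href="https://www.example.com/guide">guide</a></p>'
    "<p>Comment: "
    + comment_link("https://spam.example.net/", "cheap widgets")
    + "</p>"
)

collector = LinkCollector()
collector.feed(page)
print("credited:", collector.credited)
print("nofollow:", collector.nofollow)
```

Applying the attribute at render time, rather than trusting what commenters type, means every link in a comment is marked the same way regardless of what the spammer submits.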