Last week, all three major search engines posted on their respective Webmaster Help blogs that they were unifying their interpretation of the robots exclusion protocol. In the past, robots.txt instructions that excluded Google did not always work the same way for Yahoo and MSN, so this was welcome news. All three blog posts are listed here:
Webmasters now have no reason to worry about keeping track of different methods of controlling robots, so keeping your robots.txt file simple is easier than ever. Keeping robots.txt simple is a good strategy for several reasons:
The more complicated the robots.txt file is, the more likely it is to contain a mistake that accidentally limits access to important parts of your site.
Listing too many directories or files could lead to security issues or, at the very least, tip off your competitors to future site plans, because robots.txt is publicly readable and would expose all of your “secret” directories and/or files.
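One way to guard against the accidental blocking described above is to test a robots.txt file against the pages you care about before publishing it. The following is a minimal sketch using Python's standard urllib.robotparser module. The rules in ROBOTS_TXT, the paths in IMPORTANT_PATHS, and the helper blocked_paths are all hypothetical and only for illustration. Note that Python's parser uses simple prefix matching, so it approximates how a search engine reads the file rather than reproducing any one engine exactly.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, kept deliberately short.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
Disallow: /print/
"""

# Hypothetical pages that must stay crawlable.
IMPORTANT_PATHS = ["/", "/products/", "/blog/robots-txt-tips"]


def blocked_paths(robots_txt, paths, user_agent="*"):
    """Return the subset of paths that the given rules block for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(user_agent, p)]


if __name__ == "__main__":
    blocked = blocked_paths(ROBOTS_TXT, IMPORTANT_PATHS)
    if blocked:
        print("Important pages are blocked:", blocked)
    else:
        print("No important pages are blocked.")
```

A check like this catches the kind of mistake that is easy to miss by eye: a rule such as "Disallow: /p", meant to hide a single directory, also matches /products/ because the rule is applied as a path prefix.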
Robots.txt is a great tool for handling duplicate content issues related to the architecture or technology of your site, but remember that not all robots will respect the robots.txt file. If you really want to protect access to pages of your site, don’t link to them in the first place and remember that the only consistent way to make sure those pages aren’t accessible to everyone is to limit access with a login page.