One of the most useful aspects of Google Webmaster Tools is the ability for webmasters to assess how “crawlable” their site is. In the “Diagnostics” section, webmasters can see why Google is unable to crawl and index certain pages of their site. Here are some of the issues Google will report on in this section:
“In Sitemaps”: This is where Google shows which URLs listed in an XML Sitemap you have uploaded to Webmaster Tools (under Site Configuration>>Sitemaps) are inaccessible. Google displays each URL it is having difficulty with along with the associated problem.
These errors are often caused by a Sitemap that still contains older, removed pages, or by a URL in the Sitemap that the webmaster has intentionally restricted.
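As a quick sanity check before resubmitting a Sitemap, you can pull the file down and list the URLs it declares. The sketch below is illustrative only: it assumes a standard XML Sitemap at a placeholder address (www.example.com/sitemap.xml) and uses only Python’s standard library.

```python
# Minimal sketch: list the URLs declared in an XML Sitemap.
# The Sitemap location below is a hypothetical placeholder.
from urllib.request import urlopen
import xml.etree.ElementTree as ET

SITEMAP_URL = "http://www.example.com/sitemap.xml"  # hypothetical location
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}  # standard Sitemap namespace

with urlopen(SITEMAP_URL) as response:
    tree = ET.parse(response)

# Each <url><loc> entry is a page the Sitemap asks Google to crawl;
# any removed or restricted page listed here is a candidate for the errors above.
for loc in tree.findall(".//sm:loc", NS):
    print(loc.text.strip())
```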
“Not Found”: If this section appears in the Diagnostics utility in Webmaster Tools, it means Google has detected pages that return one of the most common HTTP responses: 404 Not Found. These errors can be tricky, as they may show up because Google has found links from external websites pointing to pages you have removed from your site. They can also mean Google has detected broken links on your own site; in that case Google shows the page where the broken link resides so you can update or remove it. A quick way to verify what a reported URL actually returns is shown below.
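Before editing or redirecting anything, it can help to confirm the status code a reported URL really returns. The sketch below is a minimal check using Python’s standard library; the URL is a hypothetical placeholder.

```python
# Minimal sketch: confirm the HTTP status code a reported URL returns.
from urllib.request import urlopen
from urllib.error import HTTPError, URLError

def check_status(url):
    """Return the HTTP status code for url (e.g. 200 or 404)."""
    try:
        with urlopen(url) as response:
            return response.status
    except HTTPError as err:
        # 404 Not Found and other error responses are raised as HTTPError.
        return err.code
    except URLError as err:
        # DNS failures, refused connections, and similar network problems.
        return f"unreachable: {err.reason}"

print(check_status("http://www.example.com/removed-page.html"))  # hypothetical URL
```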
“Restricted by robots.txt”: This section displays pages on the site that have been blocked from web spider crawling via the site’s own robots.txt file (e.g. www.example.com/robots.txt). A robots.txt file is a simple text file, uploaded to the site’s root directory, that tells spiders which sections of the site to skip. This section is a useful way to see whether the instructions you’ve entered into the robots.txt file are correct and working as intended; a quick way to test a rule yourself follows below.
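To double-check a robots.txt rule against a specific URL, Python’s standard robotparser module can evaluate the file the same way a well-behaved crawler would. The site and paths below are hypothetical placeholders.

```python
# Minimal sketch: test whether robots.txt allows a given URL to be crawled.
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.set_url("http://www.example.com/robots.txt")  # hypothetical site
parser.read()  # fetch and parse the live robots.txt file

# "Googlebot" is Google's crawler user-agent; "*" would test the generic rules.
for path in ("/", "/private/", "/blog/post-1.html"):  # hypothetical paths
    allowed = parser.can_fetch("Googlebot", "http://www.example.com" + path)
    print(f"{path}: {'allowed' if allowed else 'blocked by robots.txt'}")
```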
“Unreachable”: This section lists pages on the site that are completely inaccessible to search engine spiders because of server or network errors on your end. These errors usually stop appearing once the webmaster or IT administrator has fixed the web server in question.