How Can I Use Google Webmaster Tools for XML Sitemap Optimization?

February 15, 2011
Another useful tool in Google Webmaster Tools is the Sitemaps section, found under Site configuration >> Sitemaps:

Figure 1 – Google Webmaster Tools “Sitemaps” Section

You can review your submitted XML Sitemaps in this section and see which of the pages they contain Google has managed to index. You can also tell whether Google had trouble accessing a Sitemap by checking for a checkmark in the “Status” field. That being said, why do we even want to do this? If a site is already “crawlable” in Google’s eyes, why perform this extra step?

A Sitemap is a faithful representation of a website’s structure: it points Google to every page that you wish to have crawled, indexed, and potentially surfaced in search. It also augments Google’s crawling and indexing by exposing pages it might not be able to reach otherwise, such as those behind JavaScript-enabled links or links embedded in Flash. Once all of the pages you want Google to know about are included in the Sitemap (the limits are 50,000 URLs and 10MB of uncompressed file size per Sitemap), upload it in the Sitemaps section of Google Webmaster Tools by clicking the “Submit a Sitemap” button:
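For reference, a minimal Sitemap follows the sitemaps.org protocol. The URLs and dates below are placeholders; only `<loc>` is required for each entry, while `<lastmod>`, `<changefreq>`, and `<priority>` are optional hints:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One <url> entry per page you want crawled -->
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2011-02-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>http://www.example.com/products.html</loc>
    <lastmod>2011-01-30</lastmod>
  </url>
</urlset>
```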

Figure 2 — “Submit a Sitemap” Button

Clicking the button reveals a field for entering the location of the XML Sitemap, which usually lives in your website’s root directory, for instance: www.example.com/sitemap.xml.
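Before submitting, it can be worth sanity-checking the file against the per-Sitemap limits mentioned above. A minimal sketch in Python, using only the standard library (the function name is my own, not part of any Google tooling):

```python
import xml.etree.ElementTree as ET

MAX_URLS = 50_000              # per-Sitemap URL limit
MAX_BYTES = 10 * 1024 * 1024   # 10MB uncompressed size limit

def check_sitemap(xml_text):
    """Report a Sitemap's URL count and size against Google's limits."""
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    root = ET.fromstring(xml_text)
    urls = [loc.text.strip() for loc in root.findall("sm:url/sm:loc", ns)]
    size = len(xml_text.encode("utf-8"))
    return {
        "url_count": len(urls),
        "size_bytes": size,
        "within_limits": len(urls) <= MAX_URLS and size <= MAX_BYTES,
    }
```

If a site exceeds either limit, the usual remedy is to split the URLs across several Sitemaps.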

Keep in mind that the numbers reported in the Sitemaps section of Webmaster Tools apply only to the URLs you submitted in your Sitemap(s), not the number of pages you actually have in the index; there will almost always be a discrepancy between “URLs submitted” and “URLs in web index”:

Figure 3 – URLs in Web Index

In fact, the two numbers are rarely identical. Discrepancies can arise from robots.txt rules blocking some of the submitted URLs, or simply from duplicate pages that Google has decided not to index.
Just ensure that the URLs in your Sitemap are the “canonical” versions (consistently www or non-www, for example). If there are URLs you care about that aren’t in your Sitemap, add them and re-submit. Web developers often add new pages to a site and forget to update the Sitemap, which is problematic if the new pages are not well interlinked on the site. Remember, your site can have the most optimized pages ever created, but all of your hard work will be in vain if Google doesn’t know about them!
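The canonical-URL check is easy to automate. A small sketch, again in standard-library Python (the canonical host `www.example.com` is just an illustration):

```python
from urllib.parse import urlparse

def non_canonical_urls(urls, canonical_host):
    """Return any Sitemap URLs whose host differs from the canonical host,
    e.g. non-www entries in a Sitemap that should list only www URLs."""
    return [u for u in urls if urlparse(u).netloc != canonical_host]
```

Any URLs it flags should be rewritten to the canonical form before the Sitemap is re-submitted.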
