Google can be a valuable source of traffic for your website. Searchers who enter a specific keyword or keyphrase benefit from Google’s curated results. These results, organized into Search Engine Results Pages, deliver the content that best matches the query. Behind the scenes, Google goes through a number of steps before displaying (or serving) that content to the user: crawling, indexing, and serving.
Crawling refers to the work of Googlebot, Google’s web crawler (or spider), which “crawls” the web, discovering new and updated pages by following links from site to site. This is why the “nofollow” attribute (rel="nofollow") was created: it tells Googlebot not to follow a particular link.
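For illustration, a nofollowed link looks like this in a page’s HTML (the URL below is purely hypothetical):

    <!-- Hypothetical link markup: rel="nofollow" asks crawlers not to follow this link -->
    <a href="https://www.example.com/some-page" rel="nofollow">Example link</a>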
Indexing refers to the sorting process Google performs to organize the different types of content Googlebot finds. Information such as a page’s tags and attributes helps Google understand and categorize it. Some rich media files and pages with dynamic features cannot be processed, which is why it is best to keep your site’s code simple if you find that a page is not showing up in Google’s index.
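As a rough illustration, these are the kinds of tags and attributes Google can read when sorting a page (the content shown here is hypothetical):

    <head>
      <!-- The title and meta description help Google understand what the page is about -->
      <title>Handmade Leather Wallets | Example Store</title>
      <meta name="description" content="Browse handmade leather wallets, belts, and bags.">
    </head>
    <!-- Descriptive alt text helps Google interpret images it cannot otherwise "see" -->
    <img src="brown-wallet.jpg" alt="Brown handmade leather wallet">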
Serving is the end result: the displayed snippet when a Google searcher enters a query and results are “served” to the Search Engine Results Page (SERP). Google strives to serve the pages most relevant to a query, and it uses a very complex algorithm to weight and order the results accordingly.
If you are not already familiar with them, we urge you to read Google’s Webmaster Guidelines to learn Google’s best-practice suggestions for helping it find, crawl, and index your website.
As most of us know, there are parts of a website we do not want search engine spiders crawling and indexing. For instance, it is probably not wise to have spiders crawling the secure areas of the site served over the secure port (https://).
We also don’t want Googlebot or Bingbot crawling and indexing pages that would make a poor target for search, like PDF files, Excel documents, or images. This is where the robots.txt file comes in.
The robots.txt file, uploaded to the root directory of the site (www.example.com/robots.txt), tells the spiders which directories and pages to skip. Why do we want this? If someone were to find a PDF file or a Flash file in a search result and click on it, those types of documents generally don’t contain links leading back to the rest of the site and can be a “dead end” for both the search engines and the user. A simple “Disallow” instruction in the robots.txt file will keep non-SEO-friendly pages from showing up in search results for your desired keyphrases.
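As a sketch, a robots.txt file along these lines tells crawlers to skip those sections (the directory paths below are hypothetical placeholders; substitute the directories on your own site that hold the PDFs, Excel documents, images, or other content you want left out of the index):

    # Applies to all crawlers, including Googlebot and Bingbot
    User-agent: *
    Disallow: /pdfs/
    Disallow: /downloads/excel/
    Disallow: /images/

Because crawlers request robots.txt separately for each host and protocol, the secure (https://) side of a site can also be given its own robots.txt that disallows the areas you want kept out of search.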
Those familiar with Google products may already be aware of Google Labs. For those who don’t know, the Labs section is where Google tests out new features for existing products. When checking your Gmail or Google Calendar, you may have seen a tiny green beaker near the top of the page where you can see what experiments are available for the application you are using. Often these labs give a user more functionality or better ways of using an existing product.
For those who use Google’s Webmaster Tools to monitor and analyze their websites, two new features are now available in the Labs section: “Fetch as Googlebot” and “Malware details.” Both appear under Labs in the Webmaster Tools dashboard.
Using these new tools can help you detect more issues with your website. The Fetch as Googlebot tool lets you see your website as Google sees it. You can look at the code as Google retrieves it and compare it to how it should be interpreted. Since the fetch comes from Google’s servers, the result may look different from the source code you expect; if it does, you may have found an issue. To run the report, click “Fetch as Googlebot,” then click the “success” link once it appears.
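You can get a rough, local approximation of the same idea by requesting a page while identifying as Googlebot, as in the short Python sketch below. This is not the Webmaster Tools feature (that fetch comes from Google’s own servers), and the user-agent string and URL here are assumptions for illustration:

    # Rough approximation only: fetch a page while presenting Googlebot's user-agent,
    # then compare the returned markup with the source you expect Google to index.
    import urllib.request

    # Googlebot's published user-agent string (assumed here for illustration)
    GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

    def fetch_as_googlebot(url):
        request = urllib.request.Request(url, headers={"User-Agent": GOOGLEBOT_UA})
        with urllib.request.urlopen(request) as response:
            return response.read().decode("utf-8", errors="replace")

    if __name__ == "__main__":
        page = fetch_as_googlebot("https://www.example.com/")  # hypothetical URL
        print(page[:500])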
The other feature Google is now offering is the “Malware details” function, which can help you keep a better watch over your site. If there is no malware on your site, the tool will say so; if malware is detected, details will be provided. This can help you determine whether your site has been hacked and where to look to fix it. For more information on the Malware details tool, head over to the Google blog.