Google’s index of example.com

January 30, 2009

In several blog posts and pieces of documentation, I have often used example.com as a stand-in for any domain name. RFC 2606, an Internet standards document published by the IETF in 1999, set aside example.com (as well as example.org and example.net) for documentation purposes. So if you were to follow a link to http://www.example.com in one of my posts, you wouldn’t see a real website, just a brief placeholder page. Visit http://www.example.com to see for yourself.

I’d like to demonstrate a fun little trick you can use to amaze your friends.

The page you see when you go to http://www.example.com looks perfectly indexable by the search engines. There’s not a lot of content, but you would expect the engines to have indexed it exactly as your browser shows it to you. It turns out, though, that the site has a robots.txt file that blocks all spiders from all content on www.example.com. (If you ever forget how to create a basic robots.txt file, you can use http://www.example.com/robots.txt as a guide.)

Alright, now for the punch line. Let’s see what the search engines have really indexed for http://www.example.com. Go to www.google.com and search for “site:example.com” (without the quotes). What do you see? If you see only one result, click the link “repeat the search with the omitted results included.”
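To make the blocking concrete, here is a minimal sketch using Python’s standard-library `urllib.robotparser`. It assumes the file contains the classic two-line “block everything” rule (`User-agent: *` followed by `Disallow: /`). The post doesn’t reproduce the actual file, so treat that content, and the hypothetical helper `is_crawl_allowed`, as illustrative. An empty `Disallow:` value is included for contrast, since it means the opposite: allow everything.

```python
from urllib.robotparser import RobotFileParser

# Assumed contents: the standard "block every spider from every path" rule.
# The post describes this behavior but does not show the actual file.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# For contrast: an empty Disallow value permits every path.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Hypothetical helper: would a polite crawler fetch `url` under these rules?"""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    urls = ["http://www.example.com/", "http://www.example.com/concepts"]
    for url in urls:
        print(url,
              "block-all:", is_crawl_allowed(BLOCK_ALL, "Googlebot", url),
              "allow-all:", is_crawl_allowed(ALLOW_ALL, "Googlebot", url))
```

Under the block-all rules, both URLs report `False`, including a deep path like `/concepts`, because every path starts with `/`. Under the allow-all rules, both report `True`. A well-behaved spider that honors the first file should never fetch any page on the site.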

I see 10,400 results now, including pages like example.com/blah/ and www.example.com/concepts. Unfortunately, the Google results page offers no link to a cached copy of any of these results, so we can’t see exactly what Google has indexed from them, but we can visit the pages ourselves. Well, I tried that, and every page I visit responds with “Not Found.” It’s logical to conclude that those pages never existed. But notice that some of the results were crawled by Google in the past few hours, even though the site’s robots.txt forbids spiders from crawling anything at all. Impossible, no?

You can try this search on other search engines too.

My guess is that this strange phenomenon comes from Google’s own testing, or from other people testing or somehow tricking Google into adding these pages to its index. It may also be confined to certain data centers.

Whatever is causing this, I’m sure Google knows about it but doesn’t feel the need to do anything about it. This phenomenon may also get you thinking about how search engines are supposed to work.
