The term “sandbox” was coined by webmasters to describe the period a new website must wait before it can rank for competitive keywords in Google. Much as children first play in the safety of a small sandbox, Google makes new websites do their time before they can join the older kids on the rest of the playground. The sandbox is difficult to explain, since Google claims it does not officially exist, but tests by webmasters appear to confirm both its existence and its effect on newly created websites.
The first thing that happens to any new website in Google is what some call the “fresh boost”. During this period the site is allowed to rank freely among other sites, often appearing within the first three pages of search results. The fresh boost usually lasts a month or two, and Google monitors the site throughout to see how well it performs and how much it grows in terms of content and backlinks.
If the site passes Google’s fresh boost test, it is allowed to remain in the rankings. The problem is that 99% of sites fail this test and are sent into the sandbox for a period that can last nine months or more. No one really knows what it takes to pass, but there are many ideas about what Google is looking for. These often include authority backlinks from established, trusted sites such as DMOZ or Wikipedia. Basically, the idea is that if the bigger kids let you play with them, you get to stay. If you can’t gain the trust of Google and authority sites in the allotted time, you are sent into the sandbox as an untrusted or spam site.
Once in the sandbox, there is no proven way out. Some claim to have escaped with a mass flood of links, but building links at that volume can get a site banned from Google altogether. Many webmasters would rather wait and do their time than risk a ban, since it is extremely hard to get a domain un-banned from Google. The best thing you can do is keep building your site and ignore the fact that you’re even in there. Use the time to add content and continue building backlinks from other websites. Once your time is up, you will have proven to Google that your site can be trusted, and it will be allowed to rank for highly searched keywords once again.
While in the sandbox, you will still be indexed and listed in Google for non-competitive keywords and low-search-volume terms. The sandbox only affects certain keywords and certain pages within your site, so you will still receive traffic from Google, just not as much as you will a year from now. If you’re trapped in the sandbox, don’t worry. You will get out someday, and while you’re waiting for Google to trust you, remember that there is always Yahoo! and MSN.
The latest search engine news is that Google has begun including sub-domains in its “host crowding” filter, which restricts search results to two per domain. Matt Cutts reported on his blog that, despite early reports describing this as a change still in progress, Google has been filtering duplicate sub-domains out of some search results for several weeks. In particular, searches that return fewer results (the long-tail searches) now show a maximum of two results per domain, and that limit includes sub-domains. The days when one site could rule an entire page of search results are numbered.
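Google has not published how its host crowding filter works, so the following is only a toy sketch of the behaviour described above: walk a ranked list of URLs and keep at most two per domain, counting sub-domains toward their parent domain. The function names `base_domain` and `crowd_filter`, the `max_per_domain` parameter, and the "last two labels of the hostname" rule for identifying a domain are all hypothetical; that last rule is a simplification that would mis-group hosts under suffixes such as `co.uk`.

```python
from urllib.parse import urlparse


def base_domain(url: str) -> str:
    """Collapse a URL's host to its last two labels (hypothetical simplification).

    e.g. blog.example.com -> example.com. This ignores multi-part public
    suffixes such as .co.uk, which a real filter would need to handle.
    """
    host = urlparse(url).hostname or ""
    labels = host.lower().split(".")
    return ".".join(labels[-2:])


def crowd_filter(ranked_urls: list[str], max_per_domain: int = 2) -> list[str]:
    """Keep at most `max_per_domain` results per domain, preserving rank order.

    Sub-domains count toward their parent domain, mirroring the change
    described above. This is a toy model, not Google's actual filter.
    """
    seen: dict[str, int] = {}
    kept: list[str] = []
    for url in ranked_urls:
        domain = base_domain(url)
        if seen.get(domain, 0) < max_per_domain:
            kept.append(url)
            seen[domain] = seen.get(domain, 0) + 1
    return kept


if __name__ == "__main__":
    results = [
        "http://www.example.com/a",
        "http://blog.example.com/b",
        "http://shop.example.com/c",
        "http://other.org/d",
        "http://example.com/e",
    ]
    for url in crowd_filter(results):
        print(url)
```

With the sample list, the third and fourth `example.com` hits (`shop.example.com/c` and `example.com/e`) are dropped, leaving two results from that domain plus the `other.org` page.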
This news should not come as a big surprise to any of Morevisibility’s clients, as we have been warning about the dangers of duplicate content across domains and sub-domains for some time. However, what many may not realize is that Google is not the only engine diligently weeding duplicate content out of its results. Comparable long-tail searches on Google and Yahoo show that both do a pretty good job of excluding duplicate results from the same domain and/or sub-domain. Live Search results are also being filtered, though Live Search adds a nice touch by letting users turn the filter off: based on an informal test, un-checking the “Group results from the same site” option on the Options page appears to disable it.
Duplicate pages in search results are a problem for everyone, searchers and site owners alike. It’s good to see the search engines making headway on the duplicate sub-domain issue, so that more sites can appear in search results and searchers get a wider variety of results.
I have been throwing around the term “search engine index” without really taking the time to think about what exactly it means and how it relates to websites. I’m pretty sure I’m not the only one, and that’s a bit of a shame. Once you start to think about what a search engine index encompasses, you can really begin to appreciate the sheer amount of work search engines do to give users the most relevant results for their queries.
When someone performs a query, the search engine does not work its way through the web in real time looking for relevant pages. Instead, it consults an index to display a list of URLs related to the query terms. Essentially, a search engine index is a database of all the words on all the pages the search engine has been able to find. For each word, the engine records which pages contain it, and this word-to-pages data makes up the search engine index.
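The word-to-pages structure described above is usually called an inverted index. The sketch below builds a tiny one in memory and answers a query by intersecting the page sets for each query word, so the pages themselves are never rescanned at query time. The page URLs, the lowercase-and-split tokenizer, and the "every query word must match" semantics are assumptions for illustration; real engines add ranking, stemming, and compressed on-disk storage.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words (simplified)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map every word to the set of page URLs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return pages containing every query word, using only the index."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)


if __name__ == "__main__":
    # Hypothetical pages, for illustration only.
    pages = {
        "http://example.com/sandbox": "The Google sandbox delays new websites",
        "http://example.com/index": "A search engine index maps words to pages",
        "http://example.com/crawl": "Spiders crawl pages and feed the search index",
    }
    index = build_index(pages)
    print(search(index, "search index"))
```

Looking up a word is a single dictionary access, which is why an index lets the engine answer quickly instead of reading every page on every query.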
To generate the index, search engines use programs called spiders, also known as crawlers, to take inventory of every word found on web pages. A spider finds your web server by following a link to your website. It then requests that URL from your server, puts together a list of the words found on that page, and adds them to the search engine index. It has been reported that Google, for example, has over 24 billion pages indexed. I don’t know about you, but I think it’s amazing that a search engine has that much reach and indexing capability.
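The spider's job described above can be sketched as a toy breadth-first crawler that runs entirely in memory. The `WEB` dictionary is a hypothetical stand-in for real servers (a lookup replaces the HTTP request), and the tokenizer is simplified; a real crawler would also fetch pages over the network, honour robots.txt, and throttle its requests. The example shows why links matter: a page nothing links to is never discovered.

```python
import re
from collections import deque

# Hypothetical stand-in for the web: URL -> (page text, outgoing links).
# A real spider would request each URL over HTTP instead of looking it up.
WEB: dict[str, tuple[str, list[str]]] = {
    "http://a.example/": ("Welcome to site A", ["http://b.example/"]),
    "http://b.example/": ("Site B links back to A and to C",
                          ["http://a.example/", "http://c.example/"]),
    "http://c.example/": ("Site C has no outgoing links", []),
    "http://orphan.example/": ("Nobody links here", []),
}


def crawl(seed: str, web: dict[str, tuple[str, list[str]]]) -> dict[str, set[str]]:
    """Breadth-first crawl from `seed`, building a word -> pages index.

    Pages are discovered only by following links, so a page that nothing
    links to (like orphan.example) is never requested or indexed.
    """
    index: dict[str, set[str]] = {}
    queue = deque([seed])
    visited: set[str] = set()
    while queue:
        url = queue.popleft()
        if url in visited or url not in web:
            continue
        visited.add(url)
        text, links = web[url]  # "request" the URL from its server
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index.setdefault(word, set()).add(url)
        queue.extend(links)  # follow links to discover more pages
    return index


if __name__ == "__main__":
    idx = crawl("http://a.example/", WEB)
    print(sorted(idx["site"]))
    print("nobody" in idx)
```

Starting from `a.example`, the crawler reaches `b.example` and then `c.example` through links, while `orphan.example` stays out of the index.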
In the end, the main goal of a search engine index is to optimize how quickly and how well the engine can find the most relevant documents for its users. And that’s a pretty cool thing.