Articles written in August, 2008

How Cool is Cuil and What of Wikia Search?

August 20th, 2008 by Michelle Stone

Now that some time has passed since the launch of Cúil (www.cuil.com), one of two new search engines lauded as “Google killers”, let’s take a look at how Cúil has (or hasn’t) improved.

One of the initial quirks noticed when searching within Cúil, aside from the sporadic uptime due to the high-traffic interest, was that the image results didn’t always match the content on Cúil’s results pages.  As we first noted in our blog on the day of launch, a query on “tree frogs” yielded interesting findings on the Cúil search engine results page (SERP).
cuil-george-bush-tree-frog 
Figure 1: Cúil search engine results page (SERP) from 29 July 2008

At the time, it did appear that the Cúil engine was in the process of learning – bettering its results as more and more people used it.  So what does the same query yield today?
cuil-tree-frog-serp 
Figure 2: Cúil search engine results page (SERP) from 19 August 2008

Comparing the two results pages, it’s easy to see that the image matching has greatly improved on Cúil.

How has the other of the two new search engines fared?  Much as Cúil gained early notice by virtue of its back story (the search engine was developed by ex-Google staffer Anna Patterson, who developed the TeraGoogle indexing system that Google still uses today, and her husband Tom Costello, who developed search engines at Stanford and IBM), the other contender, Wikia Search (re.search.wikia.com), also boasts an impressive pedigree.

Wikia Search is the brainchild of Wikipedia founder Jimmy Wales.  The “human-powered” search engine debuted officially in January and purported to be an open source Internet search engine.  Using our earlier example of “tree frog”, let’s see what the results are in Wikia Search.
wikia-search-tree-frog 
Figure 3: Wikia Search SERP for “tree frog” query

One of the key components of Wikia Search is that the engine encourages users to contribute to the search results, effectively making Wikia Search a form of social networking search engine.  The “community” can build upon the search results through the use of an “Add to this result” feature which appears on the SERP itself (see Figure 4 below).

wikia-search-add-result-to
Figure 4: Wikia Search “Add to the result” field highlighted

This week, Wikia Search has publicly demonstrated that it is moving forward with improving its results by updating its Grub web crawler tool (www.grub.org) and by encouraging users to become a part of the process by ranking websites and by downloading Grub.  Also, earlier this month, Wikia Search launched an official version of the Wikia (www.wikia.com) toolbar.  This toolbar is available for download and can be added onto the Mozilla Firefox web browser.

With all of the various ways in which Wikia Search can improve its results through community participation, a question arises – how can Wikia Search compete with Google in terms of the perceived usefulness and relevance of its results?  Scrolling down the Wikia Search SERP for “tree frog” shows an unusual result.
wikia-search-tree-frog-serp 
Figure 5: Wikia Search result for “tree frog” query highlighted

Mixed into the various “tree frog” related websites is an entry for a writers’ reference site.  What relevance does this have to tree frogs?  It’s difficult to say off-hand.  What is apparent is how out of place this result seems to be for the “tree frog” SERP.

While Cúil and Wikia Search are making progress in improving their search results, they both still have quite a ways to go in order to become the “Google killers” they were reported to be.  According to data reports from Hitwise (www.hitwise.com), the Internet monitoring company which measures market share, last month the Google search engine accounted for just over 70 percent of all online search engine queries.  Based on that number, it’s plain to see that the two newest players have a long climb to the top.

Posted in Google

Fixing un-canonical URLs. Oh joy! Part 2

August 19th, 2008 by Jordan Sandford

Welcome back to my series on fixing canonical URL issues. Here again are the areas of canonical URL issues:

      1. Protocols (http and https)
      2. Domain and subdomain names (sometimes referred to as host names)
      3. URL paths
      4. File names
      5. Case sensitivity (when myPage.html is handled differently than MYPage.HTML)
      6. Query strings
      7. Combinations of any or all of the above

In my last post, I discussed how protocols (http and https) can present un-canonical URLs to the search engines and how they can create duplicate content. Let’s pick up where we left off.

You have two domain names, example.com and example.biz. You want traffic to example.biz to see content at example.com. Your hosting company set up your web hosting account on their servers to serve visitors to www.example.com and visitors to example.com from the same files (that way you don’t have to maintain two versions of, say, the Home Page). This is the default way most hosting companies create new accounts.

To fix canonical URL issues related to different top-level domains (e.g. edu, com, org, us; look out for anything-goes top-level domains as well), domains and/or different (or no) subdomains, you have two options. You can set up your server to show content from the non-canonical domain(s) to the visitor while banning that content from being indexed by the engines (using a robots.txt file or the robots meta tag), or you can properly redirect both visitors and search engines to the canonical domain.

First, choose which subdomain/domain/top-level domain combination you want to be canonical. Then set up the web servers or hosting accounts that host the non-canonical domains to ‘301 redirect’ to the canonical domain using the same rewrite rules or the ‘include’ method I previously discussed. (In a future post, I will discuss URL rewriting on Apache servers and compare it to URL rewriting on Windows servers.)

Be aware that if you bought multiple domain names from a registrar, only your canonical domain may actually be hosted, while the other domains may be using the registrar’s ‘forwarding’ service to redirect to your canonical domain. If you use their forwarding service or even their ‘301 redirect’ feature, they may not implement a 301 redirect consistently or properly. I am speaking from first-hand experience with well-known hosting companies.
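To make the redirect decision concrete, here is a minimal Python sketch of the logic a server-side rule would apply. It is not the rewrite-rule or ‘include’ method mentioned above, only an illustration. The host names come from the example.com/example.biz scenario, and the choice of example.com as the canonical host, along with the names CANONICAL_HOST, NON_CANONICAL_HOSTS and canonical_redirect, are assumptions made for this example.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical configuration: the single host chosen as canonical, and the
# other hosts that currently serve the same content.
CANONICAL_HOST = "example.com"
NON_CANONICAL_HOSTS = {"www.example.com", "example.biz", "www.example.biz"}


def canonical_redirect(url):
    """Return (301, target_url) if url is on a non-canonical host, else None."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in NON_CANONICAL_HOSTS:
        return None
    # Keep the scheme, path and query string; swap only the host.
    # Any explicit port in the original URL is dropped in this sketch.
    target = urlunsplit(
        (parts.scheme, CANONICAL_HOST, parts.path, parts.query, parts.fragment)
    )
    return 301, target
```

Calling canonical_redirect("http://example.biz/blog/shapes?id=3") yields a 301 pointing at the same path and query string on example.com, while a URL already on example.com returns None and is served normally.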

You were categorizing your pages and realized that you accidentally placed a page in both the /blog/Colors directory and the /blog/shapes directory. This could happen from physically copying a file to another directory, or perhaps you are using a blogging application (or any web application for that matter) and categorized a post in two categories. In the latter case, if the blogging application does not handle cross-posts in an SEO-friendly way, you might have duplicate content issues.

As far as the URL path goes, it would be a good idea to know which URLs have duplicate content of other URLs on your site. If you don’t know, try Google’s Webmaster Tools. Different web applications and different types of web applications (such as blog software from vendor A vs. vendor B or CMS software from vendor C) handle canonical URL paths differently. Check what web application is powering the pages at those URLs. Also see if there are add-ons or plugins for your software that can handle duplicate content issues created from assigning multiple tags or categories to content. Such a plugin could, for example, add a robots meta tag to one of the duplicate pages.
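If you would rather check for duplicates yourself, one rough approach is to compare the page bodies directly. The Python sketch below groups URLs whose content is identical once whitespace is normalized. It is an assumption-laden illustration, not how Google’s Webmaster Tools works: the function name find_duplicates and the idea of passing in a URL-to-body mapping are hypothetical, fetching the pages is left out, and pages that differ even slightly (for example in a header or sidebar) will not be caught.

```python
import hashlib
from collections import defaultdict


def find_duplicates(pages):
    """Group URLs whose page bodies match exactly after whitespace cleanup.

    `pages` maps a URL to the text of the page served at that URL.
    Returns a list of URL groups, each group sorted alphabetically.
    """
    by_digest = defaultdict(list)
    for url, body in pages.items():
        # Collapse runs of whitespace so trivial formatting differences
        # do not hide an otherwise identical page.
        normalized = " ".join(body.split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        by_digest[digest].append(url)
    # Only groups with more than one URL are duplicates.
    return [sorted(urls) for urls in by_digest.values() if len(urls) > 1]
```

Given the same post saved under both /blog/Colors and /blog/shapes, the function returns one group containing both URLs, which tells you where a redirect or a robots meta tag is needed.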

Another way to handle this is to 301 redirect one of the duplicate pages to the other. In a pinch, and depending on the URL structure of your site, you may be able to use the robots.txt file to exclude a certain section or an exact URL of your site from the indexes, thereby removing the duplicate content while making sure that any other page in that section is not removed from the index. Be careful about which URL you redirect from or block using the robots.txt file, because one of those URLs might be more optimized than the other.
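Before relying on a robots.txt rule, it is worth confirming that it blocks only the URL you intend. The sketch below uses Python’s standard urllib.robotparser module to test a rule against a list of URLs. The robots.txt content, the my-post.html file name, the ExampleBot user agent and the blocked_urls helper are all made up for illustration. It only reports which URLs a rule-following crawler would be told to skip, and says nothing about how a particular engine treats them afterwards.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that excludes one duplicate page only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /blog/shapes/my-post.html
"""


def blocked_urls(robots_txt, urls, user_agent="ExampleBot"):
    """Return the URLs from `urls` that the given robots.txt disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


urls = [
    "http://example.com/blog/shapes/my-post.html",
    "http://example.com/blog/shapes/other-post.html",
    "http://example.com/blog/Colors/my-post.html",
]
print(blocked_urls(ROBOTS_TXT, urls))
```

Note that Disallow rules match by prefix, so a rule for a directory such as /blog/shapes/ would block every page in that section, not just the duplicate.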

Please remember to read my next post in this series about file names, the mysterious forward slash at the end of URLs and case sensitivity in URLs. Check back soon!

Posted in SEO & Technology

New Google Features

August 18th, 2008 by Michael Buczek

While doing research for a client, I stumbled upon some interesting things that Google is doing.  While they are not big changes, I thought it would be good to show them off and give my 2 cents!

Ever notice how some websites have a splash or entrance page?  Do they annoy you?  They do me.  It seems to me that this is an extra click which is really unnecessary.  It looks as though Google has recognized this dilemma and is now giving us the choice of viewing the entrance page, or going straight to the regular homepage.  See example below:

googleskipintropic

If you were to click on the yellow area, you would be taken to the entrance page.  If you click on the light blue area, you bypass the entrance page and go right to the homepage. 

What I get from this is that Google realizes that most people don’t want to sit through a Flash intro or view a splash page.  So, if you are thinking about adding a splash page, consider whether it is worth your time when people will be skipping over it anyway.  Invest your time in adding good quality content and links to your site!

Google Cache
In the last few weeks, Google has changed the look of the information it gives when you check your website’s cache.  In the past, the Google Cache looked like this:

oldgooglecache

It is now displaying like this:

NewGoogleCache

The information is the same, but Google now presents it with a cleaner look.  To learn more about Google’s cache tools, you can click on the “learn more” button.

Posted in Google
