In my last post I said that the total amount of data a web page needs to download correlates directly with the time it takes for the page to be ready for your visitors to view. I also said that the number of files a web page downloads affects performance. I left off talking about reducing the number of files your web pages use by combining files. Combining files often takes some strategy, and you’ll have to know which files a web page uses so that you know which ones you can combine. We’ll use the Components tab of YSlow to get this list.
Remember that YSlow is an add-on to Firebug, which itself is an add-on to the Firefox web browser. After opening Firebug (to do this quickly, press the F12 key while you’re in Firefox), click the YSlow tab. The window underneath will say “Grade your web pages with YSlow.” Right above that line are the YSlow tabs: Grade, Components, Statistics and Tools. Click the Components tab. After a short wait, you will see a table of components, with the component types listed on the left side: doc, js, css, cssimage, image, favicon and redirect.
Since I recently combined a few JavaScript files that a particular site was using into one file, I’ll explain how to do that using the Components tab. You may know that you can load JavaScript files in your web page either in the header (inside the <head> tag), in the HTML body (inside the <body> tag) or both. In addition, it is possible to include JavaScript code directly in the head or in the body. This is called inline JavaScript code and is usually an extremely bad SEO practice. It’s a good idea to know what external (non-inline) JavaScript files your page uses. To do this, click the plus button next to the component type “js” in the left column. The URL column will tell you where each file is being loaded from. Now, take a look at the <head> tag from your page. The best way to do that is to go to Firefox’s View menu and click Page Source. External JavaScript files will be in <script> tags that have an attribute that looks like “src=javascriptfile.js”. Refer to the list of JavaScript files that YSlow’s Components tab gave you, and take out of that list any files that aren’t referenced in the <head> tag. To combine the files in the resulting list, click the Tools tab, and click All JS. A new page will open up titled “JavaScript for: www.example.com.” Now open your text editor (Notepad will work well). Then, in the page that opened up in Firefox, copy the code for only the files in your list, one file at a time, and paste it into Notepad.
Then, save the Notepad file with a name similar to all.js. Upload it to the same location as your other JavaScript files on your web site. Then modify the contents of the <head> tag by removing all <script> tags that refer to the files in your list except for one. With this remaining tag, update the src attribute to point to the all.js file you just uploaded.
Now, upload your web page with the modified <head> tag. Using YSlow, ensure that the Components tab does not mention the files in your final JavaScript list. Also make sure that your all.js file is mentioned.
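If you would rather not copy and paste by hand, the combining step can also be scripted. Here is a minimal Python sketch, assuming you have already saved the JavaScript files from your list to a folder on your computer. The function name combine_js and the file names in the usage note are made up for illustration. It joins the files in the order you give them, which should match the order of the original script tags, and puts a lone semicolon between files in case one of them doesn't end with one.

```python
from pathlib import Path


def combine_js(source_paths, output_path):
    """Concatenate JavaScript files, in the order given, into one file."""
    parts = []
    for path in source_paths:
        code = Path(path).read_text(encoding="utf-8")
        # Label each chunk so it is easy to trace back to its original file.
        parts.append(f"/* {Path(path).name} */\n{code.rstrip()}")
    # A lone ";" between files guards against a file that lacks a final semicolon.
    combined = "\n;\n".join(parts) + "\n"
    Path(output_path).write_text(combined, encoding="utf-8")
    return combined
```

For example, combine_js(["menu.js", "tracking.js"], "all.js") would write both (hypothetical) files into all.js, ready to upload. The order matters: a file that depends on another must come after it in the list.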
In my next post, I’ll talk about some tangible benefits from combining files that I’ve experienced first-hand. I’ll also review some of the basic concepts of performance optimization and go over some performance inhibitors that I may not have mentioned before, as well as highlight some other performance tools.
Fixing un-canonical URLs. Oh joy! Part 3
Welcome back to my series on fixing canonical URL issues. In my last post, Fixing un-canonical URLs. Oh joy! Part 2, I discussed how host names can present un-canonical URLs to the search engines and how to fix those problems. As a review, here again are the areas of canonical URL issues:
Let’s continue expanding on other potential culprits of un-canonical URLs.
Your web server is using a setting called ‘default index file’ or ‘default content file.’ This setting was used to allow your web site visitors to see a custom listing or ‘index’ of a directory’s contents by just knowing the name of the directory and without having to know the name of the file that shows this index. This setting is also part of the setup tasks for a new web hosting account and provides security in case you didn’t want to show all visitors a list of all files in that directory. This default index file is usually used as an introductory page to the contents in that section of your web site. (Default index files, depending on the kind of server your site is on, have names such as index.html, index.htm, home.html, default.htm, default.html, default.asp, default.aspx, index.php, index.cfm and so on.) With this setting in place, when a visitor goes to a URL that ends in the name of a directory on your web server (including the ‘root’ directory), a trailing forward slash and no file name (something like http://www.example.com/), the web server does not redirect anywhere, but shows the visitor the first file it can find in the list of default index files. If the visitor types the same URL, but ends it with the name of the default index file (something like http://www.example.com/index.html), the web server will also not redirect, but will show exactly the same content as without the file name.
Before trying to canonicalize a default index file to a forward slash (“/blog/index.php” -> “/blog/” for example), make sure that the content you expect to show does in fact show when a visitor leaves off the file name in the URL. Also make sure that the web server only responds with a 200 response code. After doing so, you can use one of several methods to 301 redirect “/blog/index.php” to “/blog/” or whatever your situation is. One method is using URL rewriting rules and regular expressions. This method generally provides a faster reaction time by your web server (read: less wait time for your visitors) and is probably a cleaner way compared to the other method, which is incorporating specific logic into your ‘include files.’ The logic of both methods is pretty simple: if the requested URL ends in a forward slash plus one of the default index file names, send a 301 response code and the location of the redirection to the browser. The redirection location will be the originally requested URL with the default index file name removed, ending in a forward slash plus any query string and/or URL fragment.
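To make that logic concrete, here is a rough Python sketch of the decision only. It is not a rewrite rule for any particular server. The function name index_redirect_target and the list of file names are assumptions for illustration, so adjust the list to match your own server’s default index files.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed list of default index file names; adjust to match your server.
DEFAULT_INDEX_FILES = {"index.html", "index.htm", "index.php", "default.htm",
                       "default.html", "default.asp", "default.aspx"}


def index_redirect_target(url):
    """Return the URL to 301 redirect to, or None if no redirect is needed."""
    parts = urlsplit(url)
    directory, _, filename = parts.path.rpartition("/")
    if filename.lower() not in DEFAULT_INDEX_FILES:
        return None
    # Keep the directory and trailing slash; drop only the file name.
    new_path = directory + "/"
    return urlunsplit((parts.scheme, parts.netloc, new_path,
                       parts.query, parts.fragment))
```

Whatever calls this function would send a 301 response with the returned value as the location whenever the result is not None, and serve the page normally otherwise.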
When you created a new directory under blog called Colors, you forgot your convention for naming directories was all lower-case. After creating the directory, you tested it to make sure your visitors would have no problem getting to the pages in that directory. You went to www.example.com/blog/colors and everything looked great. You didn’t realize you had made a mistake until you noticed in your traffic logs that many people were looking at a slightly different URL: www.example.com/blog/Colors. Most of the time, this is caused by having your site on a Windows-based server and not having anything in place (such as a CMS that is aware of this issue) that can handle this problem. Windows servers are case-insensitive; Linux and Unix servers are case-sensitive. If your site was running on a Linux server and a visitor browsed to www.example.com/blog/colors, they would probably get a Page Not Found error because the ‘colors’ directory doesn’t exist. Windows’ case insensitivity makes it easier for visitors to get to pages in your site if you’ve mixed upper and lower case letters in either your directory or file names (or both).
You can resolve canonical URL issues related to case insensitivity with a variety of methods. First, you can try the tools offered by Google’s Webmaster Tools to find these issues if you’re not sure if or where they might lurk in your site. You can use a simple rewrite rule to 301 redirect any case-variation of a particular URL to the canonical URL. You can use a CMS or blogging software that will automatically change your new page, category or tag name to all lower-case before that page, category or tag goes live on your site.
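The redirect option can be sketched in a few lines of Python. This assumes that every canonical URL on your site is all lower-case and that your server is case-insensitive, so the lower-case version always resolves. The function name lowercase_redirect_target is made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def lowercase_redirect_target(url):
    """Return the lower-case canonical URL to 301 redirect to, or None."""
    parts = urlsplit(url)
    canonical_path = parts.path.lower()
    if canonical_path == parts.path:
        return None  # Already canonical; serve the page normally.
    # Only the path is changed; the query string is left exactly as requested.
    return urlunsplit((parts.scheme, parts.netloc, canonical_path,
                       parts.query, parts.fragment))
```

The query string is deliberately left alone here, since parameter values can be case-sensitive even when the path is not.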
Also, it is important to know that the paths you enter in your robots.txt file are case-sensitive. If you mean to block access to www.example.com/dontgohere.php by adding “/DontGoHere.php” to your robots.txt file, www.example.com/dontgohere.php will not be blocked.
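You can see this for yourself with the robots.txt parser in Python’s standard library. This is only a quick check under the assumption that search engine crawlers match paths the same case-sensitive way this parser does; the rule and URLs are the ones from the example above.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt body using the mixed-case path from the example above.
robots_lines = [
    "User-agent: *",
    "Disallow: /DontGoHere.php",
]

parser = RobotFileParser()
parser.parse(robots_lines)

# can_fetch() returns True when the URL is NOT blocked.
lowercase_allowed = parser.can_fetch("*", "http://www.example.com/dontgohere.php")
mixed_case_allowed = parser.can_fetch("*", "http://www.example.com/DontGoHere.php")

print("dontgohere.php allowed:", lowercase_allowed)
print("DontGoHere.php allowed:", mixed_case_allowed)
```

The lower-case URL should come back as allowed (not blocked), while the mixed-case URL that exactly matches the rule should come back as blocked.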
In this post, I will look a little deeper into using YSlow to optimize your web pages for speed.
Let’s start off with the Grade section. Usually, when you click the Grade tab, YSlow will quickly run through a few processes and show you a gray progress bar. It is collecting information, analyzing and grading your page’s performance. YSlow shows your overall grade in the top left, and by default, it will show all 22 metrics (in some order that I haven’t figured out yet) and their grade. Click each of the six sections on the top to show only metrics in those categories (e.g., server). The Grade tab is a great way to remember some things to check in analyzing your page’s performance. Remember, though, not all 22 metrics should be taken as hard and fast rules. It often depends on the type of your site and your specific situation.
One way to quickly see what is going on with your performance is to analyze how many HTTP requests (i.e. any request for any type of file your browser makes when displaying a web page) are occurring as well as how much data is being downloaded. The Statistics tab shows a nice overview of this information with pie charts to boot. There is obviously a direct correlation between the amount of data that is downloaded and the amount of time it takes to fully display your page. The Statistics tab shows the total data amount, or “weight,” that is downloaded as well as how many HTTP requests were needed. It breaks this into two helpful categories: Empty Cache and Primed Cache. Empty Cache represents the situation where you have never been to that site before and you visit the page that is being analyzed for the first time. Technically, it means that the browser does not have anything the page requests already stored in temporary memory. After making those requests, it stores whatever requested files it can in temporary memory so that the next time it needs those files, it can just pull them from the temporary storage, which is many times faster than requesting them over the Internet. Primed Cache represents the scenario where your browser has at least some of the requested files already stored in memory.
Next to each of the two pie charts, YSlow displays a categorized table of items it requested and the “weight” of all items in each category. The categories are: HTML/Text, JavaScript File, Stylesheet File, CSS Image, Image and Favicon. If either your empty or primed cache shows more than one request for a CSS image, these images may be good candidates for CSS image sprites. Image sprites are a technique used to reduce the number of HTTP requests by putting multiple images in one larger image (like a pasteboard) and using the CSS background-position rule to show only the appropriate image at the appropriate area of your web page layout. How practical this may be depends on whether any of the images reside on other servers and whether any of them are 8-bit (256 color) images. If the files exist on other servers, you should ask yourself if you should combine them. One reason you may not want to combine them, even if all images are full-color, is that an image is often updated by some other website that resides on that 3rd party server.
Use similar logic for determining if you should combine the JavaScript and Stylesheet files that your site uses. To be able to combine the files, you’ll have to know which files to combine. That is where the Components tab comes into play. I’ll cover that section in my next post. Until then, enjoy your break.
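To show how the background-position part of the sprite technique works, here is a small Python sketch that writes the CSS rules for you. It assumes the individual images are stacked top to bottom in the sprite with no gaps between them. The function name sprite_css, the class names and the sprite file name are all hypothetical.

```python
def sprite_css(sprite_url, icons):
    """Build CSS rules for icons stacked vertically in one sprite image.

    icons is a list of (class_name, width, height) tuples in pixels, listed
    in the same top-to-bottom order as they appear in the sprite.
    """
    rules = []
    offset_y = 0
    for class_name, width, height in icons:
        # A negative offset slides the sprite up so only this icon shows.
        position = f"0 -{offset_y}px" if offset_y else "0 0"
        rules.append(
            f".{class_name} {{ width: {width}px; height: {height}px; "
            f"background: url({sprite_url}) no-repeat {position}; }}"
        )
        offset_y += height
    return "\n".join(rules)
```

Calling sprite_css("sprite.png", [("icon-home", 16, 16), ("icon-mail", 16, 20)]) gives one rule per icon, where the second icon’s vertical offset is the height of the first. Every rule points at the same image file, so the browser makes one HTTP request instead of one per image.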
Today, I’ll introduce YSlow, an add-on for Firefox.
YSlow was born out of Yahoo’s development department after they wrote some best practices for making high performance web sites. All the gory details are found in their Best Practices for Speeding Up Your Web Site document.
You might know that Firefox has thousands of add-ons available to help you with all sorts of things, and to be accurate, YSlow is actually an add-on to the Firebug add-on for the Firefox web browser.
Screenshot of FireBug with YSlow installed.
YSlow has four sections: Grade (grading performance), Components (listing web page components), Statistics (statistics on web page components) and Tools.
In the Grade tab inside YSlow, you’ll see that the grading is done on 22 performance metrics, which are divided into six categories: content, cookies, CSS, images, JavaScript, and server. YSlow provides descriptions for each metric and a link to the full description in its Best Practices document. YSlow understands that there are different types of web sites and grades according to the ruleset you choose (e.g. blog, small web site or large web site). You can create custom rulesets as well.
The Components tab lists all components that YSlow detects your web page uses and shows very useful and pertinent information about them under five categories: doc (generally, any HTML, XML or XHTML file), js (JavaScript), css, cssimage (images that your CSS files request) and image. It will show any component in red if the browser cannot find it (404 error). It also gives the count for all components by category.
The Statistics tab compares the page loading (total file size, to be exact) when components were not cached (saved from previous visits) versus when the components were actually cached when you ran the YSlow test.
The Tools tab provides features that can make some optimizations much easier. For example, to implement suggestion 4 in my previous post, click the All CSS link to combine all CSS used by the web page. Clicking on the All JS link will do the same thing for JavaScript.
My next post will delve a little deeper into YSlow.