Articles written in May, 2010

May 25 2010

Measures of center, outliers, and averages


Let me take you back to the days when you were an under-21 college student, figuring out who you were and what you wanted to be when you finally grew up. For some of you this may be a lifetime ago, and for others, it may seem as if those days happened yesterday (literally, yesterday).

Most college students must take one, if not two, courses in mathematics during their college careers, regardless of their degree program. Most of the time, elementary statistics is the course selected, probably because it’s the easiest math elective for most people. In short, lots of people have an elementary knowledge of statistics. So, why are average-oriented metrics put on such a pedestal?

In elementary statistics, you most likely learned about the four measures of center and about outliers. If you don’t remember, that’s OK. It’s probably been a long time, or you weren’t a math person and wanted to forget everything you had learned as quickly as possible.

The four measures of center are mean, median, mode, and midrange.

Mean – The mean is what you know as the average. It is calculated by adding up all of the values in a set and dividing that sum by the total number of values in the set. The mean is very sensitive to outliers (more on outliers in a little bit).

Example: The mean of 1, 3, 5, 5, 5, 7, and 29 is about 7.8571.
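
If you would rather let a computer do the arithmetic, here is a minimal Python sketch of the mean calculation. The function name is my own choice for illustration and does not come from any analytics tool.

```python
def mean(numbers):
    """Add up all of the values, then divide the sum by how many values there are."""
    return sum(numbers) / len(numbers)

values = [1, 3, 5, 5, 5, 7, 29]
print(round(mean(values), 4))  # 7.8571
```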

Median – The median is not the same thing as the mean, even though in popular parlance, the two terms are often used interchangeably. The median is the number that is in the middle of a data set that is organized from lowest to highest or from highest to lowest. The median doesn’t represent a true average, but is not as greatly affected by the presence of outliers as is the mean.

Example: The median of 1, 3, 5, 5, 5, 7, and 29 is 5 (the number in the middle).
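
A rough Python sketch of the same idea: sort the values, then pick the one in the middle. The example set has an odd number of values; for an even count, the usual convention (an addition here, not something the example above needs) is to average the two middle values.

```python
def median(numbers):
    """Return the middle value of the sorted data."""
    ordered = sorted(numbers)
    middle = len(ordered) // 2
    if len(ordered) % 2 == 1:
        # Odd count: there is exactly one middle value.
        return ordered[middle]
    # Even count: average the two values that straddle the middle.
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([1, 3, 5, 5, 5, 7, 29]))  # 5
```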

Mode – The mode is the number that repeats most often in a data set. It’s seldom used in statistics as a reliable measure of center.

Example: The mode of 1, 3, 5, 5, 5, 7, and 29 is 5 (it repeats 3 times – the other values only appear one time each).
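
Counting repeats is easy to sketch in Python with the standard library's Counter. The helper name below is hypothetical, and if two values tie for most frequent, this simple version just reports the one it counted first.

```python
from collections import Counter

def mode_with_count(numbers):
    """Return the most frequent value and how many times it appears."""
    value, count = Counter(numbers).most_common(1)[0]
    return value, count

print(mode_with_count([1, 3, 5, 5, 5, 7, 29]))  # (5, 3)
```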

Midrange – The midrange is calculated by adding the highest and lowest values of a data set together, and dividing the sum by 2. The midrange is hardly ever used as a measure of center.

Example: The midrange of 1, 3, 5, 5, 5, 7, and 29 is 15 (29 + 1 = 30; 30 / 2 = 15).
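
The midrange is the shortest of the four to write down. A minimal sketch, with a function name chosen only for illustration:

```python
def midrange(numbers):
    """Average the lowest and highest values in the data."""
    return (min(numbers) + max(numbers)) / 2

print(midrange([1, 3, 5, 5, 5, 7, 29]))  # 15.0
```

Because it looks only at the two extremes, a single outlier such as 29 moves the midrange more than it moves any of the other three measures.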

With four different measures of center, I’ve been able to come up with four different correct calculations for an average. Each measure of center has its benefits, and each presents a different sensitivity to the presence of outliers. Depending on the data set, a measure of center may lose strength and implied value because of how it is calculated and how it is used.

Outliers – Outliers are numbers in a data set that are either way bigger or way smaller than the other numbers in that set.

Example: In the 1, 3, 5, 5, 5, 7, and 29 data set, the number 29 is an outlier because of how much greater it is than all of the other numbers in the set. 29 is the only number that doesn’t “fit” in this set.
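
“Way bigger or way smaller” is a judgment call, and this post doesn’t define a formal cutoff. One common convention, used here purely as an assumption, is the 1.5 × IQR rule: flag anything more than 1.5 times the interquartile range below the first quartile or above the third. A minimal Python sketch (requires Python 3.8 or later for statistics.quantiles; the function name and the default multiplier are illustrative choices):

```python
import statistics

def find_outliers(numbers, k=1.5):
    """Flag values outside the fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(numbers, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in numbers if x < low or x > high]

print(find_outliers([1, 3, 5, 5, 5, 7, 29]))  # [29]
```

Other rules (standard deviations from the mean, for instance) would draw the line in a different place, so treat the result as a prompt to look closer rather than a verdict.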

What is the meaning of all of this?
The meaning of all of this is to take your averages (average order value, average conversion rate, average time on site, and others) with a tiny grain of salt. Use average-oriented metrics cautiously and with skeptical optimism, as the presence of a mere few outliers in your data can distort the figures and not provide a true representation of what is really happening.

Take this extreme example of the revenue of five separate orders placed on a web site:


Your “realistic” average order value here should be $5.67 (the four “normal” values added up and divided by four). But if we’re looking at a report from a web analytics tool, it would report the average order value as $115.32. Clearly, there is a massive difference between $5.67 and $115.32.
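
To see how one order can drag the average that far, here is a small Python sketch. The original table of order values is not reproduced here, so the four “normal” values below are hypothetical, chosen only so that they average $5.67. The fifth value, $553.92, is the one that makes the overall average come out to $115.32 (5 × 115.32 − 4 × 5.67), assuming the figures above are exact.

```python
import statistics

# Hypothetical "normal" order values (not the original data); they average $5.67.
normal_orders = [4.50, 5.25, 6.18, 6.75]
# Outlier derived from the two averages quoted above.
outlier_order = 553.92
all_orders = normal_orders + [outlier_order]

def average(values):
    """Plain mean: the sum of the values divided by their count."""
    return sum(values) / len(values)

print(round(average(normal_orders), 2))  # 5.67
print(round(average(all_orders), 2))     # 115.32
print(statistics.median(all_orders))     # 6.18
```

With these made-up values, the median barely notices the big order, which is the same point made about the median earlier.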

To obtain real insights that will help your web site and your organization, you’ll have to dive much deeper than the averages to extract meaningful information and data. Know your measures of center and your outliers, so that you can decide if your averages are realistic representations of what’s happening on your web site.

Until next time, I will leave you with one of my favorite all-time quotes, which fits right into this topic. Think about it the next time you’re obsessing over averages:

“A statistician drowned while crossing a river that was on average six inches deep.”

May 11 2010

Do You Delete Your Cookies? Do You Delete ALL Your Cookies?


Depending on the research report that you’re reading, anywhere from 0.5% to as many as 20% of people on the internet actively delete their cookies. Cookies are small text files, stored on your computer, that hold data about the web sites you visit. Web analytics tools like Omniture SiteCatalyst and Google Analytics use first-party cookies to collect anonymous usage data about a site’s visitors, so that web site owners can improve their sites and marketing efforts. Web sites featuring secured log-in areas also need to use cookies to remember who you are on your next visit, and web sites that you visit frequently, like message boards, need to use cookies to remember your preferences and settings for the site.

Cookies – for a long, long time – have gotten an unfair, bad rap. It’s so bad that users will actually go out of their way to delete these cookies from their machines, even though new cookies will be set as soon as they visit virtually any web site on the World Wide Web. The reasons for deleting cookies are as varied as the ingredients in a New Orleans style jambalaya. Some say cookies take up too much space (they don’t; browsers limit each cookie to about four kilobytes, which is the equivalent of a grain of sand on a beach); that they infect your computer with viruses (they don’t, or the internet would be completely inaccessible, which it isn’t); or that they are used to spy on your computer (most cookies can only be read by the site that sets them, and the domain [the URL of the site] is “hashed”, which means that it is converted into a number by a one-way algorithm).

So, when folks delete their cookies and feel that their internet browsing experience is that much safer, are they really deleting ALL of their cookies? The answer is surprising: no, they are not. Flash cookies, which are set by Flash applications, are not stored or viewable in the same places as the regular text cookies that folks have been deleting for all of these years. Because Flash is so prominent (installed on almost 99% of all computers), virtually everyone who has been online has at least one Flash cookie stored on their computer without even knowing it.

These Flash cookies can store up to 100K of information, which is about 25 times what a regular browser cookie is allowed to hold.

Deleting your Flash cookies can be done on your computer, but it’s a lot easier if you visit the Adobe Flash Player settings page, where you can find the Website Privacy Settings panel. Click on the little folder icon (which should be the last one on the right-hand side of the top row of icons) to view which sites have set Flash cookies on your computer.

If you didn’t know that Flash cookies existed, let alone that you probably have some set on your machine, then that is the greatest argument that I can make for not deleting your cookies. You wouldn’t have even known about Flash cookies until you read this blog post, so how big of a part do cookies play in the grand scheme of things? Does what you don’t know hurt you?

So, do you delete ALL of your cookies? 🙂

© 2019 MoreVisibility. All rights reserved