In this post we’re going to discuss how to block these bots from your GA reports so that you can have clean data from which to make smart business decisions and make more money!
First, let’s understand that “well behaved” bots usually leave one of two fingerprints: a telltale browser profile or a recognizable service provider (ISP).
What kind of tracks does your bot traffic leave? The best place to start is an Advanced Segment that begins to whittle this bot traffic away from the real traffic. Apply this segment to your Google Analytics data and you’ll be viewing only Direct traffic that bounces.
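To make the segment’s logic concrete, here is a minimal sketch in Python that applies the same two conditions (Direct medium, exactly one pageview) to some invented session records; real segment data would of course come from your GA reports, not a hand-written list.

```python
# Invented sample sessions standing in for Google Analytics visit data.
sessions = [
    {"medium": "(none)", "pageviews": 1, "isp": "microsoft corp"},
    {"medium": "organic", "pageviews": 4, "isp": "comcast cable"},
    {"medium": "(none)", "pageviews": 1, "isp": "inktomi corporation"},
    {"medium": "(none)", "pageviews": 3, "isp": "verizon"},
]

def direct_bounces(visits):
    """Mimic the segment: Direct traffic ("(none)" medium) that bounced
    (exactly one pageview)."""
    return [v for v in visits
            if v["medium"] == "(none)" and v["pageviews"] == 1]

suspects = direct_bounces(sessions)
print(len(suspects))  # 2 of the 4 sample visits match the segment
```

Everything the segment keeps is a candidate for closer inspection in the browser and ISP reports discussed below.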
Do you see any patterns in the browser version reports? (Audience>>Technology>>Browser & OS)
In the screen shot above we’ve selected “Mozilla Compatible Agent”; version 5.0 with no Java Support looks like a likely bot suspect.
What about identifying bots by ISP? (Audience>>Technology>>Network)
From the data above we can see some interesting sources of bouncing traffic. Next I created another segment to view each of these ISPs (1. microsoft corp, 6. yahoo! inc. and 16. Inktomi Corporation).
As you can see from the data above, all of the traffic from these ISPs bounced.
Your next action is to decide if you want to go a step further and actually filter out this traffic from your analysis profiles.
If you decide to block traffic based on browser profile, you’ll need to construct a series of filters.
The first two filters combine the browser data with Java Support (yes or no) into a single field; the third filter (shown in the image above) then excludes only the offending browser profile used by the bots.
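As a hypothetical sketch of that two-step filter logic (the field names and the sample visits here are invented for illustration): first combine the browser name with the Java Support flag into one value, then exclude any visit whose combined value matches the bot-like profile.

```python
import re

# Invented visit records; in GA these would be the Browser and
# Java Support fields of each visit.
visits = [
    {"browser": "Mozilla Compatible Agent", "java_support": "No"},
    {"browser": "Internet Explorer", "java_support": "Yes"},
    {"browser": "Firefox", "java_support": "Yes"},
]

# The offending combined profile the third filter would exclude.
BOT_PROFILE = re.compile(r"^Mozilla Compatible Agent\|No$")

kept = []
for v in visits:
    combined = f'{v["browser"]}|{v["java_support"]}'  # step 1: combine fields
    if not BOT_PROFILE.search(combined):              # step 2: exclude the match
        kept.append(v)

print(len(kept))  # the two human-looking visits survive
```

The key point is that neither field alone is enough: plenty of real visitors lack Java Support, so only the browser-plus-Java combination is excluded.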
If you decide that the ISP route is the way to go, then you’ll have a much easier path:
Where the pattern is equal to yahoo|microsoft corp$|inktomi
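You can sanity-check that pattern with a quick Python snippet before trusting it in a filter; note what the $ anchor does here: “microsoft corp” only matches at the end of the provider name, so a hypothetical “microsoft corporation” entry would not be excluded.

```python
import re

# The ISP filter pattern from above.
pattern = re.compile(r"yahoo|microsoft corp$|inktomi")

# Sample service provider names (the last two are invented controls).
isps = ["yahoo! inc.", "microsoft corp", "inktomi corporation",
        "microsoft corporation", "comcast cable"]

excluded = [isp for isp in isps if pattern.search(isp)]
print(excluded)  # ['yahoo! inc.', 'microsoft corp', 'inktomi corporation']
```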
Either way, you should remember to:
In summary, it really doesn’t matter why bots are on your site; what’s important is that they trigger nuisance pageviews that can skew your numbers and conversion rates. Are they affecting your GA data? Apply the segments above and find out for yourself!
In this post we’re going to discuss bots from Yahoo! and Microsoft, why it’s important and how to identify the traffic and see if it’s affecting your site.
Why is this important? Well as you’ll see, all of this bot traffic comes into your site as Direct traffic, has exactly one pageview and then does nothing, and that is the problem. We have to remember that a visit as just described equals a bounce — which is a bad thing. So as you look at your reports over time, you may wonder why your goal conversion rates or Ecommerce conversion rates from Direct traffic have plummeted while your bounce rates have increased. Part of the answer could very well be bots. And if you don’t account for this traffic in your quest for the analytics intelligence that will turn your site from a business cost to a profit center, you may never get there!
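A quick back-of-the-envelope calculation (with invented numbers) shows just how much one-pageview bot visits can distort the Direct traffic picture:

```python
# Invented example figures: 1,000 human Direct visits, of which 400
# bounce and 50 convert, plus 500 bot visits that all bounce.
human_visits, human_bounces, human_conversions = 1000, 400, 50
bot_visits = 500  # every bot visit is a single-pageview bounce

bounce_rate = (human_bounces + bot_visits) / (human_visits + bot_visits)
conv_rate = human_conversions / (human_visits + bot_visits)

print(f"{bounce_rate:.0%}")  # 60% instead of the true 40%
print(f"{conv_rate:.1%}")    # 3.3% instead of the true 5.0%
```

With these made-up numbers, the bots alone push the reported bounce rate up 20 points and cut the apparent conversion rate by a third, even though human behavior hasn’t changed at all.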
So how do you know if this traffic is affecting your site? Well by looking from 30,000 feet, you may never know — you have to dig deep. So if you haven’t already been digging for answers about your Direct traffic performance, let me walk you through how to identify these bots.
First of all we know that the focus area is Direct traffic that bounces; so the first step is to create an advanced segment to “filter” all of our reports for these visits in Google Analytics Reports.
If you’d like to “play along” as you read this post here is a link to the segment:
This is a view of the service providers in the Network sub-section of the Technology Report that have been the source of our Direct, bouncing traffic.
Why look here? Well there isn’t much to glean from the other reports: the Content reports are varied, and we already know what Traffic Sources and Conversions tell us, so the best place to look for answers is the Audience section of Google Analytics. Any report here is a good starting place, and in this case we can see from the screen shot above that we’ve gotten a lot of traffic directly from Microsoft and yahoo! inc.
So let’s take a closer look at this bouncing, Microsoft and Yahoo! traffic by applying the segments below.
So the data that points to bots here is pretty straightforward:
So let’s look closer at the Internet Explorer Traffic:
We can see that most visits are from IE7, again with no Java Support. (The absence of Java Support by itself isn’t necessarily a “bot indicator”; there are other supporting traits that we don’t have the space to address in this post.)
While Yahoo! ignores IE, both companies are leveraging Mozilla Agents.
While Microsoft eschews Firefox 3.5:
So what does this all mean? First, it’s highly likely that Google, Microsoft and Yahoo! are using automation to explore websites; at the highest level, it’s no longer safe to say that bots are not tracked in Google Analytics. As Analysts, Marketers or business owners, we all need to make sure we’re accounting for their presence as we explore analytics data.
In my next post I’ll share some strategies to filter out this traffic and more segments to help you remove the unwanted effects of this traffic.
I just love a good mystery, and to be candid, I love being the one who gets to solve it! Solving mysteries and putting together the proverbial pieces of the puzzle is a critical skill in the field of web analytics. In a weird and demented way, you almost have to like the torture that comes with trying to figure out a problem.
So when my industry colleague Matt asked me on Twitter to help him solve his Google Analytics quandary, I was ready in a nanosecond.
You can read the full post here, but essentially, Matt needs to know the best way to “isolate” page data. He has sub-directories on his web site, each containing pages, and needs to create a segmented, sliced-up view of each sub-directory so he can see how each sub-directory’s pages perform in relation to the pages in the other sub-directories.
Creating a duplicate, filtered profile for all of this sub-directory’s traffic within the same Google Analytics account (using the same website domain) will create your isolated view of only those sub-directory pages. You will only see visits and page views that happened on those sub-directory pages. It’s good for looking at your sub-directory data in a silo, and you can compare the high-level data by using the profile overview screen (assuming you are planning on creating additional filtered profiles for the other sub-directories). You can also download the data offline and mash it up, either via the Google Analytics API or by simply downloading PDF or CSV files.
Creating an advanced segment that displays any pages that match your sub-directory name will show you any visits which included at least one page view on any one of the pages within that sub-directory. This definition – visits instead of pages from the previous paragraph – is an important differentiation. As commenter Amanda has already astutely observed, you will see other pages appear in your Content report section, because this segment will show you those other pages, as they were a part of these visitors’ sessions that viewed at least one page within your desired sub-directory. You can create an advanced segment for each sub-directory and compare up to three (plus the “All Visits” segment) at the same time, and get an on-the-fly look at your sub-directory data. However, if your date range is long, you may encounter data sampling (not the biggest issue in the world, but something to be aware of).
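The pages-versus-visits distinction is easier to see with a toy example (the page paths and visits here are invented): a filtered profile keeps only the sub-directory’s own pageviews, while an advanced segment keeps every page from any visit that touched the sub-directory at least once.

```python
# Each inner list is one visit's sequence of pageviews (invented data).
visits = [
    ["/blog/post-1", "/about"],   # visit that entered via the blog
    ["/products", "/contact"],    # visit that never saw the blog
    ["/home", "/blog/post-2"],    # visit that reached the blog later
]

SUBDIR = "/blog/"

# Filtered profile: only the matching pageviews themselves survive.
profile_pages = [p for v in visits for p in v if p.startswith(SUBDIR)]

# Advanced segment: ALL pages from visits with >= 1 matching pageview.
segment_pages = [p for v in visits
                 if any(p.startswith(SUBDIR) for p in v)
                 for p in v]

print(profile_pages)  # ['/blog/post-1', '/blog/post-2']
print(segment_pages)  # ['/blog/post-1', '/about', '/home', '/blog/post-2']
```

This is exactly why Amanda sees /about-style pages in her segmented Content report: the segment qualifies whole visits, not individual pageviews.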
If you create a Custom Report, in your main profile and without any advanced segments applied, you will be tailoring an original view of your data. You can combine metrics from different reports, like visits, bounce rate, goal start and goal completion percentage, and revenue / ROI metrics (if you do Ecommerce). You can then match it up with the page dimension, and even set it up so that when you click on a page, the report will show you the keywords, or the source / medium combo, or the visitor country, or whatever drill-down dimension you want to see. Then, if you really want to get fancy, you can apply an advanced segment while you are looking at your custom report to show you visits that have viewed at least one of your desired sub-directory pages, and really get cooking! You can then apply a custom report and an advanced segment to multiple profiles from within the main profile (click on the respective “manage” links), and apply it to any of the other profiles within your account.
So, what would I do? I would create a custom report with an advanced segment applied to it. You can also create a filtered profile if you wish, but I would suspect you would not use it as much as you would a custom report / advanced segment combo. I would also insist that your report is meaningful and that you can take action from it (e.g. knowing that a page’s $Index value is a lot lower than the site average would point you in that page’s direction to optimize / refine it). Pick metrics like Bounce Rate, $Index and Goal Conversion Rate that help you understand page performance, and ditch trivial ones like Avg. Time on Site or Exit Percentage.
Hope I helped out Matt and others in a similar situation!