Sampling on Google Analytics – Where Did My Data Go?

Jason Brewster - November 12, 2015

Sampling in Google Analytics is important to Google because complex queries that combine dimensions, metrics, and segments across potentially millions of sessions are computationally intensive. To limit the work the service performs without limiting users' ability to analyze overall trends, Google uses sampling.

How Does Sampling Work?

Sampling is similar to polling. An algorithm looks at the data and then throws out a portion of it, so the report is based on what remains. When this happens, the interface shows a callout in the upper right-hand corner.

[Image: sampling example]
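To make the polling analogy concrete, here is a toy sketch. It is not Google's actual algorithm, which this article does not describe. It simply keeps a uniform random share of sessions and scales the result back up. The function names, the page paths, and the session counts are all made up for illustration.

```python
import random

def sample_sessions(sessions, rate, seed=0):
    """Keep roughly `rate` of the sessions, chosen uniformly at random.

    Assumption: uniform random sampling; the real service may differ.
    """
    rng = random.Random(seed)
    keep = round(len(sessions) * rate)
    return rng.sample(sessions, keep)

def estimate_total(sampled_count, rate):
    """Scale a count measured on the sample back up to the full data set."""
    return sampled_count / rate

# Hypothetical data: one (page, pageviews) record per session.
sessions = (
    [("/home", 3)] * 900
    + [("/pricing", 2)] * 90
    + [("/rare-page", 1)] * 10
)

sample = sample_sessions(sessions, rate=0.5)
sampled_pageviews = sum(views for _, views in sample)

# The scaled-up total lands close to the true total of 2,890 pageviews...
print(estimate_total(sampled_pageviews, 0.5))
# ...but low-traffic pages are the ones most likely to drop out of the rows.
print(sorted({page for page, _ in sample}))
```

The point of the sketch is that totals survive sampling reasonably well, while rarely visited rows are the first thing to go missing.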

A Sampling Example

To examine how much data is lost and how that can affect reporting, we used a standard report (which is never sampled) and a custom segment that matches all pages. Both of these reports look at the entire audience of the site – but the custom segment gets sampled so you can see where the data loss occurs.

In this case, the sampled segment is based on 69.6% of all sessions, yet the high-level metrics are remarkably similar.

Report Type | Pageviews | Unique Pageviews | Avg. Time on Page | Entrances | Bounce Rate | % Exit | Page Value
Sampled     | 1,219,940 | 975,136          | 0:01:51           | 669,418   | 82.68%      | 54.63% | $2.49
Unsampled   | 1,245,641 | 1,007,906        | 0:01:48           | 678,962   | 82.72%      | 54.51% | $2.38
Difference  | 2.11%     | 3.36%            | -2.70%            | 1.43%     | 0.05%       | -0.22% | -4.42%
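The Difference row can be reproduced with a few lines of Python. The article does not state the formula; the sketch below assumes (unsampled - sampled) / sampled, which matches every value in the row to two decimal places. The helper names `pct_diff` and `to_seconds` are hypothetical.

```python
def pct_diff(sampled, unsampled):
    """Difference of the unsampled value relative to the sampled one, in percent.

    Assumed formula: it reproduces the table, but the article does not state it.
    """
    return (unsampled - sampled) / sampled * 100

def to_seconds(hms):
    """Convert an 'H:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds

# (sampled, unsampled) pairs copied from the table above.
metrics = {
    "Pageviews": (1_219_940, 1_245_641),
    "Unique Pageviews": (975_136, 1_007_906),
    "Avg. Time on Page": (to_seconds("0:01:51"), to_seconds("0:01:48")),
    "Entrances": (669_418, 678_962),
    "Bounce Rate": (82.68, 82.72),
    "% Exit": (54.63, 54.51),
    "Page Value": (2.49, 2.38),
}

for name, (sampled, unsampled) in metrics.items():
    print(f"{name}: {pct_diff(sampled, unsampled):.2f}%")
```

Every summary metric stays within about 5% of its unsampled value, which is why sampled data is usually good enough for reading overall trends.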

The real problems start when the analyst is interested in the row-level data below the summary.

Report                      | Rows of Data
Pages in Report (Sampled)   | 105,286
Pages in Report (Unsampled) | 138,352
Loss of rows                | -23.90%

The above table shows a 24% loss in row-level data (in this case, page URIs). In other words, almost 1 in 4 pages will be missing from your data. Missing a quarter of all pages viewed makes “needle in the haystack” analysis incredibly difficult. This problem also affects key performance metrics like goals and the details associated with them, such as Goal Completion Location.
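The row-loss figure is simple arithmetic and worth checking by hand. This short sketch uses the two row counts reported above. Note that the loss is measured against the unsampled count, which is an inference from the numbers rather than something the article states.

```python
sampled_rows = 105_286     # pages in the sampled report
unsampled_rows = 138_352   # pages in the unsampled report

missing_rows = unsampled_rows - sampled_rows
loss_pct = missing_rows / unsampled_rows * 100  # share of the full row set that is gone

print(f"{missing_rows:,} pages missing, a {loss_pct:.2f}% loss of rows")
```

That works out to 33,066 missing pages, or 23.90% of the rows in the unsampled report.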

Sampling Limits

If the number of sessions in the property over the given date range exceeds 500k, sampling will be activated. In Google Analytics Premium, sampling is activated at 25 million sessions at the view level. This is important because it means views can be created strategically to avoid sampling by breaking a website up into large pieces. Sampling occurs in the same way in both the Google Analytics website and the Core Reporting API.
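The thresholds above can be expressed as a small check. This is only a sketch of the rule as this article describes it, not a call into any Google API. The function name `will_sample` and the example session counts are hypothetical.

```python
STANDARD_LIMIT = 500_000      # sessions per property in the date range (from the article)
PREMIUM_LIMIT = 25_000_000    # sessions per view in the date range (from the article)

def will_sample(sessions_in_range, premium=False):
    """Return True if a query over this many sessions would exceed the limit."""
    limit = PREMIUM_LIMIT if premium else STANDARD_LIMIT
    return sessions_in_range > limit

# A hypothetical Premium site with 40 million sessions in one view gets sampled...
print(will_sample(40_000_000, premium=True))   # True

# ...but split into two views of 20 million sessions each, neither view does.
print([will_sample(n, premium=True) for n in (20_000_000, 20_000_000)])   # [False, False]
```

This is the reasoning behind strategic views: because the Premium limit applies per view, each large piece of the site gets its own 25 million session allowance.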

Avoidance of Sampling

There are two ways to avoid sampling in Google Analytics. The first method, which costs nothing, is to pull multiple reports using metrics that are aggregate counts, like sessions, bounces, and total seconds, and then use a spreadsheet or dashboarding software to calculate performance metrics and rates.
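A minimal sketch of the free method follows. It assumes each report covers a slice small enough (for example, a shorter date range) to stay under the sampling limit, which is one reading of "multiple reports". The `ReportChunk` class, the `combine` function, and all of the numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ReportChunk:
    """Raw counts exported from one unsampled report."""
    sessions: int
    bounces: int
    total_seconds: int

def combine(chunks):
    """Sum the raw counts first, then derive rates from the totals."""
    sessions = sum(chunk.sessions for chunk in chunks)
    bounces = sum(chunk.bounces for chunk in chunks)
    seconds = sum(chunk.total_seconds for chunk in chunks)
    return {
        "sessions": sessions,
        "bounce_rate": 100 * bounces / sessions,       # percent
        "avg_session_duration": seconds / sessions,    # seconds per session
    }

# Two hypothetical reports with bounce rates of 80% and 90%.
chunks = [
    ReportChunk(sessions=400_000, bounces=320_000, total_seconds=44_000_000),
    ReportChunk(sessions=100_000, bounces=90_000, total_seconds=6_000_000),
]

print(combine(chunks))
```

Only counts can be added across reports. Here the combined bounce rate is 82%, not the 85% you would get by averaging the two reports' rates, which is why the rates have to be recalculated from the summed counts.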

The second way to avoid sampling is to get Google Analytics Premium, which raises the ceiling for sampling and allows users to request unsampled reports.

So remember to check your data sets for sampling, and keep in mind that there are simple workarounds for this difficult problem.

© 2023 MoreVisibility. All rights reserved.