Regular expressions (often shortened to “Regex” in Google Analytics and Tag Manager) can save you time and let you include or exclude data from your reports.
Regex is now supported in Google Data Studio. However, Data Studio uses a flavor of regex called RE2, which differs in a few ways from the regex you may be used to. RE2 was designed to be faster and more secure.
In this post I will provide you with a few Regex examples that we have created for Google Data Studio and a few differences we have noticed when compared to regular Regex.
Ever wondered how you can exclude data that Google Analytics collects from your own computer? Ever pondered over how to re-write the way data looks in your Google Analytics reports? Ever dream about changing all of your data from lowercase to uppercase? Well, you can do every one of those things and more by using filters to manipulate your profile’s data.
The catch is that you need to write your filters in a language called Regular Expressions. If you’re not familiar with regular expressions, you may want to review the blog post that I wrote over two years ago titled “Stuck Between a ^ and a $ place”. Once you familiarize yourself with regular expressions, you’ll be able to write filters to manipulate your Google Analytics data.
If you’re an administrator of your Google Analytics account, you can click on the “Edit” link next to your profile, and scroll down on the subsequent page to the section of your profile’s settings that shows your filters. This area may be blank, so click on “Add Filter” to start creating your filter.
First, give your filter a descriptive name, like “Excluding my IP address”. Then, select your filter type: either a “Predefined Filter” or a “Custom Filter”. Personally, I like custom filters much better than the predefined ones, even for simple filters that you can build with the predefined type – but that’s a topic of conversation for another day.
With custom filters, you can choose from a few different options, as you can see in the image below. You can exclude data, which removes data from appearing in your profile. You can include data, which will only include the data that you enter. You can change the case of a certain data point by either lowercasing or uppercasing it. You can search for a page, keyword, source, or other data point and change what it says with the search and replace filter. And, if you’re feeling adventurous, you can write an advanced filter to change the order of, insert data behind and in front of, and do many, many other fancy things.
If you find yourself looking for ideas or needing clarification on anything, you will find a help menu below the filter creation screen, as in this example for what you should do with a visitor IP address filter:
Before you start firing away with filters as bullets into your Google Analytics profile, do note that:
– You should definitely review my Stuck Between a ^ and a $ place blog post that describes what regular expressions are and how to work with them.
– The order of the filters matters. If, for example, the first filter listed in your profile excludes Yahoo data, your second filter won’t be able to find any Yahoo data to manipulate. You can change the filter order from the main website profile settings page.
– Filters cannot be applied retroactively. When you apply a filter to your profile, only data collected from that moment forward is affected, not data from even one second before.
– Filters take approximately 24 hours to propagate (i.e., to take effect) in your profile.
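To make the point about filter order concrete, here is a minimal sketch in Python. It is only a toy model and not how Google Analytics is implemented: the hit dictionaries, the "source" field, and the helper names exclude, search_replace, and run_filters are all made up for illustration.

```python
import re

def exclude(field, pattern):
    """Hypothetical exclude filter: drop a hit when the field matches."""
    rx = re.compile(pattern)
    def apply(hit):
        return None if rx.search(hit[field]) else hit
    return apply

def search_replace(field, pattern, replacement):
    """Hypothetical search-and-replace filter: rewrite the field's value."""
    rx = re.compile(pattern)
    def apply(hit):
        return {**hit, field: rx.sub(replacement, hit[field])}
    return apply

def run_filters(hits, filters):
    """Apply the filters to each hit in list order; a dropped hit skips the rest."""
    kept = []
    for hit in hits:
        for f in filters:
            hit = f(hit)
            if hit is None:
                break
        else:
            kept.append(hit)
    return kept

hits = [{"source": "yahoo"}, {"source": "google"}]
drop_yahoo = exclude("source", r"^yahoo$")
rename_yahoo = search_replace("source", r"^yahoo$", "Yahoo Search")

# Exclude first: the rename filter never sees any yahoo data.
exclude_first = run_filters(hits, [drop_yahoo, rename_yahoo])
# Rename first: the exclude filter no longer finds "yahoo", so the hit survives.
rename_first = run_filters(hits, [rename_yahoo, drop_yahoo])

print(exclude_first)
print(rename_first)
```

The same two filters give different results depending only on their order, which is why it is worth checking the order on your profile settings page.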
We’ve been a blog for about six months now, and I’ve just realized something – I don’t think I’ve ever once talked about Regular Expressions! I know I’ve talked about Filters before, but I’ve never written about the language that they are created in. Shame on me! (Robbin Steif, whom I consider the authority on using Regular Expressions in Google Analytics, is probably understandably beside herself that this much time has gone by on this blog without mention of Regular Expressions. I’m sorry!)
Whoa…wait a minute…what do you mean “filters”…like, coffee filters?
Not quite. Filters are these neat options that you can enable in any one of your profiles in Google Analytics. They can allow you to block an IP address or a certain domain’s data from appearing in your reports. They can allow you to change the appearance of data in Google Analytics, and, if you’re creative enough, a whole bunch of other things.
So these “regular expressions” – is this how you create filters in Google Analytics?
POSIX Regular Expressions is the language that Google Analytics requires you to use to create filters. You also need to use Regular Expression language if you create a Goal using Regular Expression Match, or, if you want to use the filter tool at the bottom of any report table in Google Analytics to either view a group of items, or to find exactly a part of a word in a page or source.
There are about a dozen different characters that you can use to create filters and regular expression matched goals in Google Analytics, and to use with the filter tool toward the bottom of every report. You’ve probably seen them floating around the Google Analytics interface or somewhere online. Here are those symbols, what they mean, and how they are used. I promise to keep the geek (tech) talk to a bare minimum.
First, symbols that are known as “Wildcards” (because, like Jokers, they are WILD! :))
. – Yep, that’s right, a dot symbol is a character in regular expression language. It’s actually quite a powerful little fella – its definition is “match any single character”. So, for example, let’s say that in the filter tool at the bottom of your Keywords report table in GA, you enter in .ight and hit “go”. The report would show you keywords such as sight, night, might, fight, and right. The dot symbol just says “Hey, give me any one character followed by ‘ight’, and I’m good to go!”. Keep in mind that the filter tool matches anywhere inside a keyword, so on its own .ight also matches longer keywords such as midnight, cat fight, or knight. To match only five-character keywords, add the anchors described below: ^.ight$.
* – The asterisk symbol is defined as “match zero or more of the previous item”. For example, if you typed mo*re into the GA filter tool and hit “Go”, you’d get back terms like more, moore (someone’s last name), mre (Meals Ready to Eat), and even mooore (a mistyped version with three o’s). The asterisk matches all possible repetitions of, in this case, the letter “o”, and it matches it zero or more times.
+ – The plus-sign symbol is almost the same as the asterisk symbol, with one exception – it needs one or more of the previous items. So, using mo+re in the filter tool would match more, moore, mooore, but NOT mre, like our asterisk symbol would.
? – Even pickier than the dot, asterisk, or plus-sign symbol, the question mark symbol only matches zero or one of the previous items. So, if you typed in moo?re, you would ONLY get moore and more in return. You would not get moooore or mre.
| – The pipe symbol is like the friend that always provides an alternate possibility to any possible situation. The pipe symbol basically means “and / or”. Typing in google|yahoo in the bottom filter tool of GA reports will bring up any traffic sources from google and / or yahoo. You can use it multiple times if you also choose, like apples|bananas|grapes|pears|peaches.
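If you would like to try these wildcards outside of Google Analytics, here is a small sketch using Python’s built-in re module. Python’s engine is not the one Google Analytics uses, but these basic symbols behave the same way in both, as far as I know. The keyword list and the matching helper are made up for this example.

```python
import re

def matching(pattern, keywords):
    """Return the keywords that contain a match for the pattern."""
    return [k for k in keywords if re.search(pattern, k)]

keywords = ["sight", "night", "mre", "more", "moore", "mooore",
            "google", "yahoo", "bing"]

dot_hits = matching(r".ight", keywords)          # any one character, then "ight"
star_hits = matching(r"mo*re", keywords)         # zero or more "o"
plus_hits = matching(r"mo+re", keywords)         # one or more "o"
question_hits = matching(r"moo?re", keywords)    # "mo", then zero or one more "o"
pipe_hits = matching(r"google|yahoo", keywords)  # either alternative

for name, hits in [("dot", dot_hits), ("star", star_hits), ("plus", plus_hits),
                   ("question", question_hits), ("pipe", pipe_hits)]:
    print(name, hits)
```

Comparing the star, plus, and question mark lists side by side is a quick way to see how the three symbols differ only in how many repeats they allow.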
These next two symbols are called “Anchors”:
^ – This caret symbol matches anything that starts with your search term or search string. So, typing in ^/products (with the caret symbol as the first symbol in the string like my example) will match things like /products/toys.html, /products/cars.html, and /products/shirts.html. It would NOT match things like /category/products/index.html or /sub-folder/category/products.
$ – This dollar-sign symbol matches anything that ends with your search term or search string. So, typing in /products\.html$ (with the dollar-sign symbol as the last symbol in the string like my example) will match things like /category/products.html and /directory/category/products.html. It would NOT match something like /products.html?id=123456. Oh, and don’t worry about that \ symbol for now – keep reading and I’ll explain.
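Here is a short sketch of the two anchors, again using Python’s re module as a stand-in for the Google Analytics filter tool. The list of pages is made up, and the \ in front of the dot simply tells the engine to treat that dot as a literal dot.

```python
import re

pages = [
    "/products/toys.html",
    "/category/products/index.html",
    "/category/products.html",
    "/products.html?id=123456",
]

# ^ anchors the pattern to the start of the page path.
starts_with_products = [p for p in pages if re.search(r"^/products", p)]

# $ anchors the pattern to the end of the page path.
ends_with_products_html = [p for p in pages if re.search(r"/products\.html$", p)]

print(starts_with_products)
print(ends_with_products_html)
```

Note that the page with the ?id=123456 query string starts with /products but does not end with /products.html, so it shows up in the first list only.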
These next few symbols are called “Grouping” symbols:
() – The parentheses create an item, and are mostly used with our friend from above, the pipe symbol, which as you now know means “and / or”. For example, typing in (what|who|wher)ever will return whatever, whoever, and wherever back to you. Another example: astro(naut|logy|s) will return astronaut, astrology, and astros to you.
[] – The brackets create a list of items. So, typing [123456] will match anything that has a 1, 2, 3, 4, 5, or 6 in it. You should know that each character represents a different item in the list, so don’t use brackets for something like [google] – use the parentheses instead.
- – This dash symbol can be used in conjunction with the brackets. It creates a range in a list. So, instead of having to type [123456], you can type [1-6], and it will match any digit from 1 to 6. Something like [1-689] will match any digit from 1 to 6, plus the digits 8 and 9.
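The grouping symbols are easier to trust once you have seen them run. This is a small sketch in Python’s re module (not the Google Analytics engine itself), and the word lists are invented for the example. It also shows why [google] is a trap: it is a list of single letters, not the word google.

```python
import re

words = ["whatever", "whoever", "wherever", "however", "forever"]
ever_hits = [w for w in words if re.search(r"(what|who|wher)ever", w)]

astro_words = ["astronaut", "astrology", "astros", "astro"]
astro_hits = [w for w in astro_words if re.search(r"astro(naut|logy|s)", w)]

digits = "0123456789"
list_hits = [d for d in digits if re.search(r"[123456]", d)]
range_hits = [d for d in digits if re.search(r"[1-6]", d)]
mixed_hits = [d for d in digits if re.search(r"[1-689]", d)]

# [google] means "any one of the letters g, o, l, e", so it matches "bing".
brackets_match_bing = re.search(r"[google]", "bing") is not None
parens_match_bing = re.search(r"(google)", "bing") is not None

print(ever_hits, astro_hits)
print(list_hits == range_hits, mixed_hits)
print(brackets_match_bing, parens_match_bing)
```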
Finally, one very important symbol remains:
\ – The backslash symbol stands for “escape”. Placing a backslash in front of any one of the characters that we’ve talked about so far tells Google Analytics to treat that character like a normal character, and not like a regular expression symbol. This is extremely important for those of you matching goals or writing filters of your own – insert a backslash in front of any regular expression character that you want matched literally.
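To see what escaping buys you, here is a quick sketch in Python’s re module. The escape character is the backslash (\). The two page names are made up, and re.escape is a Python convenience function that adds the backslashes for you; Google Analytics has no such helper, so there you type them by hand.

```python
import re

real_page = "/products.html"
lookalike = "/productsXhtml"  # hypothetical page with no dot in it at all

unescaped = r"/products.html$"   # the dot means "any character"
escaped = r"/products\.html$"    # the dot means a literal dot

unescaped_matches_lookalike = re.search(unescaped, lookalike) is not None
escaped_matches_lookalike = re.search(escaped, lookalike) is not None
escaped_matches_real = re.search(escaped, real_page) is not None

# Python can add the backslashes to a literal string for you.
escaped_ip = re.escape("192.168.25.33")

print(unescaped_matches_lookalike, escaped_matches_lookalike, escaped_matches_real)
print(escaped_ip)
```

The unescaped pattern quietly matches the lookalike page, which is exactly the kind of surprise the escape character prevents.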
Using Multiple Regular Expression Characters at the same time
This is where regular expressions can really be used to your advantage. For example, let’s say you are in your keywords report, and you ONLY want to match the word more, and not morevisibility or nevermore. Using both anchor symbols – ^ and $, in front of and behind the word more, like ^more$ – will return exactly more, and nothing else. So, the title of this blog is a play on words, as I’m always trying to find something between the ^ and the $ symbol in my Google Analytics reports.
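The difference between more and ^more$ is easy to check for yourself. A minimal sketch, using Python’s re module in place of the keywords report and a made-up keyword list:

```python
import re

keywords = ["more", "morevisibility", "nevermore", "more results"]

# Without anchors, the pattern matches anywhere inside the keyword.
unanchored = [k for k in keywords if re.search(r"more", k)]

# With both anchors, the whole keyword has to be exactly "more".
exact = [k for k in keywords if re.search(r"^more$", k)]

print(unanchored)
print(exact)
```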
Another example that comes up very frequently when combining regular expression characters is creating a filter to exclude data from multiple IP addresses. For example, let’s say two IP addresses from your office are 192.168.25.33 and 188.8.131.52. Your regular expression would look like this:
^192\.168\.25\.33$|^188\.8\.131\.52$
It looks like that because:
1. I am putting a \ symbol in front of every dot symbol, so that Google Analytics knows that the dot symbols in the IP address are part of the IP address, and not the regular expression dot symbol,
2. I am placing ^ and $ symbols around each IP address, so that Google Analytics will match that exact IP address, and
3. I am using a | symbol, which stands for “and / or”, so Google Analytics knows to match either one IP address and / or the other IP address.
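Putting the three points together, the pattern for the two office IP addresses can be sketched and tested like this. The pattern is built from the description above (escaped dots, ^ and $ around each address, and a | between them). The hits and the OFFICE_IPS name are invented for the example, and Python’s re module stands in for the Google Analytics exclude filter.

```python
import re

# Escaped dots, anchors around each address, and a pipe between the two.
OFFICE_IPS = re.compile(r"^192\.168\.25\.33$|^188\.8\.131\.52$")

hits = [
    {"ip": "192.168.25.33", "page": "/home"},      # office, should be excluded
    {"ip": "188.8.131.52", "page": "/products"},   # office, should be excluded
    {"ip": "192.168.25.330", "page": "/contact"},  # longer address, kept thanks to $
    {"ip": "10.0.0.7", "page": "/home"},           # unrelated visitor, kept
]

# An exclude filter keeps only the hits that do NOT match the pattern.
kept = [h for h in hits if not OFFICE_IPS.search(h["ip"])]

print([h["ip"] for h in kept])
```

Without the $ anchor, the third address would have been thrown out along with the office traffic, since it begins with the same characters.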
Ready for a crazy-looking regular expression?
There is much more about regular expressions that I could write about. However, take what you have learned so far about Regular Expressions, set aside some time and play with the filter tool at the bottom of GA Reports (a great way to get comfortable using regular expressions and how they work), and when you’re ready, take a stab at figuring out what this regular expression does:
I’ll tell you all about it in my next blog post.