We’ve been a blog for about six months now, and I’ve just realized something – I don’t think I’ve ever once talked about Regular Expressions! I know I’ve talked about Filters before, but I’ve never written about the language that they are created in. Shame on me! (Robbin Steif, whom I consider the authority on using Regular Expressions in Google Analytics, is probably understandably beside herself that this much time has gone by on this blog without mention of Regular Expressions. I’m sorry!)
Whoa…wait a minute…what do you mean “filters”…like, coffee filters?
Not quite. Filters are these neat options that you can enable in any one of your profiles in Google Analytics. They can allow you to block an IP address or a certain domain’s data from appearing in your reports. They can allow you to change the appearance of data in Google Analytics, and, if you’re creative enough, a whole bunch of other things.
So these “regular expressions” – is this how you create filters in Google Analytics?
POSIX Regular Expressions is the language that Google Analytics requires you to use to create filters. You also need to use Regular Expression language if you create a Goal using Regular Expression Match, or, if you want to use the filter tool at the bottom of any report table in Google Analytics to either view a group of items, or to find exactly a part of a word in a page or source.
There are about a dozen different characters that you can use to create filters and regular expression matched goals in Google Analytics, and to use with the filter tool toward the bottom of every report. You’ve probably seen them floating around the Google Analytics interface or somewhere online. Here are those symbols, what they mean, and how they are used. I promise to keep the geek (tech) talk to a bare minimum.
First, symbols that are known as “Wildcards” (because, like Jokers, they are WILD! :))
. – Yep, that’s right, a dot symbol is a character in regular expression language. It’s actually quite a powerful little fella – its definition is “match any single character”. So, for example, let’s say that in the filter tool at the bottom of your Keywords report table in GA, you enter in .ight and hit “go”. The report would show you any of the following keywords: sight, night, might, fight, and right. The dot symbol just says “Hey, give me any one character followed by “ight”, and I’m good to go!”. Keep in mind that the dot stands for exactly one character. Keywords like midnight, cat fight, or knight are longer than five characters, so .ight only covers part of them; because the filter tool looks for your expression anywhere inside a keyword, they can still show up, and you need the anchor symbols described further down to limit the results to keywords of exactly five characters.
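If you’d like to try the dot outside of Google Analytics, here is a minimal sketch using Python’s built-in re module. Python is only a stand-in here (its regular expression flavor is not identical to the one GA uses, although the dot behaves the same way), and the keyword list is made up for illustration.

```python
import re

# Hypothetical keywords, standing in for a Keywords report.
keywords = ["sight", "night", "might", "fight", "right",
            "midnight", "cat fight", "knight"]

# re.fullmatch: the WHOLE keyword must be one character followed by "ight".
exact = [k for k in keywords if re.fullmatch(r".ight", k)]

# re.search: the pattern may be found ANYWHERE inside the keyword.
partial = [k for k in keywords if re.search(r".ight", k)]

print(exact)    # the five-letter keywords only
print(partial)  # every keyword in the list
```

The first list contains only the five-character keywords; the second also picks up midnight, cat fight, and knight, because each of them contains some character followed by “ight”.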
* – The asterisk symbol is defined as “match zero or more of the previous item”. For example, if you typed mo*re into the GA filter tool and hit “Go”, you’d get back terms like more, moore (someone’s last name), mre (Meals Ready to Eat), and even mooore (a typo with three o’s). The asterisk matches the item right before it, in this case the letter “o”, zero or more times.
+ – The plus-sign symbol is almost the same as the asterisk symbol, with one exception – it needs one or more of the previous items. So, using mo+re in the filter tool would match more, moore, mooore, but NOT mre, like our asterisk symbol would.
? – Even pickier than the dot, asterisk, or plus-sign symbol, the question mark symbol only matches zero or one of the previous items. So, if you typed in moo?re, you would ONLY get moore and more in return. You would not get moooore or mre.
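To see the asterisk, plus sign, and question mark side by side, here is a small Python sketch. The matching helper and the list of terms are my own, purely for illustration; re.fullmatch is used so that each term has to fit the pattern from start to finish.

```python
import re

terms = ["mre", "more", "moore", "mooore"]

def matching(pattern, candidates):
    """Return the candidates that fit the pattern from start to finish."""
    return [c for c in candidates if re.fullmatch(pattern, c)]

star = matching(r"mo*re", terms)       # zero or more "o"
plus = matching(r"mo+re", terms)       # one or more "o"
optional = matching(r"moo?re", terms)  # one "o", then zero or one more "o"

print(star)      # ['mre', 'more', 'moore', 'mooore']
print(plus)      # ['more', 'moore', 'mooore']
print(optional)  # ['more', 'moore']
```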
| – The pipe symbol is like the friend that always provides an alternate possibility to any possible situation. The pipe symbol basically means “and / or”. Typing in google|yahoo in the bottom filter tool of GA reports will bring up any traffic sources from google and / or yahoo. You can also use it more than once in the same expression, like apples|bananas|grapes|pears|peaches.
These next two symbols are called “Anchors”:
^ – This caret symbol matches anything that starts with your search term or search string. So, typing in ^/products (with the caret symbol as the first symbol in the string like my example) will match things like /products/toys.html, /products/cars.html, and /products/shirts.html. It would NOT match things like /category/products/index.html or /sub-folder/category/products.
$ – This dollar-sign symbol matches anything that ends with your search term or search string. So, typing in /products\.html$ (with the dollar-sign symbol as the last symbol in the string like my example) will match things like /category/products.html and /directory/category/products.html. It would NOT match something like /products.html?id=123456. Oh, and don’t worry about that \ symbol for now – keep reading and I’ll explain.
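Here is a quick Python sketch of both anchors at work. The page paths are invented for the example, and Python’s re.search is standing in for the GA filter tool, so treat it as an approximation rather than the real thing.

```python
import re

# Hypothetical page paths, as they might appear in a content report.
pages = [
    "/products/toys.html",
    "/category/products/index.html",
    "/category/products.html",
    "/products.html?id=123456",
]

# ^ pins the expression to the START of the path.
starts_with = [p for p in pages if re.search(r"^/products", p)]

# $ pins the expression to the END of the path.
# (The backslash makes the dot a literal dot; escaping is covered further down.)
ends_with = [p for p in pages if re.search(r"/products\.html$", p)]

print(starts_with)  # ['/products/toys.html', '/products.html?id=123456']
print(ends_with)    # ['/category/products.html']
```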
These next few symbols are called “Grouping” symbols:
() – The parentheses create an item (a group), and are mostly used with our friend from above, the pipe symbol, which as you now know means “and / or”. For example, typing in (what|who|wher)ever will return whatever, whoever, and wherever back to you. Another example: astro(naut|logy|s) will return astronaut, astrology, and astros to you.
[] – The brackets create a list of items. So, typing [123456] will match anything that has a 1, 2, 3, 4, 5, or 6 in it. You should know that each character represents a different item in the list, so don’t use brackets for something like [google] – use the parentheses instead.
- – This dash symbol can be used in conjunction with the brackets. It creates a range in a list. So, instead of having to type [123456], you can type [1-6], and it will match anything between 1 and 6. Something like [1-689] will match anything between 1 and 6, plus the numbers 8 and 9.
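The grouping symbols are easier to tell apart once you see them run. Below is a minimal Python sketch (again, Python is only a stand-in for GA, and the sample words and values are made up): parentheses with pipes for whole alternatives, brackets for a list of single characters, and a dash for a range inside the brackets.

```python
import re

# Parentheses + pipes: one of several whole alternatives, then "ever".
words = ["whatever", "whoever", "wherever", "forever"]
grouped = [w for w in words if re.fullmatch(r"(what|who|wher)ever", w)]

# Brackets: exactly one character taken from the list.
values = ["3", "7", "8", "a"]
listed = [v for v in values if re.fullmatch(r"[123456]", v)]
ranged = [v for v in values if re.fullmatch(r"[1-6]", v)]       # same list, as a range
extended = [v for v in values if re.fullmatch(r"[1-689]", v)]   # 1 to 6, plus 8 and 9

# Why [google] is a trap: it means "a single g, o, l, or e", not the word.
bracket_hits_yahoo = bool(re.search(r"[google]", "yahoo"))  # finds an "o"
group_hits_yahoo = bool(re.search(r"(google)", "yahoo"))    # needs the whole word

print(grouped)   # ['whatever', 'whoever', 'wherever']
print(listed, ranged, extended)
print(bracket_hits_yahoo, group_hits_yahoo)
```

The last two lines are the reason to reach for parentheses when you mean a whole word: the bracket version happily matches yahoo, while the parenthesis version does not.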
Finally, one very important symbol remains:
\ – The backslash symbol stands for “escape”. Placing a backslash in front of any one of the characters that we’ve talked about so far tells Google Analytics to treat that character like a normal character, and not like a regular expression symbol. This is extremely important for those of you matching goals or writing filters of your own – insert a backslash in front of any regular expression character that you want to be taken literally.
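Here is a short Python sketch of why escaping matters. In Python, as in most regular expression flavors, the escape character is the backslash (\). The look-alike path below is invented to show what an unescaped dot lets through.

```python
import re

real_page = "/products.html"
lookalike = "/productsXhtml"  # hypothetical path with no dot at all

unescaped = r"/products.html"   # the dot means "any single character"
escaped = r"/products\.html"    # the dot means a literal dot

unescaped_hits_lookalike = bool(re.search(unescaped, lookalike))
escaped_hits_lookalike = bool(re.search(escaped, lookalike))
escaped_hits_real_page = bool(re.search(escaped, real_page))

print(unescaped_hits_lookalike)  # True: the dot happily matched the "X"
print(escaped_hits_lookalike)    # False
print(escaped_hits_real_page)    # True
```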
Using Multiple Regular Expression Characters at the same time
This is where regular expressions can really be used to your advantage. For example, let’s say you are in your keywords report, and you ONLY want to match the word more, and not morevisibility or nevermore. Using both anchor symbols – ^ and $, in front of and behind the word more, like ^more$ – will return exactly more, and nothing else. So, the title of this blog is a play on words, as I’m always trying to find something between the ^ and the $ symbol in my Google Analytics reports.
Another example that comes up very frequently when combining regular expression characters is a filter that excludes data from multiple IP addresses. For example, let’s say two IP addresses from your office are 192.168.25.33 and 18.104.22.168. Your regular expression would look like this:
^192\.168\.25\.33$|^18\.104\.22\.168$
It looks like that because:
1. I am putting a \ symbol in front of every dot symbol, so that Google Analytics knows that the dot symbols in the IP address are part of the IP address, and not the regular expression dot symbol,
2. ^ and $ symbols surrounding each IP address, so that Google Analytics will match that exact IP address, and
3. A | symbol, which stands for “and / or”, so Google Analytics knows to match either one IP address and / or the other IP address.
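The three points above can be sketched in a few lines of Python. This is only an illustration of how such an expression is put together: re.escape is a Python helper (not something Google Analytics offers), and the office_ips list simply holds the two example addresses.

```python
import re

office_ips = ["192.168.25.33", "18.104.22.168"]

# re.escape puts a backslash in front of each dot,
# ^ and $ pin each address to an exact match,
# and | joins the two alternatives.
pattern = "|".join("^" + re.escape(ip) + "$" for ip in office_ips)
print(pattern)  # ^192\.168\.25\.33$|^18\.104\.22\.168$

def is_office_ip(ip):
    """True when ip is exactly one of the office addresses."""
    return re.search(pattern, ip) is not None

print(is_office_ip("192.168.25.33"))   # True
print(is_office_ip("192.168.25.334"))  # False: the $ anchor rejects the extra digit
print(is_office_ip("192x168x25x33"))   # False: the escaped dots must be real dots
```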
Ready for a crazy-looking regular expression?
There is much more about regular expressions that I could write about. However, take what you have learned so far about Regular Expressions, set aside some time and play with the filter tool at the bottom of GA Reports (a great way to get comfortable using regular expressions and how they work), and when you’re ready, take a stab at figuring out what this regular expression does:
I’ll tell you all about it in my next blog post.