These are just a couple of the questions that have been plaguing industries and enterprises worldwide since the “Big Data” phenomenon surfaced. By now, most of us have heard this buzzword/phrase that has been penetrating the minds of IT and analytics professionals alike. However, many organizations are still unsure how to effectively analyze it and gain new insights from it. Luckily, there are expert specialists in this field who are eager to join them and guide them through the journey.
What is “Big Data?”
I’ll spare you the formal definition and put it simply: “Big Data” is everything, and it’s everywhere. “Big Data” is defined by (at least) three ‘Vs’: Volume, Velocity and Variety. And you might even hear about a fourth ‘V’ depending on which “Big Data” solution provider you’re talking to.
- zettabytes = as much information as there are grains of sand on all the world’s beaches
- Veracity (IBM) — Accurate, truthful and trustworthy data
- Variability (SAS) — Data flows that may be unpredictable, inconsistent and anomalous
Now that we have a better grasp of what exactly “Big Data” is, I’d like to explore some of the complexities and challenges companies face because of it, as well as the opportunities it presents.
Challenges & Complexities
An organization’s size, requirements, boundaries and resources, as well as the industry it’s in, can dictate how it adopts “Big Data” and which obstacles will keep it from extracting high-value impact and gaining new business insights that were previously unattainable.
However, there are a few common challenges regardless of the nature of the business:
- An abundance and variety of data sources and the information collected
- Inherent complexity in processing, management and aggregation
I intentionally left out a fundamental part of the “Big Data” definition when I talked about the three or four ‘Vs’ of this concept, but this is a perfect place to sneak it in.
IDC’s definition of “Big Data” embraces the hardware, services and software that integrate, organize, manage, analyze and present the data that is characterized by the ‘Vs’ discussed at the beginning of this post.
This is why new technologies and architectures, advanced tools and platforms are needed and continue to be developed. These will allow enterprises to leverage “Big Data” and (you guessed it) analytics.
- Technical: Data scientists with an unparalleled level of skill to understand the interactions of a new class of technologies
- Analytics: Data mining; statistics; business analytics; problem solving; creativity
Although there are some hindrances to enterprises fully embracing this new era of “Big Data” and analytics, there are evolving approaches to conquering them. For example, the Google Analytics Premium and BigQuery integration that will be taking place toward the end of this year was just announced at Google I/O a couple of weeks ago. If you’re a GA Premium user, I’ll venture to guess that this made you smile — even if you’re not 100% sure what it’s going to mean for your business.
Check back next week when I’ll discuss what value, advantages, opportunities and possible use cases can arise from utilizing more advanced technologies, solutions, and analytics strategies such as the “Big Data” movement. Stay tuned!
What is Data Sampling?
In Google Analytics, it means selecting a subset of data from your website traffic.
Why is this done?
The idea is that using a subset of data will provide comparable results to using the full amount of data available. Using a smaller data set will speed up the process for reporting, as pulling larger amounts of data slows down queries.
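The principle can be sketched in a few lines of Python (Google’s actual sampling is more sophisticated than this simple random selection, and the revenue figures below are invented purely for illustration):

```python
import random

# Hypothetical full data set: one revenue value per visit.
random.seed(42)
visits = [random.uniform(0.0, 50.0) for _ in range(500_000)]

# Draw a 250,000-visit sample (a 50% sampling rate) and estimate
# the total by scaling the sample's total back up.
sample = random.sample(visits, 250_000)
sampling_rate = len(sample) / len(visits)
estimated_total = sum(sample) / sampling_rate

# Compare the estimate against the true total from the full data set.
actual_total = sum(visits)
error_pct = abs(estimated_total - actual_total) / actual_total * 100
print(f"actual: {actual_total:,.0f}  estimated: {estimated_total:,.0f}  "
      f"error: {error_pct:.2f}%")
```

The estimate typically lands within a fraction of a percent of the true total while only half the data is touched, which is the trade-off sampling is making on your behalf.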
When will I see Sampled Data?
When a report draws from a large data set (over 500,000 visits, visitors or pages), the data you see will be sampled. In Multi-Channel Funnel reports, sampled data is used when you have over 1 million conversion paths.
When running reports in Google Analytics you may see a yellow box at the top of the report which says:
This gives you specifics on the percentage of visits the report samples from. As you can see, the number of visits averages a little over 210,000, but the sampled percentage is lower in each instance, depending on how much data each report has to sample from. The larger the amount of data pulled, the lower the percentage of visits included in the sample.
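The relationship between data volume and that percentage can be expressed as simple arithmetic. The helper below is hypothetical (not part of any Google API), and the 250,000 default mirrors the sample cap described in this post:

```python
def sampled_percentage(total_visits: int, sample_size: int = 250_000) -> float:
    """Rough percentage of visits a sampled report is based on.

    If the data set fits within the sample size, no sampling is
    needed and the report reflects 100% of visits.
    """
    return min(100.0, sample_size / total_visits * 100)

# A report drawing from 1,000,000 visits with the default cap:
print(sampled_percentage(1_000_000))  # → 25.0
# A smaller data set that fits under the cap is not sampled at all:
print(sampled_percentage(200_000))    # → 100.0
```

Double the visits in a report’s date range and the sampled percentage halves, which is exactly the pattern the yellow notices illustrate.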
A new feature in Google Analytics is the Adjust Sample Size tool. The slider, which is located below the date range, allows the user to choose between faster processing and higher precision.
This tool will allow you to adjust the sample size from the default of 250,000 (which is the center of the slider) up to 500,000 visits. As you will see in the samples below, the data samples differ depending on where the slider is placed. The slider can be placed anywhere along the path, not just at either end or in the middle of the tool.
When you choose a sampling threshold, that preference will be used in all reports until you close Google Analytics.
Google’s New Change to Keyword Search will Impact Data for Analytics
Thursday October 27, 2011
MoreVisibility’s Twitterchat (#MVCHAT) took place yesterday, October 27, discussing the important new announcement from Google regarding a change in the way keyword search data will now be gathered. The topics discussed included:
MVCHAT is a weekly 30 minute discussion starting at 3:30 pm (EST) covering a variety of online marketing topics. Clients, advertisers, and online marketing enthusiasts are invited to participate in this rapid-fire conversation by following and including #MVCHAT in tweets. Read more about #MVCHAT in the news here.