Big Problem or Minor Irritation: The Growing Percentage of “Not Provided” in Google Analytics

By: Daniel Jeffers - Search Engine Optimization - 2011

Around the end of October and the beginning of November 2011, Google decided to “protect” the privacy of logged-in users by concealing their search queries. Since search queries are one of the most important data elements for website owners, this created a lot of nervousness. But Google predicted that only about 10% of search traffic would be affected.

In the two years since the change, website owners have seen that percentage go much higher. Many are up around 40%, and some report as much as 60% of search traffic described as “not provided.”

Naturally, Google Analytics users are concerned. Some have come up with complex strategies for getting at some of the data, while others have described the available data as “useless.”

That’s ridiculous, of course. At worst, we are still looking at 40% of the search traffic. Play around with this random sample size calculator to see just how huge that is. But what we don’t know is whether there is something distinct about the “not provided” visitors. If so, then we’re losing insight into a huge segment of visitors. But if this group is fairly similar to the whole population, then our sample size is big enough that whatever keyword information we are getting can be effectively mapped to the website as a whole.
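For anyone who would rather see the arithmetic than use a calculator, here is a rough sketch in Python of the standard margin-of-error formula for a sample proportion, with a finite population correction. The traffic figure (10,000 search visits) is made up for illustration, and the function name is my own. The formula also assumes the visible keywords behave like a random sample of all search visits, which is exactly the open question raised above.

```python
import math


def margin_of_error(sample_size, population_size, proportion=0.5, z=1.96):
    """Margin of error for a sample proportion, with finite population correction.

    proportion=0.5 is the most conservative choice; z=1.96 corresponds to
    roughly 95% confidence.
    """
    standard_error = math.sqrt(proportion * (1 - proportion) / sample_size)
    correction = math.sqrt((population_size - sample_size) / (population_size - 1))
    return z * standard_error * correction


# Hypothetical site: 10,000 search visits, of which 40% still show a keyword.
search_visits = 10_000
visible_visits = round(search_visits * 0.40)

moe = margin_of_error(visible_visits, search_visits)
print(f"{visible_visits} visits with keywords: margin of error about {moe:.1%}")
```

With these made-up numbers the margin of error comes out at a little over one percentage point, which is why a 40% slice counts as a very large sample, provided it is representative.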

I took a look at five websites from a variety of sectors and of varying sizes. These included retail, non-profit commerce, health services, a public health agency, and a news/information site. For those who want to see the data, I lay it out more thoroughly in this blog post.

Here is a description of the sites I looked at:

Basically, I looked at two questions:

  • Over three years, how much did the top keywords change for each website?
  • Comparing the “not provided” searches to all searches, how much did metrics such as time on website, pages/visit, % new visits, and bounce rate vary?

Changes in Top Keywords

Over time, I’ve noticed that with large, popular websites, the most popular search phrases often remain stable. If there were a major difference between the “not provided” traffic and the known traffic, the top keyword list would likely change as well.

The top ten keywords for all sites remained strongly consistent over three years. The one site that showed the highest variability, with two top five terms dropping off the first page, was a news website with frequently updated content. Two websites did not have any of the top five terms drop off the front page.
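To run the same kind of check on your own site, one simple approach is to export the top keywords for two periods and see which of the earlier top five no longer appear on the later first page. The sketch below assumes the first page holds ten terms; the keyword lists and the function name are invented for illustration and are not taken from the five sites.

```python
def dropped_terms(earlier_top, later_top, watch=5):
    """Return the top `watch` earlier terms that are missing from the later list."""
    later = {term.lower() for term in later_top}
    return [term for term in earlier_top[:watch] if term.lower() not in later]


# Hypothetical top-ten lists for one site, three years apart.
top_2010 = [
    "acme widgets", "widget repair", "buy widgets", "acme", "widget parts",
    "widget manual", "widget sizes", "acme store", "widget prices", "used widgets",
]
top_2013 = [
    "acme widgets", "acme", "buy widgets", "widget prices", "widget repair",
    "acme coupons", "widget reviews", "acme store", "widget sizes", "widget manual",
]

missing = dropped_terms(top_2010, top_2013)
print(f"Top-five terms that fell off the first page: {missing}")
```

A short or empty result suggests the visible keywords are still telling the same story they told before the change.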

On-Site Behavior

With one year’s data, most websites showed that the metrics for “not provided” were close to, but not exactly the same as, those for all search. One site in particular, the non-profit commerce site, showed higher variation, nearly 27%, in time on website and pages/visit. Other websites showed far less variation.
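If you want to make this comparison with your own numbers, a minimal sketch follows. It treats "variation" as the relative difference between the “not provided” segment and all search traffic, which is an assumption about how to measure it rather than a description of the exact calculation behind the figures above. All metric values and names here are hypothetical.

```python
def percent_variation(segment_value, overall_value):
    """Relative difference of a segment from the overall value, in percent."""
    return (segment_value - overall_value) / overall_value * 100


# Hypothetical one-year metrics for a single site.
all_search = {
    "avg_time_on_site_sec": 200.0,
    "pages_per_visit": 4.0,
    "pct_new_visits": 50.0,
    "bounce_rate": 40.0,
}
not_provided = {
    "avg_time_on_site_sec": 210.0,
    "pages_per_visit": 5.0,
    "pct_new_visits": 52.0,
    "bounce_rate": 38.0,
}

# Compare each metric for the "not provided" segment against all search.
variation = {
    metric: percent_variation(not_provided[metric], all_search[metric])
    for metric in all_search
}

for metric, change in variation.items():
    print(f"{metric}: {change:+.1f}%")
```

Small differences across the board suggest the hidden visitors behave like everyone else; one or two large ones are a reason to look more closely at that site.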

My take is that, for most websites, the currently available keyword data is an accurate representation of the search phrases people use to come to your website. Of course, it never hurts to take a look at your own data and see if there is a bigger variation for some reason.