Earlier I posted that Google Analytics and other Javascript-based tracking tools might be undercounting visits from Twitter. I’ve done some more digging, which supports the case. In my test, Twitter seems to have sent 500% to 1600% more traffic than log files or hosted stats packages like Google Analytics might show.
My earlier article, How Twitter Might Send Far More Traffic Than You Think, explains how I’d often seen big gaps between the number of people who apparently clicked on a tweeted link, as measured by Bit.ly, and the number of page views Google Analytics was showing.
To test this further, I tweeted a particular page on my personal blog along with tracking codes designed to help ensure that visits appeared in Google Analytics. I’m going to toss out a bunch of numbers as part of this analysis. If they get confusing, skip to the end for the conclusion.
The Numbers Bit
For July 7, Bit.ly reported that the page had registered 58 clicks. Were there 58 corresponding page views? No. Google Analytics only reported 17 page views from 11 unique users. That meant a gap of 41 views.
Was the gap due to clicks from non-human robots that don’t process Google Analytics Javascript tracking code? Or to visits from people using mobile browsers that didn’t get tracked, because they might not process the code? To explore further, I went to the raw log files, the records that the server itself keeps. These show any request made for the page, regardless of any Javascript issues.
I found that there had been 57 total requests — practically the same as Bit.ly reported. However, 14 of these were for the page without the tracking codes I’d used when tweeting the page through Bit.ly.
In other words, this is the URL I put out through Bit.ly, which was reported to receive 58 visits:
http:///mothers-cookies-closes-the-sadness-for-products-i-no-longer-have-389?utm_source=twitter&utm_medium=micro-blog&utm_campaign=twitter
See the part after the question mark? Those are tracking codes or parameters. From the log files, I found that the URL above (with the codes) had 43 visits (not 58), and the same page without tracking codes, like this, received 14 further visits:
http:///mothers-cookies-closes-the-sadness-for-products-i-no-longer-have-389
Those 14 visits without tracking codes almost all came from robots (Google: 5; Yanga: 4; Microsoft: 3). Two other visits seemed to be from humans. These robotic visits all likely had nothing to do with my tweet. The requests were from spiders doing their regular crawls of the web, it seems. The few human visits to the page without tracking codes were probably people who came to my article for reasons unconnected with the tweet.
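For anyone sorting their own log entries the same way, here is a minimal Python sketch of separating the tracking parameters from the page path. The function name split_tracking is made up for illustration, and the URL is the one above with its host left out, exactly as shown; this is one way to do it, not the method used for the analysis in this article.

```python
from urllib.parse import urlsplit, parse_qs

def split_tracking(url):
    """Return (page path, dict of utm_* tracking parameters) for a URL."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    # Keep only the campaign tracking parameters (names starting with utm_).
    utm = {name: values[0] for name, values in params.items()
           if name.startswith("utm_")}
    return parts.path, utm

# The tweeted URL from the article (host omitted, as in the text above).
tagged = ("http:///mothers-cookies-closes-the-sadness-for-products-i-no-longer-have-389"
          "?utm_source=twitter&utm_medium=micro-blog&utm_campaign=twitter")

path, utm = split_tracking(tagged)
print(path)
print(utm)
```

A request whose dictionary comes back empty is a visit to the plain page; one with utm_source set to twitter can only have come from the tweeted link.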
What about those other 43 visits to the page that did have the tracking code? Well, 11 visits were from what appeared to be robots (OneRiot: 1; PycURL: 2; Ginxbot: 2; WebShot: 1; Google: 2; Tweetmeme: 1; Python-urllib: 1; LongURL API: 1).
That left 32 visits that appear to be from humans. That falls between the 58 clicks Bit.ly reported and the 17 page views Google Analytics showed. Why still such a gap with GA?
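The bookkeeping so far can be checked with a few lines of Python. All figures come from the log counts above; the variable names are just for illustration.

```python
# Figures taken from the log analysis described in this article.
total_requests = 57        # all requests for the page in the raw logs
untagged_requests = 14     # requests for the URL without tracking codes

# Robots that requested the URL WITH tracking codes.
tagged_robots = {
    "OneRiot": 1, "PycURL": 2, "Ginxbot": 2, "WebShot": 1,
    "Google": 2, "Tweetmeme": 1, "Python-urllib": 1, "LongURL API": 1,
}

tagged_requests = total_requests - untagged_requests
robot_requests = sum(tagged_robots.values())
human_requests = tagged_requests - robot_requests

print(tagged_requests, robot_requests, human_requests)
```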
One leading argument has been that some Twitter applications on mobile devices load pages within the application, rather than using an external browser, and so aren’t getting registered by Google Analytics. Also, some mobile browsers might not process Javascript. I could see at least four iPhone-based requests like this. But there were plenty of other requests that appear to be from full-fledged desktop-based browsers. Why weren’t they showing up?
One clue is that of the 34 requests, only 5 of them contained “referrer” data, information that some browsers pass on that indicates how they found the page in the first place. For Google Analytics (or ANY analytics program) to properly indicate how much traffic a particular site is driving, it needs as much referrer data as it can get.
Of those referrers, only 2 were from the twitter.com domain (1 was from my own blog’s domain, 1 from iconfactory.com/twitterific, probably indicating a Twitterific user, and 1 from powertwitter.me, probably indicating a Twitter visit via a Firefox plug-in).
In short, based on referrer traffic alone, ANY analysis program would have reported that at best, Twitter sent my page only 2 visits. Yet, both Google Analytics and Bit.ly reported that it received far more than that.
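To make the referrer problem concrete, here is a rough Python sketch of how a referrer-only tally works. The sample list is an assumption: it only reconstructs the counts described above (34 requests, 5 with referrers), the exact referrer URLs are not from my logs, and daggle.com is used to stand in for my own blog’s domain.

```python
from collections import Counter
from urllib.parse import urlsplit

def referrer_hosts(referrers):
    """Count requests per referring host; '-' or empty means no referrer."""
    counts = Counter()
    for ref in referrers:
        if not ref or ref == "-":
            counts["(none)"] += 1
            continue
        host = urlsplit(ref).hostname or "(none)"
        if host.startswith("www."):
            host = host[4:]
        counts[host] += 1
    return counts

# Illustrative sample only: 29 requests with no referrer plus 5 with one.
sample = ["-"] * 29 + [
    "http://twitter.com/",
    "http://twitter.com/",
    "http://daggle.com/",
    "http://iconfactory.com/twitterific",
    "http://powertwitter.me/",
]

counts = referrer_hosts(sample)
print(counts["twitter.com"], "of", sum(counts.values()), "requests credited to Twitter")
```

A tool working only from this tally has no way to tell that the 29 requests without referrers had anything to do with Twitter.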
Remember, Google Analytics said the page had received 17 views in all, 11 from unique users. How many of those 11 unique users came to the page via Twitter? Google Analytics said 9. One more came directly, it said; another person did a search to find it (mother’s cookie site:daggle.com was the search, which was me locating the article. Oddly, this request does NOT appear in the raw log files).
The Big Conclusion
All those earlier numbers hurt your head? Here are the most meaningful ones. Thanks for hanging in there!
Based only on referrers, at best, Google or any analytics program would have said Twitter sent 2 visits. But because I used tracking codes, I was able to overcome the lack of referring data and see that Twitter (itself or via applications or web sites using Twitter data) sent 9 visits. That means analytics packages might be undercounting Twitter visits by nearly 500%.
Meanwhile, Bit.ly was showing those 58 clicks to the page. Let’s say it wasn’t filtering out some of the robots. I can still see that there are 32 visits that the log files recorded, all with the tracking codes that never existed until I tweeted the link with them. So those are all Twitter-derived visits. That means a standard analytics tool depending on referrer data would undercount by 1600%.
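Here is the arithmetic behind those two percentages as a short Python sketch, using only the counts from this article. The percentages quoted above treat the multiple itself as a percentage (4.5 times is roughly 500%, 16 times is 1600%); measured as “percent more than the referrer-only count,” the same numbers work out to 350% and 1500%.

```python
referrer_only = 2        # visits attributable to twitter.com by referrer alone
ga_tagged = 9            # unique users Google Analytics credited to Twitter via tracking codes
log_tagged_humans = 32   # apparently human requests for the tagged URL in the raw logs

def multiple(actual, reported):
    """How many times larger the actual count is than the reported one."""
    return actual / reported

def percent_more(actual, reported):
    """Percentage by which the actual count exceeds the reported one."""
    return (actual - reported) / reported * 100

for label, actual in [("Google Analytics", ga_tagged), ("Raw logs", log_tagged_humans)]:
    print(label, multiple(actual, referrer_only), percent_more(actual, referrer_only))
```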
And The Analytics Companies Say?
I sent my logs to both Bit.ly and Google, along with a draft of this article, for any reaction.
Google said they’re aware that activity on mobile devices can cause issues with tracking and that they’re looking for ways to improve their product.
Bit.ly said they filter out robotic clicks such as Ginxbot, Google, and Python-urllib, through PycURL. When I asked further about the gap, they emailed back:
It looks like three types of events make up the delta.
First, browser plug-ins and automated url-lengthener applications, which make requests to the bit.ly URL, but don’t follow the redirect to the destination site. One example is the “eventBox” at http://thecosmicmachine.com. Here’s how it appears in the logs:
(eventBox) : - - [07/Jul/2009:20:41:31 -0400] "GET /cHXSP HTTP/1.1" 301 410 "-" "EventBox567 CFNetwork/438.12 Darwin/9.7.0 (i386) (iMac9%2C1)" 301
Second, small bots that make their way through our screening system:
(slicehost): - - [07/Jul/2009:21:05:43 -0400] "GET /cHXSP HTTP/1.1" 301 410 "-" "-" 301
Third, browsers which don’t support Javascript, as well as browsers with Javascript settings turned off and browsers running Javascript-blocking extensions like noscript.
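Log lines like the two Bit.ly quoted can be picked apart with a regular expression. The Python sketch below is an assumption-laden illustration: it treats the fields as following the common Apache-style layout (timestamp, request, status, size, referrer, user agent), uses straight quotes, and ignores the leading label and the trailing number. The names LOG_PATTERN and parse_line are made up for this example.

```python
import re
from datetime import datetime

# Assumed layout: [time] "request" status size "referrer" "user agent"
LOG_PATTERN = re.compile(
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" '
    r'"(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields from one log line, or None if it doesn't match."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    # A referrer of "-" means the client sent no referrer at all.
    entry["has_referrer"] = entry["referrer"] not in ("", "-")
    return entry

line = ('(eventBox) : - - [07/Jul/2009:20:41:31 -0400] "GET /cHXSP HTTP/1.1" 301 410 "-" '
        '"EventBox567 CFNetwork/438.12 Darwin/9.7.0 (i386) (iMac9%2C1)" 301')

entry = parse_line(line)
print(entry["status"], entry["has_referrer"], entry["agent"])
```

Both sample requests carry no referrer, and the second has no user agent either, which is exactly the kind of request that is hard to attribute to anything.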
And Some Related Reading
Last week, Fred Wilson posted Does This Blog Get More Traffic From Google or Twitter?, finding that for his personal blog, Twitter traffic has risen past Google search traffic. Fred suspected that the Twitter traffic was even higher than shown, due to undercounting. I think he’s right. While I think Google search traffic still remains a major traffic driver for many sites, those who have lots of Twitter followers or have a story go “hot” through retweets certainly may discover Twitter is a major new traffic source, and one that’s likely undercounted.
Over at the Zebu Blog, Link Tracking – (lies, damn lies &) Statistics? also looks at the issue, questioning whether Bit.ly is overcounting. In a follow up comment, Mayank Sharma did his own small scale experiment and found:
We created a bit.ly url for this post, and posted it on Twitter. The next instant we saw, that bit.ly’s count was already 4. This only means that some twitter crawler/indexer received the tweet and de-referenced the url mentioned in it. After that I hovered my mouse over the link shown in Twitterfox. Sure enough bit.ly’s count increased by one. We did this repeatedly from multiple desktop’s of several friends and the count just kept on increasing. Not one of these folks during this time had actually clicked on the link.
I agree: Bit.ly seems to be overestimating views. But Google Analytics seems to be underestimating them, perhaps severely, based on my small-scale log analysis. Using tracking codes occasionally is one way to get a reality check.
Finally, if you want to add tracking parameters for URLs you tweet, consider the Snip-n-Tag add-on for Firefox. I’ve been using it, and it makes adding these to URLs super easy.
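If you’d rather tag URLs with a script, the Python sketch below shows the general idea. It is not how Snip-n-Tag works; the function name add_tracking and the example.com URL are hypothetical, and the parameter values simply mirror the ones used in my test.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def add_tracking(url, source, medium, campaign):
    """Return url with utm_source, utm_medium and utm_campaign appended."""
    parts = urlsplit(url)
    # Keep any existing query parameters, but drop old utm_* values
    # so the URL is not tagged twice.
    query = [(name, value)
             for name, value in parse_qsl(parts.query, keep_blank_values=True)
             if not name.startswith("utm_")]
    query += [("utm_source", source),
              ("utm_medium", medium),
              ("utm_campaign", campaign)]
    return urlunsplit(parts._replace(query=urlencode(query)))

# Hypothetical page URL, for illustration only.
tagged = add_tracking("http://example.com/some-post", "twitter", "micro-blog", "twitter")
print(tagged)
```

The tagged URL is what you would then shorten and tweet, so that any visit carrying those parameters can be traced back to the tweet even when no referrer is passed.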