Correlation, Causation & Coincidence in SEO

A recent blog post on Moz by Cyrus Shepard caused quite a stir in the SEO community. The stir was rooted partly in a lack of understanding of the difference between correlation and causation, and partly in the author's own attempt to bridge that gap in his writing by implying causation where there was no evidence of it.

Before we dig into that, as well as a couple more examples, let's first get a better understanding of correlation, causation, and – for good measure – coincidence.

Merriam-Webster defines them each as:

Correlation: a relation existing between phenomena or things or between mathematical or statistical variables which tend to vary, be associated, or occur together in a way not expected on the basis of chance alone.

Causation: the act or process of causing.

Coincidence: the occurrence of events that happen at the same time by accident but seem to have some connection.

The difference then is that correlation doesn't make the claim that one event causes the other, just that they occur together statistically in a way that wouldn't be expected based on random chance. One can view this as similar to consistent coincidence.

Causation, on the other hand, claims that two or more events are tied together directly. And coincidence, as we are all likely aware, occurs when two events happen at the same time but aren't at all related.

Let's put this into real-world examples.

Correlation: If you eat three square meals every day promptly at 8 a.m., 12:30 p.m., and 6 p.m., there will be a sizable period of time twice per year where your dinner time will correlate to the sun setting. An outside observer for this fixed duration may easily claim that, like Pavlov's dog, your hunger for dinner is caused by the setting of the sun. Obviously this isn't true, but for this period the two events correlate.

Causation: If you're walking down the street, texting all the way, and walk face-first into a lamp post, you will get a bruise. While obviously texting doesn't cause facial bruises (though in this instance there is a correlation), the event of striking one's face against a hard object is the direct cause of the bruise. Thus, this is an example of causation.

Coincidence: If you're sitting in a coffee shop and say hello to your friend and at exactly the same time someone's phone rings, this is a coincidence. The mere sound of your voice doesn't inspire the ringing of phones, and statistically one wouldn't expect the two events to occur together outside of random chance.

It's very important to understand and remember the difference between the three and to question data based on an understanding of this difference. In fact, below I've included a link to an article on "spurious correlation" (which the meal-time situation noted above is an example of), but for now these definitions will work well.

With this in mind, let's look at the claims made in the Moz article noted above and explore some other examples that you likely have encountered (or will encounter) while monitoring your rankings.

Google +1s and Search Rankings

To begin, you may want to read the article I'm referencing which can be found here.

Now, credit where it's due. The article is properly titled, "Amazing Correlation Between Google +1s and Higher Search Rankings".

With this wording, as well as the use of the word "correlation" throughout the piece, the author acknowledges that there is no evidence or testing done to prove that +1s directly impact rankings, only that pages with higher numbers of +1s tend to rank higher. However, the whole article isn't about +1s, so let's go through the points the author discusses one by one.

1. Posts are Crawled and Indexed Almost Immediately

The author's assertion here is that content will get indexed faster if it's posted and shared on Google+. I have seen no large-scale tests comparing how quickly content gets indexed when it is linked to from a standard Google+ profile vs. when it is simply part of a highly crawled site; however, there is evidence that a page can get crawled quickly via a strong Google+ profile.

Author's Claim: Accurate, but perhaps too optimistic.

2. Google+ Posts Pass Link Equity

The author claims that shared links pass link weight simply because they're not nofollowed (whereas other links are). Now, this brings up an interesting question: Does the fact that Google nofollows some links necessarily indicate that they pass weight to the others?

One could ask, "Why nofollow some if you aren't going to pass weight to any?" A more likely explanation than Google passing link weight, given the easily abused environment that would breed, goes back to point one: they will crawl the content that is shared (i.e., followed) and not crawl additional links, thus seriously restricting the benefits of comment spamming on stronger profiles.

I can't say the conclusion that the shared links are followed just to pass crawlers and not link juice is heavily tested, or based on more than an understanding of what Google's trying to accomplish and the pitfalls they would face if they started passing link weight through Google+. But I will assert that it's far more likely than Google setting themselves up to be a link spam property.

Author's Claim: Unlikely

3. Google+ is Optimized for Semantic Relevance

The author claims that the ability to essentially write full blog posts into Google+ adds semantic relevancy to a URL shared by the post. It's true that Google has gone to lengths to ensure that the post page is unique and optimized. It's almost as if Google would like to rank its own site for the posts it contains. That part isn't to be debated, of course.

The real question is: does Google assign relevancy from a Google+ post to the URL shared in it?

If we think about what Google would be trying to accomplish, knowing that they do use Google+ for indexing, it makes sense that Google would use the same technology they use on their general relevancy analysis internally. Now, does Google use that to credit the target URL or do they use it to assign relevancy to their own post? That's a different question, and one which hasn't yet been answered openly (and likely never will, but if Matt Cutts would like to voice his thoughts please consider this the invitation).

Because the task is simple (make sure the description of a link you're sharing is accurate and contains a summary of the content) and the only pitfall will be that it passes no semantic weight to the target URL but does result in a better optimized post, I would add this to the "do it either way" list. It's not going to hurt, it may help – and even if it doesn't help directly, it may result in higher click-throughs and even your Google+ post ranking.

Author's Claim: Possible

Other Notes

The author goes from there to discuss ways to optimize a Google+ profile. One thing is certain: having a Google+ profile and using it is a good idea.

Whether you find Google+ of high direct value or you simply have it in a "Google said to drink the Kool-Aid so I did" kind of way, more trust signals are being added to the Google web of services, and the more trust you can send, the more trustworthy you are. The advice the author gives (especially with regard to the authorship tag, which will directly boost the trust of your profile) is solid.

Now, you may be asking at this point, "The title mentioned +1s but you haven't touched on those. Why?" Interestingly the article itself doesn't cover much about +1s.

The author asserts that, "the relationship between +1s and higher rankings goes beyond correlation into the territory of actual causation," but retracted that with the next sentence, added after publishing: "This should say 'posting on Google+' instead of Google +1s. It's clear that Google doesn't use the raw number of +1s directly in its search algorithm, but Google+ posts have SEO benefits unlike other social platforms." This leaves one scratching their head. The title, and the image used right at the top to assert its accuracy, are based on +1s, and yet we're now to learn that it was never intended to be about +1s?

I find that unlikely, and perhaps a response to Cutts coming out and stating, "If you make compelling content, people will link to it, like it, share it on Facebook, +1 it, etc. But that doesn't mean that Google is using those signals in our ranking." There's actually much more he said which you can read here but that's the gist. The article, it seems, is incorrect in the implied assertion that +1s aid in higher rankings.

What Does This Have to do With Causation, Correlation & Coincidence?

At this point, you may be wondering why I started this piece with an explanation of causation, correlation, and coincidence. Throughout the article we had to put on our thinking caps and make this assessment, whether or not we knew we were doing it.

At its core, the question was, "Do +1s improve rankings?" Ignoring Cutts coming out and saying "no," we have to address the probable reality which would be that a strong web presence and brand are going to attract more +1s.

Similarly, a strong web presence and brand are going to attract higher rankings and more links. Did the +1s lead to higher rankings? No. The strong web presence leads to both. This is a correlation, not causation.
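To make the "strong web presence leads to both" point concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the `brand`, `plus_ones`, and `ranking_score` values are randomly generated, not real Google data, and the model deliberately gives +1s no influence over rankings at all. It shows how two metrics can still correlate strongly, and how that correlation fades once the shared driver is controlled for (a standard first-order partial correlation).

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def partial_corr(xs, ys, zs):
    """Correlation of xs and ys after controlling for a third variable zs."""
    r_xy, r_xz, r_yz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical model: brand strength drives BOTH metrics.
# Note that plus_ones never appears in the ranking_score formula.
brand = [rng.gauss(0, 1) for _ in range(5000)]
plus_ones = [b + rng.gauss(0, 0.5) for b in brand]
ranking_score = [b + rng.gauss(0, 0.5) for b in brand]

raw = pearson(plus_ones, ranking_score)
controlled = partial_corr(plus_ones, ranking_score, brand)

print(f"+1s vs. rankings, raw correlation: {raw:.2f}")
print(f"+1s vs. rankings, controlling for brand: {controlled:.2f}")
```

In this toy model the raw correlation comes out high while the controlled one sits near zero, which is the pattern you would expect if brand strength, not +1s, were doing the work. It proves nothing about Google's algorithm; it only shows that a strong correlation alone can't distinguish between the two stories.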

What Are Some Other Examples?

Whether we know it or not, we make this assessment often; sometimes correctly, sometimes incorrectly. With SEO we often have to play probabilities and go with the most likely scenario in any given environment, so let's look at a couple other examples.

You Changed Your Title Tag and Your Site Dropped the Next Day

I hear this quite often (though you can substitute H1 tag, description, content, etc.). Rarely do such changes impact rankings that quickly.

The first step is to enter the URL into Google and see what they're presenting to visitors. Is the title the new one or the old one? If it's the old title, then your page hasn't yet been cached; if it's the new one, then it has. The conclusion would be different for each case.

1. Your Title Hasn't Changed

If the change hasn't been cached, then the probability of it impacting the results is extremely low, if it exists at all.

This is an example of coincidence as opposed to either correlation or causation. It's important to know this, as not knowing will delay any efforts to address the real cause of the decline.

Rather than spending time undoing changes and waiting, praying, and wondering why it's not working, you'll want to look for other changes that took place; updates, warnings or penalties, etc.

2. Your Title Has Changed

In this event, the change may indeed have impacted the results, but before assuming causation you'll need to investigate other possibilities.

For example, if the environment was exactly the same as in the situation above (i.e., title hadn't changed) with the sole difference being that the crawlers were working hard and the page got cached, the same drop would occur and you may mistakenly draw the conclusion that it was due to the title adjustment.

This is perhaps the worst-case scenario as there is a clear and obvious culprit, albeit incorrect, and without questioning whether it too may be a coincidence, you may spend time and effort directed at correcting the wrong issue.

Here we have to add the title to the list of possibilities, but not ignore everything else.

You Changed Your Title and Traffic Grew

First, congratulations. Traffic is a much easier factor to look at, as there are far fewer variables.

If you rank for phrase X and your rankings stay the same, you'd expect to see the same traffic. If the traffic goes up or down after changing a title or description (remembering that it will have no click-through impact until Google caches it and begins displaying it in the SERPs), then one may (and most likely will) jump to the conclusion that this is an example of causation: that a title of format A will yield an improvement in traffic of B. This may well be the case but, as with the title example above, other factors need to be considered.

Some other questions you will need to ask are:

Is this new traffic to the same page/source?

I've unfortunately had to inform people that the spike they saw was from a different source of traffic when they mistakenly assumed an increase in traffic overall meant that their Google traffic had improved.

To know whether we're dealing with causation we need to look only at a specific set of traffic (example – traffic to that specific page and only from Google) to know whether the two are tied together.

Remember, traffic to other pages doesn't count.
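As a rough sketch of that check, assume you can export visit counts per page, source, and period from your analytics package. The rows, page paths, and numbers below are all made up for illustration, and `total_visits` is a hypothetical helper, not part of any analytics API.

```python
# Hypothetical analytics export: one row per page, source, and period.
visits = [
    {"page": "/widgets", "source": "google", "period": "before", "visits": 400},
    {"page": "/widgets", "source": "google", "period": "after", "visits": 410},
    {"page": "/widgets", "source": "newsletter", "period": "after", "visits": 300},
    {"page": "/blog", "source": "google", "period": "before", "visits": 250},
    {"page": "/blog", "source": "google", "period": "after", "visits": 260},
]


def total_visits(rows, period, page=None, source=None):
    """Sum visits for a period, optionally restricted to one page and/or source."""
    return sum(
        row["visits"]
        for row in rows
        if row["period"] == period
        and (page is None or row["page"] == page)
        and (source is None or row["source"] == source)
    )


# Site-wide totals: the tempting but misleading comparison.
site_before = total_visits(visits, "before")
site_after = total_visits(visits, "after")

# Only Google traffic to the page whose title changed.
google_before = total_visits(visits, "before", page="/widgets", source="google")
google_after = total_visits(visits, "after", page="/widgets", source="google")

print(f"Whole site: {site_before} -> {site_after}")
print(f"/widgets from Google only: {google_before} -> {google_after}")
```

In this made-up data the site-wide jump comes almost entirely from a newsletter, while Google traffic to the changed page barely moves. That is exactly the situation where an overall increase gets mistaken for an SEO win.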

Are you measuring the right timeframe?

Remember that most sites have weekly, monthly, and annual trends.

If you notice a jump in traffic two days after a new title went live, you can't compare those two days vs. the two days prior. That may well have you comparing Monday and Tuesday with Saturday and Sunday.

The simplest comparison is to wait a week and compare full weeks of data. But assuming you don't have the patience for that (it's OK – I rarely do either) you can compare with the same days the week prior (assuming no special circumstances such as holidays or ranking changes).

Even this isn't ideal. I prefer to compare with the same days the year prior if possible, but this requires the rankings to have held (unlikely) and for you to have a solid grasp of the year over year traffic trends in your sector (i.e. if search volume is up or down overall you may see false positives or negatives based not on the title, but on overall search trends).
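The same-weekday comparison can be sketched in a few lines. The daily numbers here are invented (200 visits on weekdays, 100 on weekends, and a title change on a Monday that has no effect at all), and `same_weekday_change` is a hypothetical helper written for this example.

```python
from datetime import date, timedelta


def same_weekday_change(daily_visits, first_day, num_days):
    """Compare num_days starting at first_day with the same weekdays a week earlier.

    Returns (current_total, prior_total, percent_change).
    """
    current = prior = 0
    for offset in range(num_days):
        day = first_day + timedelta(days=offset)
        current += daily_visits[day]
        prior += daily_visits[day - timedelta(days=7)]
    return current, prior, (current - prior) / prior * 100


# Hypothetical data: weekdays get 200 visits, weekends get 100.
start = date(2013, 9, 2)  # a Monday
daily_visits = {}
for i in range(9):
    day = start + timedelta(days=i)
    daily_visits[day] = 100 if day.weekday() >= 5 else 200

title_live = date(2013, 9, 9)  # the following Monday

# Naive comparison: Monday and Tuesday vs. the Saturday and Sunday before.
naive_current = daily_visits[title_live] + daily_visits[title_live + timedelta(days=1)]
naive_prior = daily_visits[title_live - timedelta(days=1)] + daily_visits[title_live - timedelta(days=2)]
naive_change = (naive_current - naive_prior) / naive_prior * 100

current, prior, change = same_weekday_change(daily_visits, title_live, 2)

print(f"vs. previous two days: {naive_change:+.0f}%")
print(f"vs. same weekdays last week: {change:+.0f}%")
```

With this data the naive comparison reports traffic doubling while the same-weekday comparison reports no change, even though nothing about the page's performance differs. This sketch only handles the weekly cycle; holidays, ranking shifts, and year-over-year trends still have to be accounted for separately.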

Assuming that you're comparing things correctly, you can now assume causation and apply similar changes elsewhere.

More

In my travels I came across an interesting piece on "spurious correlation" on the Everyday Sociology Blog. It applies well to a lot of SEO, including the +1 discussion.

I've used the term correlation loosely in this article; however, what we're talking about in the +1 example is spurious correlation: a situation in which two things are not causally related at all but change together, typically because a third factor drives both. An example drawn from the article:

"One student had gone out partying the weekend before, and while sitting in the bar watching his friends during the evening, he noticed that people who had the most fun dancing were also those who were most likely to throw up by the end of the evening. It's not that dancing made them sick ("A" causes "B"), or that being sick make them have fun dancing ("B" causes "A"), rather there is a third variable, alcohol consumption ("C") that leads to both fun dancing and sickness."

To that end, I would task each and every one of us to sincerely ponder correlation, causation, and even coincidence with each assumption about SEO we make. At best, it will save you time and energy; at worst, it'll force you to fully understand all the angles of a situation before tackling it.

