How to construct a valid SEO test
Posted by Michael Martinez on October 15, 2007 in SEO Theory
Every now and then I read about an SEO test some blogger has conducted. SEO bloggers who announce tests usually don’t say much about how they devised their tests or what they were testing. To be scientifically valid a test needs several key ingredients. As a rule, SEO tests tend to be complete and total wastes of time and effort because they don’t bring together all the necessary factors.
Hypothesis - You test hypotheses, not ideas. An idea might be that links have an impact on search engine crawling, but a hypothesis is more specific. You propose a hypothesis as a means of conducting a test. You conduct the test so that you can observe the results and gather data. The hypothesis can be posed in the form of a statement (“The first outbound link on a page carries more weight than others”) or a question (“Do all links on a page pass value?”).
Conditions - You test under conditions that have to be defined clearly. That means you cannot take the results of your test and use them to analyze phenomena that occur under different conditions. It means you MUST define your conditions to match natural search as closely as possible if you want to credibly extrapolate from your results for your own SEO theory.
Control - You need to debunk your own presupposed nonsense, and this is something SEO testers consistently fail to do. They don’t set up a control test to compare the results of their hypothetical test. Search results are not laboratories. Neither is the African jungle. But just as you can observe gorillas in the African jungle, so you can observe results on the Web. When you test gorillas at the Yerkes Primate Institute in Atlanta, you can compare their behavior to the behavior of gorillas in the wild. When you set up an SEO test, you can compare its results to documented search results behavior (there’s the rub, of course — no one has actually attempted to validly document search results behavior).
We can apply the scientific method to search engine optimization theory just as we can to any other area of study. The rules for the scientific method are very simple and straightforward:
1. Observation and description of a phenomenon or group of phenomena.
2. Formulation of an hypothesis to explain the phenomena. In physics, the hypothesis often takes the form of a causal mechanism or a mathematical relation.
3. Use of the hypothesis to predict the existence of other phenomena, or to predict quantitatively the results of new observations.
4. Performance of experimental tests of the predictions by several independent experimenters and properly performed experiments.
The search engine optimization community does share a lot of observations, but they are usually limited observations (or they are useless observations, such as when all the SEO news sites light up with announcements about the latest Google Toolbar PageRank update). However, random observations do not constitute a body of data, certainly not data that can be analyzed. SEO reports are rarely detailed enough to be credible much less usable in any sort of scientific study.
Before we can describe phenomena we have to be able to observe them, either directly or indirectly. Indirect observation means that people reporting phenomena need to provide enough information that other people can see what is going on. In most SEO forums and blogs that just doesn’t happen. At best you get very glib, terse descriptions of patterns people think they have discerned. Even when I go into SEO forums and solicit URLs to study in the wake of apparent search engine updates, I do so on the condition of anonymity and my aggregate findings don’t provide enough information for people to check my conclusions.
That is, my SEO reporting is no more credible or valid than anyone else’s, and I have a pretty good idea of how to do it right. So do other people in the industry (although they are not your favorite SEO bloggers and forum operators — the real technicians rarely say anything and most of them don’t seem to be selling their services).
To be scientifically valid, observations have to be documented. We need to see other people’s data regardless of whether it’s anecdotal or not. We need to be able to look at what other people see in order to validate their observations. If someone goes into an SEO forum and says, “Hey, the real estate industry is churning on Google”, it’s not enough if four other people concur. We need to see before-and-after screen shots from multiple queries.
People just don’t do that. So if you want to do your own SEO testing you first need to start making a lot of observations and record them. If you want your SEO testing to be credible you need to share your observations with people. At the very least, you need to be able to present all your observations and data when it comes time to put up or shut up.
If you track a certain query every day, you should make a screen capture once a week and compare the results over time to see what the emerging trends and patterns are. There are many very subtle changes that occur in search results pages as the months roll by. They speak volumes about what people are doing with their content and what the search engines are doing.
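If you also record each week's results as an ordered list of URLs, the comparison can be automated. What follows is only a minimal sketch in Python: the function name rank_changes and the example URLs are hypothetical, and it assumes each snapshot lists a URL at most once.

```python
from typing import Dict, List, Optional, Tuple

Change = Tuple[Optional[int], Optional[int]]


def rank_changes(before: List[str], after: List[str]) -> Dict[str, Change]:
    """Map each URL whose position changed to (old_position, new_position).

    Positions are 1-based. None means the URL was absent from that snapshot.
    """
    old = {url: i + 1 for i, url in enumerate(before)}
    new = {url: i + 1 for i, url in enumerate(after)}
    changes: Dict[str, Change] = {}
    # Walk the old snapshot first, then any URLs that are new this week.
    for url in list(before) + [u for u in after if u not in old]:
        if old.get(url) != new.get(url):
            changes[url] = (old.get(url), new.get(url))
    return changes


# Hypothetical snapshots of one query, recorded a week apart.
week_1 = ["a.example/", "b.example/", "c.example/"]
week_2 = ["b.example/", "a.example/", "d.example/"]

for url, (was, now) in rank_changes(week_1, week_2).items():
    print(url, was, "->", now)
```

A record like this does not replace the screen capture, which preserves details a list of URLs cannot, but it makes months of snapshots easier to scan for patterns.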
Search optimizers usually draw conclusions rather than form hypotheses. You can see this in forums and blogs every day where they simply “share their opinions”. The hypothesizing crowd usually says nothing. After all, if you’re just forming a hypothesis you really don’t have anything to share or report.
A good example of an SEO hypothesis would be, “I think links that appear higher up on the page have greater impact on search results than links lower down the page.” You may even be able to think of a couple of reports of such tests over the past year. Regrettably, these tests are no more valid or useful than the conclusion that the Yankees will win the 2010 World Series because they made it to the finals in the 20th century.
I was actually going to write a lengthy post explaining why those tests were invalid, but then I decided it would be better to focus on how to construct a valid test.
So if we want to use the hypothesis that “links placed higher up on the page count more than links placed lower down on the page”, we have to make some predictions. That is, we have to play “what if”. “What if” SEO is dangerous because it cannot encapsulate every possible outcome. You cannot count on your predictions being accurate unless they are backed up by a lot of data, a lot of observation.
Most SEO tests simply try to demonstrate a point. What a valid SEO test needs to do is attempt to disprove a hypothesis. That is, you’ve looked at a lot of search results and you have concluded that maybe links placed higher on a page count more than links placed lower on a page.
So you can put together Web pages that link to other Web pages with unique anchor text in various ways but you may only end up reinforcing your ideas. You don’t actually test your hypothesis because you don’t control the search indexing process. The fundamental flaw in both Russ Jones’ and Rand Fishkin’s tests is that neither test accounts for the lag times that inevitably occur in the search indexing process. Rand said he took his test through several iterations but the scope of his described test platform was completely inadequate.
In fact, the easiest way to prove a totally wrong hypothesis wrong is to find observational data that contradicts it. That is, before you start testing your explanation of how things work, can you find example observations that contradict your explanation? If you can, the appropriate thing to do is to come up with a completely new hypothesis, not to ignore the contradictions or say they don’t count.
I have, in fact, looked at which links seem to pass value on a page for Google, Ask, Yahoo!, and Live. Because of recent algorithmic changes my observations are no longer valid for either Yahoo! or Live. But for Google and Ask I have found many pages where only some of the links on a page appear to pass value — and this is after the pages have been fetched and reindexed several times.
One would hope that after Google has fetched a page 20 times it would have found all 30 links on the page. And yet maybe only 15 of the links seem to pass value. In fact, sometimes only 1 link seems to pass value.
Why is that?
It’s not because the 1st link on the page is given greater weight. The 1st link is often ineffective and other links further down the page seem to work better.
Now, that’s just an observation, not a hypothesis. It doesn’t attempt to explain anything and it won’t matter if anyone else steps up and says, “Yes, I agree with Michael”. I won’t be conducting any tests to determine whether the first link counts more than others because I have observed many pages where the first link passes no value while other links do pass value.
The fact that some people can set up artificial Web networks where the first link always seems to boost more than other links doesn’t change the fact that I have observed first links being passed over for other links. The tests are invalid because observational evidence contradicts them — even though I have not offered any documentation to substantiate what I say. You should be able to make similar observations yourself.
So in order to construct a valid SEO test you need a hypothesis that withstands the contradictions of observational evidence. After all, the whole purpose of the hypothesis is to explain what is being observed. If you attempt to explain only part of what is being observed your test still has to somehow take other factors into consideration (if only to provide the means for you to statistically filter them out).
But let’s say you develop a hypothesis that is not intuitively challenged by observation. It’s entirely possible (and in our industry most likely) that there are easily observable phenomena that will contradict your hypothesis. Nonetheless, if you at least show you made a good faith effort to find contradictory observations then your hypothesis is stronger. But even the least contradictable hypothesis can be wrong. If you predict that Web page A should rise in the search results because Web pages B, C, and E link to it first, and Web page A does not gain position, you need to figure out why.
Underpopulated search queries are not white lab rats. If you’re going to test a change on a Web site, or test the power of a link, you have to throw it out into the search engine soup in order to see what happens. Web pages are not chemicals you mix up in test tubes. Think of them more as balloons you send out into the world environment. If you release a balloon inside a warehouse you’ll see very different results from what you would see by releasing the balloon outside.
Some people might argue that we first need to see if a balloon floats, but we already know that the balloon floats if we put the right substance in it. We learned that by playing around with substances and learning what their properties are. We also know what the basic properties of links and content are. Now we want to know if our balloon is going to rise to 10,000 feet or 5,000 feet. We want to know if it changes shape. We want to know if it will be pushed more by east-west winds or by north-south winds.
That is, we want our first tests to help us make more observations. We’re not ready to draw valid conclusions and prove them. We first need to collect data by just creating Web pages and seeing what happens when they are released into the wild. It’s not enough to make observations about what happens to randomly selected pages. We want to observe what happens to pages that try different things — we want to document behaviors.
When we have collected a lot of observations we are ready to try to explain the behaviors we have seen. The explanation cannot be as simple as “links can influence search results”. That’s axiomatic because the search engines tell us that links influence search results. The explanation has to be more on the level of “links from page A seem to help more than links from page B”.
That hypothesis can be tested a few times, and if each test seems to confirm the hypothesis we can step back and ask, “What is it about page A that makes it more effective than page B?”
We need to observe more. We need to collect more information. Eventually, we need to find or construct another page that seems to share attributes with page A that make Page A more effective than page B. Our observations about page C will help us shape our hypothesis and eventually define the test for it.
If we get to a point where we can construct pages C, D, and E that have the same impact as page A, we should have a pretty solid hypothesis. It appears to reliably explain the Page A Effect.
Reporting the test result is not easy. You need to first catalogue the observations that led to your construction of the hypothesis. You need to explain why you chose the particular hypothesis you selected. You need to deal with as many of the “ifs”, “ands”, “buts”, and “ors” that people will dredge up (and in most cases people will be so overwhelmed by your data that they’ll bring up objections you dealt with — be patient, cut them some slack).
Then you need to describe your hypothesis. Your description has to make it clear that you are attempting to explain why something in particular happens. You’re not trying to prove that the first link counts more than others. You’re trying to show why first links seem to count more, or you’re trying to show that a page’s links don’t all pass value at the same time (if ever). You’re trying to explain why things are the way they are.
Finally you have to explain how your test works and why it shows your hypothesis explains the phenomena you have observed. Simply recreating the phenomena you have observed without the explanation is not a valid test.
The test does not have to prove the universal search algorithm theory. All it has to do is show that if you execute action A you should expect to see result B. So we’ve got two people who have shown that if you create Web sites for terms no one else uses and if you link to those sites in very specific orders you’ll probably see a reflection of those linking patterns in the very small search results.
What we don’t have are two tests that show that the first link counts more than other links on a page. A more valid test would be to go out and get a lot of pages that are already indexed to link to destinations with very specific anchor text. Then, based on those natural search results, we should be able to see if any patterns emerge. If you cannot replicate the effect across a multitude of pages, your hypothesis is invalid.
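One hedged way to judge whether an effect replicates across a multitude of pages is a simple sign test: count the pages where the first link appeared to do better than a later link and ask how surprising that count would be if link order made no difference. The sketch below is an illustration only. The function name and the counts are hypothetical, and the 50/50 baseline is an assumption, not something the search engines have documented.

```python
from math import comb


def sign_test_p_value(successes: int, trials: int) -> float:
    """Exact two-sided binomial (sign) test against a 50/50 baseline.

    successes: pages where the first link seemed to outperform a later link.
    trials: pages observed, excluding pages that showed no difference.
    """
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials and trials > 0")
    # Use the larger of the two counts so the tail is the extreme side.
    k = max(successes, trials - successes)
    tail = sum(comb(trials, i) for i in range(k, trials + 1)) / 2 ** trials
    return min(1.0, 2 * tail)


# Hypothetical tallies from already-indexed pages.
print(sign_test_p_value(9, 10))  # lopsided count: small p-value
print(sign_test_p_value(5, 10))  # even split: no evidence of an order effect
```

A small p-value only says the pattern is unlikely to be chance. It does not explain why the pattern exists, which is still the job of the hypothesis.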
The hypothesis has to predict specific behaviors. To be useful in search engine optimization the hypothesis has to be relevant to natural search results. To test a hypothesis about natural search results, you must use natural search results. What happens in artificial query spaces tends to be very different from what happens in natural query spaces because there are many more factors at work in natural query spaces.
You have to isolate behaviors, not Web pages. You have to observe how things happen. You have to explain what you see in such a way that you can propose a test to prove that your explanation is wrong. If you fail to prove your explanation is wrong you have taken the first step toward proving that your explanation is right.
On the Web, you can create a valid test by using two groups of similar Web sites where a change is enacted in one group but not in the other. You make observations before and after the change is enacted. You compare the results for both groups to each other. You look for statistical variations.
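As a rough sketch of that comparison, the Python below computes the average change in ranking position for each group, subtracts the control group's change from the test group's, and runs an exact permutation test to see how often a gap that large would appear if the group labels were shuffled. All function names and numbers here are hypothetical illustrations, not measurements, and the permutation test is one choice among several ways to look for statistical variations.

```python
from itertools import combinations
from statistics import mean
from typing import List, Sequence


def changes(before: Sequence[float], after: Sequence[float]) -> List[float]:
    """Per-site change in ranking position (negative means the site moved up)."""
    return [a - b for b, a in zip(before, after)]


def diff_in_diff(test_changes: Sequence[float], ctrl_changes: Sequence[float]) -> float:
    """Mean change in the test group minus mean change in the control group."""
    return mean(test_changes) - mean(ctrl_changes)


def permutation_p_value(test_changes: Sequence[float], ctrl_changes: Sequence[float]) -> float:
    """Exact two-sided permutation test on the difference in mean change."""
    pooled = list(test_changes) + list(ctrl_changes)
    k = len(test_changes)
    observed = abs(diff_in_diff(test_changes, ctrl_changes))
    hits = total = 0
    # Try every way of relabeling k of the pooled sites as the test group.
    for idx in combinations(range(len(pooled)), k):
        chosen = set(idx)
        group = [pooled[i] for i in chosen]
        rest = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        if abs(mean(group) - mean(rest)) >= observed - 1e-12:
            hits += 1
        total += 1
    return hits / total


# Hypothetical positions for one query per site, before and after the change.
test_delta = changes([12, 15, 9, 20], [8, 11, 6, 15])   # change was enacted
ctrl_delta = changes([11, 14, 10, 19], [11, 13, 11, 19])  # left alone

print(diff_in_diff(test_delta, ctrl_delta))
print(permutation_p_value(test_delta, ctrl_delta))
```

Exhaustive relabeling is only practical for small groups. With dozens of sites per group you would sample random relabelings instead.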
You can construct simpler tests but their results will be less meaningful, less relevant to natural search. In the final analysis, whatever is most relevant to natural search will be most helpful to your search engine optimization.
3 Comments on How to construct a valid SEO test
By rjonesx on October 15, 2007 at 12:09 pm
First off, fantastic post. I agree that my experiment/study should not be the final say on this or any issue for many reasons, one of which is that it was neither recreated nor retested numerous times. Additionally, I certainly did not address this experiment with the level of scientific rigor that I would were I still back in school writing my thesis.
I would like to note, however, that the observational basis with which you debunk my argument is flawed. What you have observed indicates that order is not the only thing that matters in whether a link passes on value, not that it is irrelevant. My experimentation greatly controls the scenario…
1. Only 1 outbound linking site.
2. Subdomains on the same site receive the inbound links.
3. Subdomains never indexed prior to the creation of these inbound links.
4. Identical anchor text used.
5. Identical linking pattern used on all pages, to ensure that one test subject did not receive links from a greater PR page than another test subject.
6. Different class-c ips on different hosts with different whois information and different registrars.
7. Domains never interlinked before.
Given these controls, and many others that were put in place, I can say that in this experimental vacuum, order of links matters.
However, this does not make my statement generalizable (the results may not hold true once any single additional factor is introduced).
The links that you observed not passing on weight - were they between sites on the same host? ip? registrar? whois information? google analytics account? adwords account? webmaster tools account? Were they ever interlinked before? still interlinked on different pages? Did a link-back already exist?
The myriad factors which can taint your observations make them a weak argument against the outcome of both Rand’s and my experiments.
I also must take strong issue with your arguments for a macro approach to observing and testing the web, that “simpler tests…will be less meaningful”. On the contrary, I cannot draw a single conclusion from tests that use existing sites which constantly see shifts in traffic, linking and content. Without knowledge of when Google applies the new data, you cannot draw accurate, specific observations.
By Michael Martinez on October 15, 2007 at 3:02 pm
Welcome to SEO Theory, Russ.
This was not an easy post to write because the subject deserves much more attention than I’m able to devote to it.
However, I have pointed out in the past that testing concepts in isolation defeats the purpose of determining what actually influences search results. When you deprive a search engine of the vast majority of the data it normally processes in order to return a search result, you render the choices it makes too artificial to be of use for general search engine optimization.
Your test shows that links influence search results. It doesn’t show that first links have a greater impact than lower links. You yourself point out that “in this experimental vacuum, order of links matters.”
I agree (as I did in my post). In the experimental vacuum your conditions created, order of links matters. But we cannot take that principle over to search engine optimization, where even on our own Web sites the first link is almost always given to a site’s root URL.
Who is likely to ever get a first link? The most prominently featured links on my own Web sites are not first links.
The links I count on to kickstart my SEO process are not first links. The links that many people use to boost their relevance are not first links.
So in the final analysis your test doesn’t teach us anything about how we can leverage what we are doing to optimize for search.
To make your test relevant to the natural search results, you’ll have to devise new conditions so that we can observe the phenomena in natural search.
Russ: The links that you observed not passing on weight - were they between sites on the same host? ip? registrar? whois information? google analytics account? adwords account? webmaster tools account? Were they ever interlinked before? still interlinked on different pages? Did a link-back already exist?
Michael: Literally every condition you ask about (and many others) is covered in the conditions under which I have observed non-performing links. It doesn’t matter if the links are pointing to pages on the same domain, same host, or some site/host on the other side of the world. It doesn’t matter if there is an adwords account or not. It doesn’t matter if the sites have been validated or not through Webmaster tools.
Just because a link is placed on a page doesn’t mean it will pass value, even though other links on the page do pass value. I have a hypothesis that may explain why that doesn’t happen (several, actually) but I’m still collecting data.
I can tell you that a link in the middle of a list of links, or a link in a paragraph peppered with links (none of them sold or being used for commercial purposes) seems as likely to help a destination page be crawled or accrue anchor text as not. I cannot tell you why that is so.
By Chas on February 17, 2008 at 7:00 am
How to test SEO.
-Look at the Search Engine Results.
-Draw conclusions based on what you find.
-Build up enough data to support a correlation of positive, negative, or neutral.