Why SEO tests almost always fail
Posted by admin on January 24, 2007 in SEO Theory
I’ve been preaching experiment, evaluate, and adjust for a long time but I’ve also been criticizing many SEO tests that people have proposed and executed through the years. I’m often asked to show why I say tests are invalid, or to show what would be a better way to perform these tests.
A successful test should challenge an assertion or hypothesis. You can propose the point you want to challenge, or you can challenge a point someone else makes. We don’t formally stipulate hypotheses in the search engine optimization community. Most tests result from speculative or professional curiosity.
For example, some people have questioned what actually happens with the rel=’nofollow’ link attribute. On January 18, 2005 Google announced that “from now on, when Google sees the attribute (rel=’nofollow’) on hyperlinks, those links won’t get any credit when we rank websites in our search results”.
They did not say they would not crawl the links. Furthermore, they also noted that “we think any piece of software that allows others to add links to an author’s site (including guestbooks, visitor stats, or referrer lists) can use this attribute. We’re working primarily with blog software makers for now because blogs are such a common target.”
So the philosophy of crawling nofollowed links and the philosophy of advocating their use outside of blogs go all the way back to Google’s original announcement.
Nonetheless, people in the search engine optimization community have been roasting Google for not adhering to the “no crawl” principle and for extending “rel=’nofollow’” beyond use in blogs.
Personally, I have always maintained that this attribute in no way discourages link spam (and numerous voices have been raised through the past two years agreeing with me). Furthermore, the attribute does not address the core problem, which is that Google’s system is flawed because it has always allowed links to pass unmerited value and continues to do so.
Remember, there are still thousands of trusted links today that tell the search engines that several Web sites which don’t use the words are nonetheless highly relevant to “miserable failure”. The system encourages abuse because it enables the abuse without accepting responsibility for the problem it created.
Google never told anyone that links are the way to establish relevance in its search results. In fact, Google has consistently maintained through the years that relevance is determined by a large number of factors. But Google’s creators wrongly and naively stated from the beginning that “link structure … and link text provide a lot of information for making relevance judgments and quality filtering”. That nonsense was never true and it’s certainly not likely to ever be true.
The mere fact that Google took link anchor text into consideration for relevance ultimately led to the discovery that you could overweight a page through off-page factors. We SEOs first exploited the gaping flaw in Google’s algorithm with link farms and hallway networks we had created to push pages into Inktomi’s main index. We actually didn’t even pay much attention to Google’s algorithm because hardly anyone was using Google back then.
But then bloggers discovered they could push each others’ pages to the top of Google’s search results through link bombing. As SEOs “tested” this capability, they reached the wrong conclusion that this was how Google had always ranked results. On the basis of that faulty conclusion, SEOs conducted more tests to show that Google did in fact rank search results on the basis of link anchor text.
Of course, you could easily achieve the same effect (and many spammers did) just by repeating your keywords on the page 100 times and hiding them. As recently as 2006 that trick was still working for some spammers. It may still be working today, right up until Google’s review process detects the artificial repetition.
Assuming that a test will confirm a hypothesis is the first and most common flaw in today’s search engine analysis. Your tests should be constructed to prove your assumptions wrong. That is, a test supports your conjecture only when it makes a reasonable attempt to make the conjecture fail and the conjecture survives.
To prove that Google ranks by links, you have to do one of two things: show that all query results are ranked that way, or show that no query results are produced by any other ranking methodology.
The easier test is to look for queries that produce results which are not ranked by link anchor text. The test is almost trivial (it involves the use of one query operator) and it confirms that the assumption that Google determines rankings on the basis of links is completely false.
And yet we have thousands of Google queries where people are obviously achieving rankings solely on the basis of links. If Google does not rank by links, people reasonably ask, then why do we see such results?
And here is where the second most common flaw occurs: instead of looking for an answer to the reasonable question, people wrongly assume that the question proves the false hypothesis.
The reason why you can rank through links is that any link which passes value is associating its anchor text (and oftentimes text outside but close to the anchor) to the page being linked to. Google simply treats the page as if it actually includes the text being passed to it through the links.
After that one minor adjustment in its database, Google simply applies its old relevance determination formula and voila! repetition of keywords wins out once again. The only difference between pointing 10,000 links with a keyword at a page and actually including the keyword on the page 10,000 times is that it’s easier to algorithmically detect the manipulation where the keywords are really on the page.
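The equivalence claimed above can be sketched as a toy model. This is only an illustration of the argument, not Google’s actual algorithm: the function names (effective_text, keyword_count), the sample page text, and the bare keyword count used as a “relevance score” are all assumptions made for the example.

```python
from collections import Counter


def effective_text(page_text, inbound_anchors):
    """Return the page's own words plus the words of every inbound anchor.

    Hypothetical model: anchor text is treated as if it were on the page.
    """
    words = page_text.lower().split()
    for anchor in inbound_anchors:
        words.extend(anchor.lower().split())
    return words


def keyword_count(page_text, inbound_anchors, keyword):
    """Toy relevance score: occurrences of the keyword in the effective text."""
    return Counter(effective_text(page_text, inbound_anchors))[keyword.lower()]


# Page A never mentions the keyword but receives 10,000 links that use it.
linked = keyword_count("welcome to my home page", ["widgets"] * 10_000, "widgets")

# Page B has no inbound links but repeats the keyword 10,000 times on the page.
stuffed = keyword_count("widgets " * 10_000, [], "widgets")

print(linked, stuffed)  # prints: 10000 10000
```

Under this simplified scoring the two pages are indistinguishable. The difference is where the repetition lives: on Page B it sits in the page itself, where it is easier to detect.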
In other words, valid tests do not show that an assumption is correct. Rather, valid tests only try to show that an assertion is false. If the assertion is true, a valid test may or may not prove the truth of the assertion. If an assertion is false, a valid test will show that the assertion is false.
In fact, tests do not constitute proofs. A test which correctly demonstrates a principle can only be devised once the principle is understood well enough to be proven. You prove your assertion through logic, not through tests.
For example, I’ve given the logical argument above that link anchor text weights search results because Google treats the anchor text as if it is really part of the page text. You can run a quick test by pointing a link with the unique anchor text of “pink elephants have no fun in the upstairs yard” at a page. When Google processes the anchor text and assigns it, both the source and the destination page will appear in the search results for that phrase.
Now search for the unique title words of the source page. Only the source page appears in the search results. Why doesn’t the destination page show up? After all, we have a link to the destination page. If the link is what makes the difference in the search results, then the destination page should be relevant for every word on the source page.
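The two searches can be walked through on a toy index. This is a sketch under the same assumption the argument makes: anchor words are credited to the destination page and also remain on the source page, where they are written. The page names, the build_index and search helpers, and the sample text (standing in for the source page’s unique title words) are all hypothetical.

```python
def build_index(pages, links):
    """Map each page name to the set of words it is credited with.

    pages: {name: visible text}
    links: list of (source, destination, anchor_text)
    Hypothetical model: anchor words count for both the source page
    (where they appear) and the destination page (which they point to).
    """
    index = {name: set(text.lower().split()) for name, text in pages.items()}
    for source, destination, anchor in links:
        anchor_words = set(anchor.lower().split())
        index[source] |= anchor_words
        index[destination] |= anchor_words
    return index


def search(index, query):
    """Return the sorted names of pages credited with every query word."""
    wanted = set(query.lower().split())
    return sorted(name for name, words in index.items() if wanted <= words)


pages = {
    "source": "quixotic zebra gazette weekly notes",
    "destination": "an ordinary page about gardening",
}
anchor = "pink elephants have no fun in the upstairs yard"
index = build_index(pages, [("source", "destination", anchor)])

print(search(index, anchor))                    # ['destination', 'source']
print(search(index, "quixotic zebra gazette"))  # ['source']
```

In this model the link passes only its anchor words, so the destination gains nothing from the rest of the source page. That is the result the second search is meant to expose.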
So why did such a simple and easily identified misunderstanding become such a pervasive myth throughout the SEO community? One possible explanation (and I think it’s the most likely explanation) is that SEOs embarked upon a link feeding frenzy simply because it was so easy and simple to get links for years.
Every time you turn around, some naive genius is creating a new Web resource that empowers self-linking: guest-books, free-for-all-link pages, automated reciprocal linking tools, forum software, blog software, automated directory tools, Wiki-based tools, social linking and tagging tools, bookmarking tools, free article directories, press release distribution services, classified advertising sites, etc.
Worse, people have also produced a ton of “human-edited” resources solely for the sake of being the largest collections of links in specific niches. These resources are sometimes automated, supported by crawling, and oftentimes provide secondary information. Think of Alexa, exploitable search tools that let you embed HTML code, archive services, and other “open use” resources that serve as information traps for Web sites.
All of these types of resources existed before Inktomi and Google were big and popular. People have always had incentives to inflate links, create false and fraudulent links, and more. The search engines only provided one more reason to engage in link infatuation.
The ease with which people who don’t know much about search engine optimization can engorge themselves with link-enriched relevance has not only spoiled most of today’s SEOs, it has created and fed a cyclical addiction to the point where, when Wikipedia once again implemented the “rel=’nofollow’” attribute on all outbound links, many SEOs complained that they had just lost value from hundreds of links embedded in Wikipedia.
Wikipedia’s value as an information resource has always been specious at best. But the open, blatant, and massive manipulation of Wikipedia articles for marketing links (that not only influence search engines but also drive traffic) has corrupted it into the largest free-for-all page in history. Adding “rel=’nofollow’” not only fails to address the problem, it virtually ensures the problem will continue because the Wikipedia editors (and Jimmy Wales in particular) just don’t get it.
When your marketing strategy is built upon links, how long will it be before you notice referrals in your server logs from those links? SEOs don’t need to run many tests to figure out that even the spammiest of links can send traffic.
Hence, the ease with which a majority of people reach obvious conclusions reinforces the misconceptions associated with those conclusions. It’s easy to get links, and as long as it’s easy to get links, the links will create visibility. That visibility will continue to build value in the Web sites to which the links point even long after the links stop passing value through the search engines.
But now the damage is done and SEOs who depend on links for search engine manipulation don’t believe — because they have lied to themselves for years — that there is any other way to manipulate search results. In fact, there has always been at least one other way, and it’s actually a more efficient, less time-consuming methodology that works for most competitive queries.
It’s the lack of valid testing that has brought the SEO community to the point of completely ignoring the truth about relevance in favor of supporting its spurious mythology. When SEOs go back to challenging their own assumptions, the quality of SEO tests and the conclusions based upon them should improve.
Until the next time the community becomes addicted to a cheap and easy way of establishing relevance.