The Theorem of Four SEO Influences
Posted by Michael Martinez on January 1, 2008 in SEO Theory
Pierre de Fermat once wrote in the margin of a book on Diophantine equations that it is impossible for two positive integers, each raised to the same power greater than two, to sum to a third integer raised to that same power. That is, there is no solution in positive integers to the equation X^n + Y^n = Z^n (where n is an integer greater than 2).
Mathematicians struggled with Fermat’s assertion for more than three hundred years, until Andrew Wiles used advanced mathematical concepts in the mid-1990s to complete a proof that had been building since the mid-20th century through the work of several obscure but very advanced mathematicians.
Fermat’s assertion is now known to be true, but whether he had discovered what some people call a “Euclidean proof” for his assertion remains a mystery. He may only have made a lucky intuitive guess that turned out to be right. He may have figured out the most elegant proof in mathematical history, only to have lost it because the margin of his book did not contain enough space for him to write it down.
A good theorem tends to be short and to the point. It’s an assertion that reeks with clarity. For example, we could assert that there is a correlation between the appearance of a full moon and an increase in bizarre behavior among people. Although that’s an assertion, it’s not a theorem.
The theorem usually has to be proven true (otherwise, you should call it a proposed theorem — so Fermat’s Last Theorem was really a proposed theorem, an unproven assertion).
You can prove various theorems in search engine optimization if you understand the basic principles of making assertions and proving them. For example, there is the Theorem of Four SEO Influences. That is, there are four reasons why your search engine rankings change:
- You do something with your site
- Someone else does something with their site
- The search engines do something with their data
- People search for something different
Technically, you could argue that the fourth factor really doesn’t apply. Or you could argue that the theorem would be more correctly stated, “There are four reasons why your search engine referrals change”. I call this the Theorem of Four SEO Influences because these are the four factors that influence not only your search referrals but also your optimization, your rankings, and your SEO strategies.
An entire sub-industry has been built on the basis of the Theorem of Four SEO Influences — the sub-industry of comparative analytics. Comparative analytics looks at every available indicator concerning your site and your competitors’ sites.
But how does the Theorem of Four SEO Influences work? What is the proof of its correctness? Let’s take a look at what it means.
A search engine sorts data (information about Web pages) and uses that sorted data to answer questions. All queries regardless of how they are worded can be reorganized into a single, simple question of the form “Which documents (in your database of documents) are most relevant to the terms X, Y, and Z?”
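The reduction of every query to that single question can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the `extract_terms` and `rank` functions, the stop-word list, and the sample documents are all hypothetical, and no real search engine is claimed to work this way.

```python
import re

# Hypothetical stop words; a real engine would parse queries far more carefully.
STOP_WORDS = {"which", "what", "are", "is", "the", "a", "an",
              "and", "or", "of", "to", "for", "about"}

def extract_terms(query):
    """Reduce a free-form query to its bare terms X, Y, and Z."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    return [w for w in words if w not in STOP_WORDS]

def rank(documents, query):
    """Answer: which documents are most relevant to these terms?

    Relevance here is simply how often the query terms occur (an assumption).
    """
    terms = extract_terms(query)
    scores = {}
    for name, text in documents.items():
        words = re.findall(r"[a-z0-9]+", text.lower())
        score = sum(words.count(t) for t in terms)
        if score > 0:
            scores[name] = score
    # Highest score first; ties broken by name so the order is stable.
    return sorted(scores, key=lambda n: (-scores[n], n))

documents = {
    "farm.html": "Horses and pigs live on the farm. Dogs guard the pigs.",
    "kennel.html": "Our kennel boards dogs of every size.",
    "garden.html": "Tomatoes grow best in full sun.",
}

print(extract_terms("What about horses, dogs, and pigs?"))  # ['horses', 'dogs', 'pigs']
print(rank(documents, "What about horses, dogs, and pigs?"))  # ['farm.html', 'kennel.html']
```

However the question is worded, the engine in this sketch ends up comparing the same three terms against its stored documents.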
Now, let me note that search engines can and do alter their relevance algorithms to promote less relevant information for a variety of reasons. They don’t have to be engaging in any calamitous conspiracies in the process. For example, suppose you phrase a serious query about sex — not a query about pornography. Would you really want to see 100,000 porn sites come up in your search results? Of course not.
So when it comes to determining relevance, search engines have to make some judgement calls. I criticize Google for favoring pages in its Main Web Index over pages in its Supplemental Results Index because I feel they have made a terribly bad judgement call, not because I think their algorithm is broken. Google has extended its working definition of what we call relevance to embrace a more complex concept.
For the sake of this discussion, let’s assume that everyone agrees that “relevance” refers to the documents that actually have the most closely matched terms and expressions (with respect to the user query) associated with them (the documents). And let’s assume for the sake of discussion that the search engines are not dividing the Web into philosophical equivalents of “haves” and “have-nots”.
In an ideal world, there would be a Universal Theory of Relevance that should say something like, “All documents pertaining to a specific topic possess characteristics which are unique to that topic and which, when examined systematically, can be measured such that each document’s relevance to the topic can be scored and compared to other relevant documents’ scores.”
In essence, that is what every search engine seeks to accomplish. There are probably a million different ways to measure and score relevance. The whole field of identifying and measuring relevance is in its infancy. Nonetheless, there are some fairly standard mechanisms that search engines can (and do) employ to measure relevance.
You can boil those mechanisms down to one thing: text. You can qualify the text as “being found in the document”, “being found in other documents that refer to the document”, and “being found in information associated with the document”. Still, the search engines can only measure relevance on the basis of text.
Text occurs in many places: title tags, page URLs, link anchor text, image alt text, comments, meta tags, and in the formatted (or unformatted) visible text areas and HTML controls of a page.
Each search engine decides for itself which places it will look for text associated with a document.
The first influence on your site’s search engine rankings (or traffic) is what you do with your own site. Where do you place your text, what do you do to your text, how much text do you use? These questions influence your rankings and referrals because they influence the search engines’ knowledge about your Web documents. You can assert relevance by using specific text more frequently, with more emphasis, and in more places.
To change your apparent relevance, all you have to do is add or remove text from your page. Apparent relevance is the humanly intuitive or obvious (unmeasured) relevance a document possesses to any topic. When a search engine sees the change in your text (which may or may not be contained in your document — that is, if you change your linkage, directory descriptions, or other off-page factors, you are still changing your own document) it rescores or re-evaluates your document.
Hence, we can say that any change in the composition of a document’s text alters its computable relevance to one or more topics. Computable relevance is a potential value that can be measured. Computed relevance is derived from a search engine’s measurement of factors related to a user’s query.
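A rough sketch of that first influence follows. The field names, the weights in `FIELD_WEIGHTS`, and the `computed_relevance` function are assumptions made up for illustration; each search engine decides for itself where it looks and how much each place counts.

```python
# Hypothetical weights for the places where text can occur.
FIELD_WEIGHTS = {"title": 3.0, "anchor_text": 2.0, "body": 1.0}

def computed_relevance(document, term):
    """Score one document for one term using weighted occurrence counts."""
    term = term.lower()
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = document.get(field, "").lower().split()
        score += weight * words.count(term)
    return score

page = {
    "title": "Acme Stables",
    "anchor_text": "acme",  # text of links that point at the page (off-page)
    "body": "We board horses and sell hay",
}

before = computed_relevance(page, "horses")

# You do something with your site: change text on the page and off it.
page["title"] = "Acme Stables for Horses"
page["anchor_text"] = "acme horses"

after = computed_relevance(page, "horses")
print(before, after)  # 1.0 6.0
```

The body text never changed, yet the score did, because the title and the link anchor text are also part of what the engine knows about the document.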
Just as you can alter your own text (either within your document or elsewhere), other people can alter their text. So other people can alter the composition of their documents or the information about their documents. Such changes indirectly influence your computable relevance because relevance is computed on the basis of what the search engine knows about all Web documents (with respect to the user’s query).
Hence, we can say that any change in any document’s composition or external information alters the computable relevance of all documents that are relevant to a specific topic derived from the document’s text and/or document’s information. In simpler language, any one document can influence the relevance of all other documents associated with a particular topic.
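One way to see this second influence is with TF-IDF, a common textbook weighting scheme. It is swapped in here purely for illustration; nothing above says any search engine uses it, and the smoothed formula and the three toy documents are assumptions.

```python
import math

def idf(term, corpus):
    """Inverse document frequency: terms found in fewer documents count for more."""
    containing = sum(1 for words in corpus.values() if term in words)
    return math.log((1 + len(corpus)) / (1 + containing)) + 1

def tf_idf(term, name, corpus):
    """Occurrences of the term in one document, scaled by corpus-wide rarity."""
    return corpus[name].count(term) * idf(term, corpus)

corpus = {
    "yours": "horses horses hay".split(),
    "rival": "dogs kennel boarding".split(),
    "other": "pigs farm feed".split(),
}

before = tf_idf("horses", "yours", corpus)

# Someone else does something with their site. Your page is untouched.
corpus["rival"] = "horses dogs kennel boarding".split()

after = tf_idf("horses", "yours", corpus)
print(round(before, 3), round(after, 3))
print(after < before)  # True: your score moved although your text did not
```

Under this scheme the term becomes less distinctive as soon as a second document uses it, so every document scored for that term is affected.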
If search engine algorithms could not engage in diversity, all documents would be equal because of the first two SEO Influences. However, search engines can employ whatever algorithms seem useful to them. An algorithm collects, organizes, and measures data. Each of the processes can influence how the data is scored.
For example, suppose a search engine chooses to exclude some documents from indexing? Clearly, those documents will score 0 relevance for every user query.
Suppose a search engine chooses to reindex some documents more often than others? Clearly, the documents that are reindexed more often have greater potential computable relevance than the documents indexed less often (because computable relevance is based only on what the search engine knows about a document).
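These two cases can be sketched with a toy index that scores only the copy of a page it last crawled. The `Index` class, its methods, and the example URLs are hypothetical.

```python
class Index:
    """Toy index: the engine knows only what it stored at the last crawl."""

    def __init__(self, excluded=()):
        self.excluded = set(excluded)
        self.snapshots = {}

    def crawl(self, url, live_text):
        # Excluded documents are never stored, so they can never match.
        if url not in self.excluded:
            self.snapshots[url] = live_text.lower().split()

    def score(self, url, term):
        return self.snapshots.get(url, []).count(term.lower())

live_web = {
    "fresh.example/page": "horses",
    "stale.example/page": "horses",
    "blocked.example/page": "horses horses horses",
}

index = Index(excluded={"blocked.example/page"})
for url, text in live_web.items():
    index.crawl(url, text)

# Both owners add the same text, but only one page gets recrawled.
live_web["fresh.example/page"] = "horses horses horses"
live_web["stale.example/page"] = "horses horses horses"
index.crawl("fresh.example/page", live_web["fresh.example/page"])

for url in live_web:
    print(url, index.score(url, "horses"))
# fresh.example/page 3
# stale.example/page 1
# blocked.example/page 0
```

All three live pages now carry identical text, yet the engine in this sketch scores them 3, 1, and 0 because of its own collection choices.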
Suppose a search engine decides that one document is more valuable than another document? Suppose the search engine prefers to look at its most highly valued documents first when answering user queries? The less valued documents have fewer chances to be included in search results because in some queries more valued documents will provide sufficient results to obviate the need to look at less valued documents.
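A minimal sketch of that behavior, assuming a hypothetical two-tier index and a made-up `search` function that stops as soon as it has enough results:

```python
def search(term, tiers, wanted=2):
    """Consult tiers in order of valuation; stop once enough results exist."""
    results = []
    for tier in tiers:
        matches = [
            (text.lower().split().count(term), name)
            for name, text in tier.items()
        ]
        # Within a tier, order by occurrence count, highest first.
        ranked = sorted((m for m in matches if m[0] > 0), reverse=True)
        results.extend(name for _, name in ranked)
        if len(results) >= wanted:
            break  # lower-valued tiers are never examined
    return results[:wanted]

high_value = {
    "big-site/a": "horses for sale",
    "big-site/b": "horses and ponies",
}
low_value = {
    "small-site/a": "horses horses horses horses",
    "small-site/b": "llamas and alpacas",
}

print(search("horses", [high_value, low_value]))  # ['big-site/b', 'big-site/a']
print(search("llamas", [high_value, low_value]))  # ['small-site/b']
```

The low-valued page that mentions horses four times never appears for that query, while the llama page surfaces only because the high-valued tier had nothing to offer.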
The fact that a document is highly valued by a search engine in no way means that the document is more relevant to any particular query than a less valued document. It only means that the search engine favors one document over another. Valuations can be assessed in a multitude of ways, and for a variety of reasons.
Finally, a search engine may select a number of documents and evaluate them for relevance on the basis of information it possesses about each document regardless of how much value it places on the document. That is, relevance scoring is independent of valuation since valuation in itself influences the selection of documents. Relevance can only be algorithmically determined by looking at the text a document possesses that is related to the user query.
Text can be scored for relevance in many different ways. We don’t need to know the specifics of any one search engine’s relevance algorithm to see that a document with more occurrences of a word is more relevant to that word than a document with fewer occurrences of that word. However, a relevance algorithm can be constructed such that usage affects relevance scoring.
For example, suppose a search engine knows that some words are used almost exclusively in link anchor text. Should documents that only possess those words in link anchor text be scored more relevant to a query for the words if there are documents that use the words in their non-anchor text? Link anchor text can appear both on and off the page (after all, for every page with an inbound link there is a page with an outbound link that is the inbound link of the first page).
In other words, which page is more relevant: the page being linked to or the page doing the linking? If the page doing the linking has more occurrences of the search term (outside of link anchor text) than the page being linked to, which page should be deemed more relevant?
Of course, a search engine does not have to provide the most relevant document to a user. It may seek to provide the most satisfying document according to undisclosed criteria. In effect, search engines can alter their definitions of relevance to include imputed expressions for every query (such as, “which documents are most trusted by other documents”, “which documents are most popular with other documents”, etc.).
Hence, if you ask a search engine, “Which documents are most relevant to ‘horses, dogs, and pigs’?”, the search engine may alter your query implicitly by adding (without your knowledge) qualifying expressions so that your query becomes “Which trusted documents are most relevant to ‘horses, dogs, and pigs’?”, “Which frequently updated documents are most relevant to ‘horses, dogs, and pigs’?”, or “Which high value documents are most relevant to ‘horses, dogs, and pigs’?”
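The implicit rewriting can be sketched as a set of hidden filters applied on top of the user's terms. The `QUALIFIERS` table, its thresholds, and the document attributes `trust` and `days_since_update` are invented for illustration; the actual criteria are undisclosed.

```python
def matches(doc, terms):
    """True if the document contains at least one of the query terms."""
    words = doc["text"].lower().split()
    return any(t in words for t in terms)

# Hypothetical qualifiers an engine might attach without telling the user.
QUALIFIERS = {
    "trusted": lambda doc: doc["trust"] >= 0.5,
    "frequently updated": lambda doc: doc["days_since_update"] <= 30,
}

def answer(docs, terms, implicit=()):
    """Return names of matching documents that pass every implicit qualifier."""
    return [
        doc["name"]
        for doc in docs
        if matches(doc, terms) and all(QUALIFIERS[q](doc) for q in implicit)
    ]

docs = [
    {"name": "a", "text": "horses dogs pigs", "trust": 0.9, "days_since_update": 400},
    {"name": "b", "text": "dogs and pigs", "trust": 0.2, "days_since_update": 3},
    {"name": "c", "text": "horses only", "trust": 0.7, "days_since_update": 10},
]
terms = ["horses", "dogs", "pigs"]

print(answer(docs, terms))                                   # ['a', 'b', 'c']
print(answer(docs, terms, implicit=["trusted"]))             # ['a', 'c']
print(answer(docs, terms, implicit=["frequently updated"]))  # ['b', 'c']
```

The user typed the same three words each time; only the engine's unstated qualifier changed which documents came back.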
Therefore, whatever a search engine chooses to do with its data (through the collection, organization, and measurement processes) directly influences your Web site’s rankings or referrals in search results.
Finally, a user may ask a search engine to return documents that are similar to your document or the user may ask for documents dissimilar to your own. The more similar the user’s interests are to your document topics, the more likely your documents will be shown to the user by the search engine (all other things being equal).
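The fourth influence can be sketched with nothing more than a page and two query logs. The `referrals` function, the sample page, and both logs are hypothetical, and "shares at least one term" is a deliberately crude stand-in for similarity.

```python
def referrals(page_words, query_log):
    """Count the queries that share at least one term with the page."""
    page = set(page_words)
    return sum(1 for query in query_log if page & set(query.lower().split()))

your_page = "horses hay stables".split()

last_month = ["horses for sale", "hay prices", "horse stables near me"]
this_month = ["dog kennels", "pig feed", "hay prices"]

# Nothing about the page or the engine changed; only what people searched for.
print(referrals(your_page, last_month), referrals(your_page, this_month))  # 3 1
```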
And that is what the Theorem of Four SEO Influences is all about. Until you master this fundamental principle of search engine optimization, everything else you do amounts to fumbling in the dark because you don’t understand how all the pieces work together.
The Theorem of Four SEO Influences lays out all the parts of the puzzle in pieces large enough for any SEO pundit to understand. We’ll come back to this theorem later and look at some of the implications it has for typical search engine optimization schemes that sometimes work and sometimes misfire.