Looking for science in search engine optimization

Posted by admin on January 22, 2007 in SEO Theory

Did you know that the sum of any series of consecutive odd integers starting with 1 is equal to the square of the number of integers in the series? Some examples include: 1 + 3 = 4 (the square of 2), 1 + 3 + 5 = 9 (the square of 3), 1 + 3 + 5 + 7 = 16 (the square of 4).

You can also take the third power of any positive integer and multiply it by the third power of any other positive integer to get yet another third power of a positive integer. That is, X-times-X-times-X times Y-times-Y-times-Y equals Z-times-Z-times-Z, where Z is simply X times Y. Or, the product of two cubed integers is always another cubed integer.
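If you would rather check those two claims than take my word for them, here is a minimal Python sketch. The function names `sum_of_first_odds` and `product_of_cubes_is_cube` are made up for this illustration, and a spot check over small numbers is only evidence, not a proof.

```python
def sum_of_first_odds(n):
    """Return 1 + 3 + 5 + ... for the first n odd integers."""
    return sum(2 * k - 1 for k in range(1, n + 1))


def product_of_cubes_is_cube(x, y):
    """Check that x^3 * y^3 equals z^3 with z = x * y."""
    z = x * y
    return x**3 * y**3 == z**3


# The examples from the text: 1 + 3, 1 + 3 + 5, and 1 + 3 + 5 + 7.
for n in (2, 3, 4):
    print(n, sum_of_first_odds(n), n**2)

# Spot checks over small positive integers (evidence, not a proof).
print(all(sum_of_first_odds(n) == n**2 for n in range(1, 101)))
print(all(product_of_cubes_is_cube(x, y)
          for x in range(1, 21) for y in range(1, 21)))
```

The cube check works because multiplication can be regrouped freely: X·X·X·Y·Y·Y is the same as (X·Y)·(X·Y)·(X·Y).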

These apparently useless bits of number theory have been rattling around the human library of knowledge for thousands of years ever since Pythagoras established his school in Croton, southern Italy, which eventually became the Order of the Pythagoreans. One would be hard pressed to show that Pythagoras was much of a scientist, however.

He may have been more of a musician, and he even pictured the universe as a series of spheres embedded within each other, producing musical sounds unique to each sphere. He was certainly a philosopher and an early mathematician. But he didn’t consciously use what we have come to call the scientific method.

I prefer this definition of the scientific method: it is a series of steps, namely (1) identify a problem you would like to solve, (2) formulate a hypothesis, (3) test the hypothesis, (4) collect and analyze the data, and (5) draw conclusions.

Fairly simple, right? Nonetheless, this simple process forms the basis for every scientific discipline we have developed, from Genetics to Cosmic String Theory. Or, from Astronomy to Zoology, if you want to cover the whole western alphabet.

The recent debate over whether search engine optimization is equivalent to rocket science is a bit silly. It’s not a science because it lacks any set of formal declarations. There are no proofs of laws of behavior to define the discipline. But is that really what is required of science?

Hm. Perhaps it’s just that we have no formalized systematic process for identifying and solving problems. We cannot show that our solutions are “correct”, and therefore are left only with a collection of randomly organized questions and “working solutions”, none of which really identify the actual problems and solutions.

If it were up to me (and it’s not) to lay down the first laws of search engine optimization, I would say they would not be of much practical use to the average SEO. That is, to take a set of laws and principles and produce a practical application requires a fair amount of work. Applied science is highly derivative in nature and little resembles theoretical science, particularly since applied science really depends a great deal upon engineering.

Have you ever heard of anyone discussing search results engineering? That is really what search engine optimization is all about. We engineer solutions to practical, everyday problems. But if we’re fumbling around in the dark, we’re not approaching those solutions scientifically. We’re applying the unproven hypotheses of an undefined (but maybe definable) science.

My laws of search engine optimization would look something like this:

Law of the Integrity of Information Indexing Systems: No information indexing system is immune to manipulation. Corollary: Information indexing systems are only as precise as the orderliness of their data.

Search engine optimization, or more properly search results engineering, succeeds by disrupting the orderliness of the data indexed by search engines. The integrity of the search databases is thus compromised as soon as any disorderly data enters their indexes.

But what is orderly data?

Definition: In an information indexing system, data is orderly or well-behaved if and only if it complies with all restrictions requiring that the data be orderly or well-ordered. Don’t you just love scientific jargon? Of course, a real scientist (whatever that is) might write the definition differently, but the point is that you have to place some restrictions on data in order to achieve orderliness.

Take Google’s dual indexing scheme, for example. Google tells us that they place fewer restraints on sites that they crawl for the supplemental index than they place on sites for the main index. In other words, the data in Google’s Supplemental Results Index is chaotic, ill-behaved. It is disorderly data. I would prefer to call it unordered data but that could (and perhaps should) actually mean something else.

But if there is order to the data that is accepted into Google’s ideal Main Index, what is the basis for that order? I think most people would quickly say “trust” but “trust” is just the layman’s term for all the precise criteria Google applies to determine that data is ordered or well-behaved. In other words, there is more to it than “trust”. It’s a collection of “restraints” that are used to filter or vet the data that Google admits to the Main Index.

Until the Thanksgiving 2006 Google Update, I would have said that Google was doing a very poor job of ensuring order in their Main Index. Now I’m a little more impressed with their efforts. They probably now have about as much integrity as Ask, although Ask has intentionally disrupted the order of its own data by conferring special status upon Wikipedia articles. That’s bad for Ask, good for Google because Google doesn’t really do that. I know many people believe they do, but that belief is characteristic of the lack of science that undermines today’s search engine optimization methodologies.

Another law I would propose for SEO would look like:

Law of Transitive Value of Data: Indexed data is transitive if and only if it possesses value in two or more vectors. And what, you may well ask, is a “vector”? I’m glad you did ask. A vector is something like a topic space, although you could have multiple vectors for any specific topic.

Let’s say there are 500 Web sites over there which all talk about dog breeding. Somewhere in the midst of those 500 sites is a most highly valued site. Most of the other 499 sites link to that most highly valued site, but when you filter out all the other outbound links, all you end up with are the 500 Web sites.

But wait. Over here we have 300 dog breeding sites in another vector. Assuming these two vectors are completely distinct from each other, there is no transitive data between them. But suppose both vectors point to the same most highly valued site (and furthermore suppose that site points to no sites in either vector). The most highly valued site is transitive because it is independent of either vector. It can be found in two or more vectors.
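For readers who think better in code, the two-vector example can be sketched roughly as follows. This is only an illustration under simplifying assumptions: a "vector" is modeled as a named group of sites with their outbound links, a site counts as transitive when sites in two or more vectors link to it, and the function name `transitive_sites` and the `.example` hostnames are all invented here. The sketch does not check that the transitive site links back to nobody.

```python
from collections import defaultdict


def transitive_sites(vectors):
    """Return the link targets that are pointed to from two or more vectors.

    vectors maps a vector name to a dict of {site: set of outbound link targets}.
    """
    linked_from = defaultdict(set)  # target -> names of vectors linking to it
    for vector_name, sites in vectors.items():
        for targets in sites.values():
            for target in targets:
                linked_from[target].add(vector_name)
    return {target for target, names in linked_from.items() if len(names) >= 2}


# Two otherwise unconnected groups of dog breeding sites (hypothetical names).
vectors = {
    "over_there": {
        "a1.example": {"hub.example", "a2.example"},
        "a2.example": {"hub.example"},
    },
    "over_here": {
        "b1.example": {"hub.example"},
        "b2.example": {"b1.example"},
    },
}

print(transitive_sites(vectors))
```

Only `hub.example` comes back, because it is the one site that both groups link to; every other link stays inside its own vector.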

Now, a real scientist might not call our most highly valued site transitive. Maybe they would call it a multi-vector rational point or a cross-vector intersection or something equally scientific. The point is that we actually do have groups of Web sites that are devoted to similar topics but which literally have no connection with each other — except that (usually) they link to some very well known site.

In theory, you could have multiple vectors that include no transitive pages. And you could say all the vectors in a topic space constitute a class of vectors or a super-vector.

Not that any of this helps you figure out how to achieve high rankings for any given keyword across Ask, Google, Live Search, and Yahoo!. Quite the contrary, all this pseudo-scientific babble really just serves as a distraction because, when you get down to brass tacks, search engine optimization is not concerned with organizing itself. Science is really concerned with organizing information. Search engine optimization is really concerned with placing favored information in specific search results.

The science behind SEO — if there is ever to be any science — will have to focus on the analytical aspects of optimization. Algorithm chasing. And most people are not algorithm chasers. And most algorithm chasers are really not very good at it. Oh, we all think we are pretty good, but truth be told even I can look back at things I wrote two years ago and cringe. Maybe within the context of the search technology we had to work with then many of the things I said made sense, but I habitually give my own names to concepts and principles that other people have already named.

I’m just like every other SEO in that respect. We have no body of authoritative researchers who can establish the names for us (and there is no way I would agree to let any of the so-called professional organizations that have been formed speak for me).

Does that mean I’m waiting for the academic community to write the laws and definitions for us? Not really. They haven’t much impressed me as being very knowledgeable about search engine optimization, and I’ve read dozens, perhaps well over a hundred, technical papers written by academics and search engine employees.

They understand the science and technology behind the search engines far better than I do. But when it comes to identifying the problems addressed by search engine optimization and proposing solutions for those problems, the technical paper authors have achieved about as much as a monkey with a typewriter would. They don’t understand what search engine optimization is really all about — or, at least, they don’t write about search engine optimization. They write about worst-case abuse scenarios, which admittedly have always been around.

Maybe I’m just reading the wrong technical papers. I would laugh at any academic who insisted I was because, frankly, if they really had a clue you’d think it would have crept into at least a few of the papers that I have read. Still, maybe there is a secretive community of search engineers who really do understand SEO well enough to define it, assign laws and axioms to it, and determine just what really works and what doesn’t.

I would love to see such literature. I sure don’t expect to see it come from the search engine optimization community.

At a high level, search engine optimization is really only concerned with injecting disorderly data into a well-ordered set of data. Just because data is disorderly doesn’t mean it’s bad data, spam, or worthless (or worth less). Data is disorderly only because it hasn’t met the precise criteria for being deemed orderly. There is freedom in being disorderly. There is considerable obligation in being well-ordered.

So when you get your data into the well-ordered results, it becomes incumbent upon you to ensure that your data behaves well. That’s like saying to a Democrat, “It’s okay to bash the Republicans on your way to Washington, but once you get there you have to join the Republican Party.”

And maybe that is why the art and profession of search engine optimization (or should I say search results engineering?) is not scientific. Maybe it’s all just too political to be a real science.

So, if you want to defend SEO as being complicated and beyond the reach of the average person, fine. Do that. Or, if you want to say that anyone can do it, fine. Say as much.

Just remember that if you’re going to optimize Web sites for search engines, you have to do three things: experiment, evaluate, adjust.

I don’t care what you or anyone else calls it. That is what I call the Search Engine Optimization Method.
