Google’s Web Apartheid: Gone Supplemental and Getting Nowhere

Posted by admin on February 15, 2007 in Supplemental Pages

Google Supplemental Pages have become perhaps the most popular topic on the SEO Web today. Naturally, many business site operators feel they have a lot to lose by going supplemental. A few months ago I was of the opinion that having pages in the Supplemental Index really didn’t matter. I saw plenty of Supplemental pages appearing in top ten results for commercial queries.

And then what I call the Google 2006 Thanksgiving Update occurred. Although it most likely technically began and ended before Thanksgiving, we did not collectively see the results of this update until the weekend before Thanksgiving. As far as I can determine, we continue to see new results today.

I think it’s clear now that Google has effectively separated the Web into two camps: the pages Google trusts and the pages Google is not sure it should trust. I wrote about this Web apartheid effect two years ago in On the Googleness of Being (ironically, the original document does not appear in the top ten search results despite numerous links to it from many sites).

Much of what I wrote in that paper was speculative and it really doesn’t apply today because regardless of how well or badly it may have fit Google in February 2005, Google has completely redesigned itself since then. There are some characteristics of the Google of February 2005 which appeared about a year earlier (in early 2004, when the so-called “Sandbox Effect” was first noticed) and some characteristics which were later clarified.

Today’s Google, however, still resembles in some ways the figurative Google I attempted to describe in “On the Googleness of Being”. Some of the aspects of the current Google that still seem to be relevant include:

  • Trusted Content Sites
  • (Page) Reputation
  • Child Inheritance
  • Outbound links tell Google something about a page’s intentions

I think we can better explain some of these concepts, too.

For example, most SEOs now acknowledge that Google has conferred a trusted status on (usually older) Web sites with many natural inbound links. Furthermore, Matt Cutts mentioned one aspect of their trust system in addressing general concerns about the remapping or migration of hundreds of thousands of URLs by MSN. He has also warned people that selling links could cause Google to lose trust in their pages.

In the post-Bigdaddy wrapup phase, Matt suggested to one person who had been caught supporting spam sites that Webmasters should “think about the quality of your links if you’d prefer to have more pages crawled”. He was referring to outbound links, where the specific site in question was linking to “a free ringtones site, an SEO contest, and an Omega 3 fish oil site”. Matt has pointed out that [participating in] SEO contests [and selling links] … affects your site’s ability to flow PageRank and the trust in your site with no indication of a penalty or loss of value in the Toolbar number so many people foolishly waste their time on.

When Matt Cutts talks about Reputation he refers to PageRank and anchortext (sic). When I speak of Reputation, I’m talking about whether your site (or page) has the ability to pass value. There may be three levels of reputation with Google: Untrusted, Accepted, and Trusted. I described these concepts in Looking At Links: Paint A Rainbow Across The Web, Strong linkage versus weak linkage, which I posted on HighRankings in April 2006. Let me recap the definitions I gave there (I should probably compile a comprehensive “Michael Martinez SEO Glossary” some day):

Strong linkage - A variety of links from multiple domains with a broad selection of topics. Links are not based on relevance, relationship, or category. Most strong links are probably found on trusted sites or sites with large numbers of links from trusted sites.

Weak linkage - Includes links from the same domain, links from directories, and links from untrusted sites. Most reciprocal links would be weak links.

Link base - The page where a link comes from, including the domain (insofar as the domain may be Trusted, Accepted, or Untrusted).

Trusted domain - A domain that is considered to be trustworthy by the search engines. Trusted domains are probably used as the base for initiating new crawls of the Web.

Accepted domain - A domain that is considered to be acceptable, insofar as it has not triggered any filters. An accepted domain simply doesn’t have Trusted status.

Untrusted domain - A domain that has tripped one or more filters. It may be penalized or banned or it may only have tripped some triggers (individual pages may be penalized).

Although I speak of strong linkage, it does not follow that any given link would be a strong link. In evaluating strong linkage and weak linkage, I hold the view that “the whole is greater than the sum of the parts”.

That is, any given link could be part of page A’s strong linkage and part of page B’s weak linkage. The difference is not so much in the individual links themselves (although some links clearly don’t work) as in the community of links as they are mapped out.

If they all come from similar sources, then they may be weak. If they come from a variety of sources, they may be strong.

There is no definitive way to measure strong and weak linkage. It’s more of a “you know it when you see it” kind of thing. But I think that as more people start to look at links in new critical ways, some metrics will be developed that indicate strength of linkage.

Looking past the Strong/Weak linkage concept, one could say that Untrusted Domains don’t rank, Accepted Domains may rank but don’t pass value, and Trusted Domains rank and probably pass value.

That sums up what most people report in Google today. If your pages are in the Supplemental Index, they are probably Accepted but may not pass value. If your pages are in the Main Index, they are probably Trusted and may pass value.
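The three-level model above can be written down as a small Python sketch. This is only an illustration of the rule of thumb in this post, not anything Google has published; the names `Reputation`, `can_rank`, `can_pass_value`, and `likely_reputation` are hypothetical, and the mapping from index to reputation is my guess, hedged with "probably" in the text.

```python
from enum import Enum


class Reputation(Enum):
    """Hypothetical reputation levels, as speculated in this post."""
    UNTRUSTED = "untrusted"  # tripped one or more filters
    ACCEPTED = "accepted"    # no filters tripped, but not Trusted
    TRUSTED = "trusted"      # considered trustworthy


def can_rank(rep: Reputation) -> bool:
    # Untrusted domains don't rank; Accepted and Trusted domains may.
    return rep is not Reputation.UNTRUSTED


def can_pass_value(rep: Reputation) -> bool:
    # Only Trusted domains are assumed to (probably) pass value.
    return rep is Reputation.TRUSTED


def likely_reputation(index_name: str) -> Reputation:
    # Assumed mapping: Supplemental Index -> Accepted, Main Index -> Trusted.
    mapping = {
        "supplemental": Reputation.ACCEPTED,
        "main": Reputation.TRUSTED,
    }
    return mapping[index_name.lower()]
```

For example, `can_pass_value(likely_reputation("supplemental"))` is false under these assumptions, while the same page can still rank.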

I’m not convinced that I nailed the weak/strong linkage, however. For example, how do all the pages of an Accepted Domain get crawled if their links don’t pass value? That makes no sense, unless Google will place pages it finds through nofollow in the Supplemental Results Index (I have seen no real evidence that this happens). Instead, it seems to be that Google will trust internal links enough to crawl and index a site’s own pages. I would guess that Google is saying to itself, “Well, if these links share the same base or host (domain or sub-domain) they are probably internal and therefore are probably more trustworthy than links pointing to other hosts”.

If that is a better assessment of how Google handles links, then internal links are actually strong links because they help get sites crawled and indexed. That would explain why so few inbound links are really needed to get most sites crawled and indexed.
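To make the "same base or host" guess concrete, here is a minimal Python sketch of how a crawler might tell an internal link from an external one. It is my own illustration, not Google's method: the function name `is_internal_link` is hypothetical, and treating only an exact host match as internal (so a different sub-domain counts as external) is an assumption.

```python
from urllib.parse import urljoin, urlsplit


def is_internal_link(page_url: str, href: str) -> bool:
    """Return True if href points at the same host as page_url.

    Assumption: "internal" means an exact host match, so
    www.example.com and blog.example.com are different hosts.
    """
    # Resolve relative links against the page they were found on.
    target = urljoin(page_url, href)
    # urlsplit().hostname is already lowercased (or None if missing).
    return urlsplit(page_url).hostname == urlsplit(target).hostname
```

A looser version could compare only the registered domain, which would treat sub-domains of one site as internal to each other.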

It would also explain the Child Inheritance effect that I and other SEOs have observed: when you add a well-optimized page to an existing, large content domain, that page stands a very good chance of ranking well in highly competitive queries. It will almost certainly shoot to the top of uncompetitive query results. As long as the parent domain links solidly to the child page (no sneaky links), Google seems to treat the new page as though it’s trustworthy enough to be judged on its own merits.

That may mean that external linkage is probably weaker than internal linkage. I’ve been saying as much for years but for entirely different reasons. It may only be that Google’s trust effect is more visible, and it makes sense that the more you trust your own content, the more that the trust other people’s pages place in your own pages will help your newer content.

But you can kill the goose that lays the golden egg by linking out to too many untrusted domains. If you’re playing in SEO contests and boosting free ringtones sites, Google seems to be saying, “Well, you’re not really trying to stand beside trusted domains”. So any trust you’ve gained can be quickly squandered.

And one fast way to squander trust may be to try to spread it too fast, too thin. I’m talking about site-wide links, which some people claim have worked for them and others claim have hurt them. One possible explanation for the inconsistent value of site-wide (also called “run-of-site”) links may be found in a paper published last year titled Site Level Noise Removal for Search Engines (.PDF file).

The researchers tested several algorithms and cutoff points or thresholds and they decided that allowing no more than 2% of a site’s inbound links to come from any one other site effectively reduced “spam noise” in search results. The methodology won’t work well with pages that have fewer than 50 inbound links but once your inbound links surpass 50 you could, hypothetically, see sudden, drastic changes in your link-driven search results.
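The arithmetic behind that threshold can be sketched in a few lines of Python. This is a simplified illustration of the 2% cap and the 50-link minimum as I have summarized them here, not the paper's actual algorithm; the function name `capped_link_counts` and its parameters are hypothetical.

```python
from collections import Counter


def capped_link_counts(source_sites, max_share_percent=2, min_links=50):
    """Count inbound links per source site, capping any one site's share.

    source_sites: one entry per inbound link, naming the linking site.
    Pages with fewer than min_links inbound links are left uncapped,
    since the method is said not to work well for them.
    """
    counts = Counter(source_sites)
    total = len(source_sites)
    if total < min_links:
        return dict(counts)
    # Integer arithmetic; always allow at least one link per site.
    cap = max(1, total * max_share_percent // 100)
    return {site: min(n, cap) for site, n in counts.items()}
```

Under these assumptions, a page with 100 inbound links, 60 of them site-wide links from one domain, would have only 2 of those 60 counted, which is the kind of sudden, drastic change described above.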

If the links are sold (and how does a search engine know this?), then the linking site may be penalized to the extent that it is no longer allowed to pass value (although it could also lose its ability to rank, as well). So site-wide links are dangerous not only for recipients (who may waste time and effort in pursuing such links) but they may actually be much riskier for the link providers.

In essence, what we’re seeing is a separation of the wheat from the chaff, the oil from the water. Google is gradually dividing the Web into clusters of sites it doesn’t like at all (penalized and/or banned from the index), clusters of sites it doesn’t quite know what to do with (they cannot pass value, or not much), and sites it feels have earned the right to be positioned highly in search results and to recommend or vouch for other sites.

That is, after all, what Google has been trying to elevate links to achieve since it was first designed, is it not? They want good, trusted links that vouch for other pages. But though I speak of “Web Apartheid”, it is possible for pages to move out of one group and into the other. Conceivably, even banned or penalized sites can be restored to Google’s good graces by discontinuing bad behavior and asking for reinclusion.

Where the Supplemental Index seems to be going, however, is away from the concept of penalizing pages for bad behavior and more toward the concept of rewarding pages for earning good recommendations from trusted sites. Supplemental status now appears to only mean that Google accepts your content but doesn’t fully trust it. Your pages have to earn more trust, perhaps by several paths. I would certainly hope so, as I cannot imagine a very large percentage of good content sites recovering their trusted status through inbound links. Most people don’t know how to get really good links, and even if they did how likely is it that they would be able to find such links?

So that leaves us with many more questions than I have really answered (have I answered any at all?). Can a page “age” itself out of the Supplemental Results Index? My gut feeling is that won’t happen, but perhaps some people will find a way to do this. Can a Supplemental Results page pass value to another Supplemental Results page? That is, if you get a lot of links from Supplemental Results pages, will that suffice to get you into the Main Index?

The problem for Webmasters right now, however, is that Google seems to be driving query results primarily from the Main Index. For example, Google Coop’s custom search engine tool does not show results from the Supplemental Results Index. And if you run a site: query, Main Index pages are expected to appear first. Even Webmaster Central’s new link reporting feature excludes links from the Supplemental Results Index.

As of this writing, neither the Custom Search Blog nor the Webmaster Central Blog has provided any indication of progress toward including Supplemental Results into their respective services.

And the problem rises above the frustrations of Webmasters because Google is ultimately denying its user community access to a great deal of useful, authoritative, relevant content. That is because Google’s basic premise — that links can democratically determine value and quality — remains false despite the impressive job they have done in filtering out untrustworthy sites. As Jill Whelan is so fond of saying, Google has now thrown a lot of babies out with the bathwater and there is no indication that Google intends or wants to bring those babies back into the fold.

The ultimate result, in my opinion, will be counter-productive. Desperation is the mother of spam, and I think that more Webmasters than ever before now have an incredible incentive to figure out ways to game the system. And that is why I speak of “Web Apartheid”. This “solution” is at best a temporary fix that will satisfy too few people and will eventually lead to its own demise. Let’s hope Google has a fairer solution in mind and that the “Gone Supplemental” signs won’t be hanging on office doors for more than a few months.
