Group Connectivity Theory for SEO

Posted by Michael Martinez on August 21, 2007 in SEO Theory

There are several ways to get a new Web site crawled and indexed. While I and most other people in the industry no longer believe in direct submission to search engines (I occasionally test it with limited success), we have a growing list of alternative methods including: submitting XML sitemaps, pinging blog distribution services with RSS files, accumulating links from other sites, and doing nothing.

Doing nothing is the easiest method, and it succeeds surprisingly often. Search engines find new domains quickly. Ask, Live, Google, and Yahoo! have all hit new domains before I’ve had time to put content on them. I usually block the crawlers with a robots.txt file until I have content in place. Some people claim they have problems getting crawled and indexed if they do nothing.
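As a rough illustration of the robots.txt holding pattern mentioned above, here is a minimal Python sketch that checks what a block-everything file tells a well-behaved crawler. The example.com URL and the function name are placeholders chosen for illustration, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Asks every crawler to stay out of the whole site while it is empty.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# An empty Disallow value blocks nothing; swap this in once content is ready.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""


def is_crawlable(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    page = "https://example.com/index.html"  # placeholder URL
    print("blocked file allows crawl:", is_crawlable(BLOCK_ALL, page))
    print("open file allows crawl:", is_crawlable(ALLOW_ALL, page))
```

Keep in mind that robots.txt is only a request; it keeps out crawlers that choose to honor it.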

Getting other sites to link to a new domain is the most difficult and time-consuming process, unless you control a few domains that are already indexed and crawled. Then you can load them up with links to the new domain and you’re set to go. Some people are afraid they’ll be penalized for linking to “irrelevant content”, however.

You don’t actually have to have a blog to ping blog distribution services. If they pick up whatever RSS file you ping with, they’ll pass on the data to search engines and you get into the blog search indexes. From there it doesn’t take much effort to get pages into the main Web indexes. Some people claim this doesn’t happen for them.
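The post does not say which services or protocol are involved. Assuming a service that accepts the common weblogUpdates.ping XML-RPC convention (a site name plus a URL), a ping request could be built roughly like this. The site name and feed URL are placeholders, and nothing is sent over the network here.

```python
import xmlrpc.client


def build_ping_request(site_name: str, url: str) -> str:
    """Build the XML-RPC body for a weblogUpdates.ping call.

    Assumption: the receiving service follows the weblogUpdates.ping
    convention of (name, url) parameters. Check the service's own
    documentation before relying on this.
    """
    return xmlrpc.client.dumps((site_name, url), methodname="weblogUpdates.ping")


if __name__ == "__main__":
    body = build_ping_request("Example Site", "https://example.com/feed.xml")
    print(body)
    # Actually sending it would look something like the line below, where
    # PING_ENDPOINT is a hypothetical service URL you would have to supply:
    # xmlrpc.client.ServerProxy(PING_ENDPOINT).weblogUpdates.ping(name, url)
```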

You can also submit an XML sitemap or just point to it through your robots.txt file. There are plenty of XML sitemap generator tools out there. I’ve tried several and they all get the job done. Of course, some people claim that they have no success with XML sitemaps — or else their sites are only partially crawled, or they go Supplemental.
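For readers who would rather see what a generator tool produces, here is a minimal sketch that builds a bare-bones XML sitemap and the robots.txt line that points to it. The URLs are placeholders, and real sitemaps often carry optional fields (such as last-modified dates) that this sketch leaves out.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal sitemap document (as a string) listing `urls`."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


def robots_pointer(sitemap_url):
    """Return the robots.txt line that advertises the sitemap location."""
    return f"Sitemap: {sitemap_url}"


if __name__ == "__main__":
    pages = [
        "https://example.com/",                # placeholder URLs
        "https://example.com/guides/seo/",
    ]
    print(build_sitemap(pages))
    print(robots_pointer("https://example.com/sitemap.xml"))
```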

It’s impossible to explain away all the failures and partial successes. No one method works for everyone, so I try to cover my bases when I launch a new site. I don’t rely upon direct submission to search engines, however, and I don’t wait for other sites to link to my new content. It’s true that I have a personal network of Web sites, so I’m able to launch new sites with links from trusted domains.

I’m not concerned about what the search engines think. After all, trust begins at home and if you don’t trust your own content enough to link to it then you have no business expecting anyone else to trust it, either. I strive to provide good content. I also strive to build good site structures that are crawlable and which use their links to emphasize the most important pages.

Each search engine has to decide for itself whether to index a site, how much of the site to index, and what to do with the site. Ask is the pickiest search engine. Google is the easiest to get into (for most verticals). Since I don’t rely on just one search engine to drive traffic to my sites, I make sure I employ the simplest possible site structures.

That said, I like to build complex sites with lots of depth. That is, I’m not afraid of building deep content. I’ll go as many directories deep as I can because that helps me to see in my mind which pages are the most important pages on the site. Deep content is no more of a challenge for search indexing and ranking than root URLs. It really depends on what you do with your content, and how you look at your site structure greatly influences your success.

When it came time for me to get a driver’s license, my family enrolled me in a driving school. My instructor was a retired state patrol officer who taught his students certain techniques to ensure that our vehicles were always positioned carefully. For basic driving, he told us to position our hands on the steering wheel at the 10 o’clock and 2 o’clock positions, to sit squarely behind the steering wheel, and to picture ourselves sitting in the road directly in front of us.

When you become accustomed to imagining yourself moving in front of the vehicle, the car goes wherever you want it to. The same principle works for Web site indexing.

If you picture the site as a set of rocks in a pond, think of how you would step across those rocks in order to cross the pond. Imagine that each rock is a section of your site, not just a single page. With this kind of analogy, most people would conclude that — depending on the positions of the rocks relative to each other — there is probably a shortest, safest path across the pond.

That is the wrong approach. Instead, you should imagine a network of pathways connecting all the rocks in as many ways as possible. The only restriction on your imagination is that no pathway can cross another pathway. You need as much connectivity between the rocks (sections of your site) as you can obtain without crossing lines.

A short, safe path across the pond represents the least amount of connectivity possible between your rocks (site sections). A network of pathways connecting all the rocks means you can cross the pond in a multitude of ways. In fact, every rock should be treated as an entry point to the network, and as an exit point.
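To make the contrast concrete, the following sketch treats each rock (site section) as a node and each internal link as a one-way edge, then compares a single path with a fully linked network. The section names are invented for illustration, and the sketch does not try to model the no-crossing restriction.

```python
SECTIONS = ["home", "articles", "tools", "about"]  # hypothetical sections

# The "shortest, safest path": each section links only to the next one.
SINGLE_PATH = {
    "home": ["articles"],
    "articles": ["tools"],
    "tools": ["about"],
    "about": [],
}

# The network: every section links to every other section.
NETWORK = {s: [t for t in SECTIONS if t != s] for s in SECTIONS}


def reachable(links, start):
    """Return the set of sections reachable from `start` by following links."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in links.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def every_section_is_entry_and_exit(links):
    """True if a visitor landing on any section can reach every other one."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    return all(reachable(links, n) == nodes for n in nodes)


def count_routes(links, start, goal, visited=frozenset()):
    """Count the distinct routes from `start` to `goal` that never revisit a section."""
    if start == goal:
        return 1
    visited = visited | {start}
    return sum(
        count_routes(links, nxt, goal, visited)
        for nxt in links.get(start, ())
        if nxt not in visited
    )


if __name__ == "__main__":
    for name, links in (("single path", SINGLE_PATH), ("network", NETWORK)):
        print(name,
              "| entry/exit everywhere:", every_section_is_entry_and_exit(links),
              "| routes home -> about:", count_routes(links, "home", "about"))
```

With four sections, the single path offers one route from home to about, while the full network offers five, and only the network lets a visitor who lands on any section reach all the others.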

Does that sound familiar? It may sound very much like the “long tail of search” lecture, which makes the point that every page on a site may act as an entry point to the Web site. But through factors that you control, you can strongly influence which pages are most likely to be the entry points for groups of queries.

The same structure will help you with your site crawling and indexing. High interconnectivity between your sections helps search engines find your content quickly and determine with great accuracy which pages are the most important. You can look at page interconnectivity in several ways.

First, there is IntraGroup Connectivity. Let’s say you position each section of your Web site in its own sub-directory. Every page in that sub-directory can link to every other page in that sub-directory. Some people would stop there, allowing only the directory index page to link out and pointing all links from other sections to that index page.

But that doesn’t make much sense. If you have 10 pages in a sub-directory, odds are pretty good that you’ll refer to their individual content elsewhere on your site. You should not hesitate to deep-link to any of those 10 pages.

You can also position sub-directories within sub-directories. Many large content sites do just that. Some sites use nested sub-directories to embed keywords in page URLs. Some sites also use sub-directories to “silo” content — creating miniature verticals that are isolated. Siloing is not a good idea. It destroys the interconnectivity of your Web site.

IntraGroup Connectivity can be paired with sibling group connectivity. Think of two groups that are positioned closely to each other, like two sub-directories located in a shared parent directory. These sibling groups form a Supergroup, and therefore you can have more than one Supergroup. Hence, you can also have IntraSupergroup Connectivity.

Supergroups don’t have to be relevant to each other. They can be related in non-topical ways. They are certainly related through their URLs if nothing else. Supergroups may be positioned on sub-domains or within directories. The attributes of a Supergroup are: two or more directories located in a shared parent directory; at least two of the sub-directories have multiple pages; and there is at least one link from each group pointing to the other group.
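The Supergroup attributes listed above can be checked mechanically. The sketch below is one possible reading of them for a pair of sibling directories on the same host; it ignores the sub-domain case, and every URL and function name in it is hypothetical.

```python
import posixpath
from urllib.parse import urlsplit


def group_of(url):
    """Identify a page's group as (host, directory of the URL path)."""
    parts = urlsplit(url)
    return parts.netloc, posixpath.dirname(parts.path or "/")


def are_siblings(group_a, group_b):
    """True if two distinct directories on one host share a parent directory."""
    (host_a, dir_a), (host_b, dir_b) = group_a, group_b
    return (
        host_a == host_b
        and dir_a != dir_b
        and "/" not in (dir_a, dir_b)  # the site root is not a sibling of its children
        and posixpath.dirname(dir_a) == posixpath.dirname(dir_b)
    )


def is_supergroup_pair(pages, links, group_a, group_b):
    """Check the listed attributes for two groups.

    pages: list of page URLs; links: list of (source_url, target_url) pairs.
    """
    if not are_siblings(group_a, group_b):
        return False

    def page_count(group):
        return sum(1 for page in pages if group_of(page) == group)

    if page_count(group_a) < 2 or page_count(group_b) < 2:
        return False
    a_to_b = any(group_of(s) == group_a and group_of(t) == group_b for s, t in links)
    b_to_a = any(group_of(s) == group_b and group_of(t) == group_a for s, t in links)
    return a_to_b and b_to_a


if __name__ == "__main__":
    base = "https://example.com"  # placeholder site
    pages = [base + "/guides/seo/a.html", base + "/guides/seo/b.html",
             base + "/guides/ppc/a.html", base + "/guides/ppc/b.html"]
    links = [(pages[0], pages[2]), (pages[3], pages[1])]
    print(is_supergroup_pair(pages, links, group_of(pages[0]), group_of(pages[2])))
```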

You can extend the Group-Supergroup relationship to degrees of absurdity, but in a logical format you go from IntraGroup Connectivity to IntraSupergroup Connectivity to IntraSite Connectivity. IntraSite Connectivity differs from IntraSupergroup Connectivity in that anything can link to anything without regard for placement or relevance. Hence, IntraSite Connectivity is very fluid and undisciplined.

We can say that IntraGroup Connectivity is very strong, that IntraSupergroup Connectivity is moderately strong, and that IntraSite Connectivity is moderately weak. But it doesn’t stop there. Because primary domains and sub-domains are usually treated as separate “hosts” by the search engines, you can have Sitegroups on a domain composed of multiple sub-domains (and/or the primary domain).

Hence, you can have IntraSitegroup Connectivity, which, like IntraSite Connectivity, has the fewest requirements: any page in one site can link to any page in any other site, but both sites have to be part of the same domain.

IntraSitegroup Connectivity is both weak and strong. It is weak because the pages of one site are not likely to be well-linked among the pages of another site (but you should have control over this on your own domain). It is strong because you do have control over this on your own domain, and because the intersite links may work like external links.
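One way to audit a site against these four levels is to label each internal link by the narrowest level it falls under. The sketch below is an interpretation rather than a formal definition: it treats each host as a "site", uses the URL directory as the group, and guesses the shared domain from the last two host labels, which is a naive assumption that breaks on suffixes such as co.uk.

```python
import posixpath
from urllib.parse import urlsplit


def _host_and_dir(url):
    """Split a URL into (host, directory of its path)."""
    parts = urlsplit(url)
    return parts.hostname or "", posixpath.dirname(parts.path or "/")


def _registered_domain(host):
    # Naive assumption: the domain is the last two labels of the host name.
    return ".".join(host.split(".")[-2:])


def classify_link(source, target):
    """Label a link with the narrowest connectivity level it belongs to."""
    src_host, src_dir = _host_and_dir(source)
    dst_host, dst_dir = _host_and_dir(target)

    if src_host != dst_host:
        # Different hosts: same domain means Sitegroup, otherwise external.
        if _registered_domain(src_host) == _registered_domain(dst_host):
            return "IntraSitegroup"
        return "external"
    if src_dir == dst_dir:
        return "IntraGroup"
    siblings = (
        "/" not in (src_dir, dst_dir)
        and posixpath.dirname(src_dir) == posixpath.dirname(dst_dir)
    )
    return "IntraSupergroup" if siblings else "IntraSite"


if __name__ == "__main__":
    src = "https://example.com/guides/seo/a.html"  # placeholder URLs throughout
    for dst in ("https://example.com/guides/seo/b.html",
                "https://example.com/guides/ppc/a.html",
                "https://example.com/news/2007/a.html",
                "https://blog.example.com/index.html"):
        print(classify_link(src, dst), "<-", dst)
```

Counting the labels across all of a site's internal links gives a rough picture of where its connectivity is concentrated.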

Groups, Supergroups, Sites, and Sitegroups are the components of site structure and you want to build the strongest connectivity between these components to improve your site’s chances of being crawled and indexed, as well as to influence which pages are selected as the dominant pages in each group. Your own internal linkage should be marking the pathways that search engines and visitors need to follow from point to point in their journeys across your site.

The tools you use for your connectivity include your HTML sitemaps, your robots.txt, your XML sitemaps, and your internal navigation. But there are other tools you can use, if you think about ways to associate distinct parts of your content with each other. Create “glue” pages that bring siblings, cousins, and strangers together among the community of pages that comprise your site.

I’ll come back to connectivity later.

3 Comments on Group Connectivity Theory for SEO

By dodito on September 23, 2007 at 2:21 pm

“The only restriction on your imagination is that no pathway can cross another pathway. You need as much connectivity between the rocks (sections of your site) as you can obtain without crossing lines.”

What do you mean by this exactly?

By Michael Martinez on September 23, 2007 at 10:06 pm

I mean you need to be efficient in your linking, not wasteful. Getting people from point A to point B is only one aspect of good link management. You also need to emphasize which pages are more important than others.

By dodito on September 25, 2007 at 11:33 pm

Michael,

OK. I thought you meant literal pathways, i.e., if path 1 goes from A to B and crosses C, then another path 2 from A to B should not make use of C, since it’d be crossing path 1 from A to B. I was a bit puzzled by that.

Thanks

About the Author

Michael Martinez is the Director of Search Strategies for Visible Technologies, Inc. A former moderator at SEO forums such as JimWorld and Spider-food, Michael has been active in search engine optimization since 1998 and Web site design and promotion since 1996. Michael was a regular contributor to Suite101 (1998-2003) and SEOmoz (2006).
