Link Analysis: Crunchy numbers and soft tallies

Posted by Michael Martinez on September 26, 2007 in Intermediate SEO, Link Theory


Some definitions for useful link analysis


It’s time to start looking at numbers, but first let’s define some concepts. If you’re serious about analyzing links then you have to be serious about analyzing content. That is, Web page structure is very important to your link analysis. We’re not looking for numbers of backlinks. We’re looking for placements of links and what their contexts tell us.

Page zones: You can use whatever zone definitions you want but you had better make sure they are relevant to page layouts and not simply to some zone definition you picked up on the Web. There is nothing wrong with anyone’s particular zone definition but you need to use definitions in the proper context.

Let’s keep this as simple as possible and define eight (8) zones:
Anchor Margin Above The Fold - This is the upper left-hand margin of the page in languages that read left-to-right and the upper right-hand margin of the page in languages that read right-to-left.

The “fold”, as most if not all of you know, is that part of the page that is immediately visible in the browser window when the page loads for the visitor. There is no hard fold line, as browser window size, screen resolution, and user/page font sizes all affect how much of the page is above the fold.

Anchor Margin Below The Fold - Given the above definition, this one should be obvious.

Header Margin - This is the traditional “masthead” region at the top center of the page.

Leaf Margin Above The Fold - This is the opposite margin from the Anchor Margin Above The Fold. So in left-to-right languages the Leaf Margin Above The Fold is the upper right margin and in right-to-left languages the Leaf Margin Above The Fold is the upper left margin.

Leaf Margin Below The Fold - Given the above definition, this one should be obvious.

Footer Margin - This is the traditional “page footer” section at the bottom of the page.

Upper Body - This is the central “body” portion of the page above the fold.

Lower Body - This is the central “body” portion of the page below the fold.

Your margins run completely from top to bottom. That is, there are no “masthead anchor margin” and “masthead leaf margin” spaces and no “footer anchor margin” and “footer leaf margin” spaces. These definitions are purely arbitrary, purely my own, and if you use what follows with other definitions it may or may not make sense.
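If you want to tag links by zone automatically, the eight zones can be sketched as a simple position lookup. This is only an illustration: the function name `zone_for`, the pixel coordinates, and the thresholds (`margin_frac`, `header_px`, `footer_px`) are my assumptions for the example, and you would have to tune them to the page layouts you are studying.

```python
from enum import Enum


class Zone(Enum):
    ANCHOR_MARGIN_ABOVE_FOLD = "anchor margin above the fold"
    ANCHOR_MARGIN_BELOW_FOLD = "anchor margin below the fold"
    HEADER_MARGIN = "header margin"
    LEAF_MARGIN_ABOVE_FOLD = "leaf margin above the fold"
    LEAF_MARGIN_BELOW_FOLD = "leaf margin below the fold"
    FOOTER_MARGIN = "footer margin"
    UPPER_BODY = "upper body"
    LOWER_BODY = "lower body"


def zone_for(x, y, page_width, page_height, fold_y,
             rtl=False, margin_frac=0.2, header_px=120, footer_px=120):
    """Map a link's pixel position to one of the eight zones.

    margin_frac, header_px and footer_px are illustrative assumptions,
    not values taken from the article.
    """
    above_fold = y < fold_y
    in_left = x < page_width * margin_frac
    in_right = x >= page_width * (1 - margin_frac)
    # The anchor margin sits on the side where reading starts.
    in_anchor = in_right if rtl else in_left
    in_leaf = in_left if rtl else in_right

    # Margins run from top to bottom, so check them before header and footer.
    if in_anchor:
        return (Zone.ANCHOR_MARGIN_ABOVE_FOLD if above_fold
                else Zone.ANCHOR_MARGIN_BELOW_FOLD)
    if in_leaf:
        return (Zone.LEAF_MARGIN_ABOVE_FOLD if above_fold
                else Zone.LEAF_MARGIN_BELOW_FOLD)
    if y < header_px:
        return Zone.HEADER_MARGIN
    if y >= page_height - footer_px:
        return Zone.FOOTER_MARGIN
    return Zone.UPPER_BODY if above_fold else Zone.LOWER_BODY
```

Because there is no hard fold line, `fold_y` is something you pick per analysis (for example, the viewport height you care most about).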

There are three types of links: Navigational Links, Body Links, and Promotional Links. These link definitions have nothing to do with the zone definitions. That is, placement of a link does not determine what type of link it is.

Navigational Links may be found in any zone but are most often placed in anchor margins and mastheads. Navigational links connect your pages to either their siblings or the most important pages on your site. In a large site, a navigational link typically points to the root URL, a directory/folder index page, an “incidental” page (like “About Us” or “Contact Us”), or a sibling page (a page located in the same directory or otherwise directly connected to the page through a chain of links).

Body Links may be placed in any zone but are most often found in the body zones or the non-navigational margin zones. A sidebar article on a page, usually located opposite the navigation links, may include a body link. A body link may point to content either elsewhere on the site or on another site.

An internal body link points to non-sibling, non-incidental, and non-index pages. In other words, an internal body link points to deep content on the same site located outside of the linking page’s directory. Clearly, not every Web site can include internal body links.

Promotional links may be placed in any zone and may point to content either on the site or off the site. Promotional links may point to the same content as navigational links or body links. A promotional link is distinguished from a body link by its context. Most promotional links are associated with graphical ads (banners), affiliate ads, etc. A paid link may or may not be a promotional link (that is, functionality determines whether the paid link is promotional, not the fact that it is paid for).

A promotional link may be freely given. A promotional link may be given in exchange for a reciprocal link. In my opinion, most spam links tend to be promotional links. For example, all links included in blog comments are probably promotional links, even though they may only point to an on-site profile page.


Formula 1: Ratio of Body To Non-Body Links


If you can tally up all the links placed on a Web site and sort them according to my proposed categories, you can see how “body-heavy” the linkage is. For this formula, count all the links in a breadcrumb trail as one link.
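The tally described above can be sketched in a few lines. This assumes you have already labeled each link by hand (or by your own rules) as navigational, body, or promotional; the function name `link_shares` and the `trail_id` marker for breadcrumb links are hypothetical, invented for this example.

```python
from collections import Counter

LINK_TYPES = ("navigational", "body", "promotional")


def link_shares(links):
    """Return each link type's share of all links on a site.

    `links` is an iterable of (link_type, trail_id) pairs. trail_id is
    None for ordinary links; links that share a trail_id belong to the
    same breadcrumb trail and are counted as one link.
    """
    counts = Counter()
    seen_trails = set()
    for link_type, trail_id in links:
        if link_type not in LINK_TYPES:
            raise ValueError(f"unknown link type: {link_type!r}")
        if trail_id is not None:
            if trail_id in seen_trails:
                continue  # rest of a breadcrumb trail already counted
            seen_trails.add(trail_id)
        counts[link_type] += 1
    total = sum(counts.values())
    if total == 0:
        return {t: 0.0 for t in LINK_TYPES}
    return {t: counts[t] / total for t in LINK_TYPES}


# A three-link breadcrumb trail, two body links, one promotional link.
sample = ([("navigational", "crumbs-1")] * 3
          + [("body", None)] * 2
          + [("promotional", None)])
shares = link_shares(sample)
```

Here the breadcrumb trail collapses to one navigational link, so the body share works out to 2 of 4 links.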

The higher the percentage of body links, the more likely the site is to be a linking resource (like a directory or a blog that points to other blogs, or an article that rounds up links to other content, a social media category page, etc.).

The higher the percentage of navigational links, the more likely the site is to be an informational resource (like a news site, a blog that doesn’t link out much, etc.).

The higher the percentage of promotional links, the more likely the site is to offer low-quality information. For example, blogs with 100+ comments attached to each post (or forums with 100+ posts per thread) are not likely to present a coherent message. The original blog post or forum message might be useful or not. The reason the site is a low-quality resource, however, is simply the fact that it is not coherent.

That is, you usually have multiple viewpoints that contradict each other, give ground, take ground, and otherwise mix up facts and opinions without presenting a coherent message to the casual visitor.

This is a rule-of-thumb, not a statistical predictor of quality. Some people would argue, for instance, that Matt Cutts offers some pretty good information on his blog. I agree that he does, and often he shares some real diamond viewpoints in the comments when he answers questions. And while the questioners might be asking very good questions, you can pretty much trust that the majority of comments in a 100+ comment blog post on Matt’s blog are not really adding any trustworthy information with respect to the blog post topic.

Quality and quantity are not to be confused, but there are some situations where a large number of comments and posts might be useful. For example, if someone conducts an informal poll and invites comments, you may get a lot of comments.

But we’re analyzing links, not content. So just how useful are all the signature links in those blog comments and forum posts? They are promotional in nature and their mere presence in large numbers indicates that you’re getting a real piece of potluck content. So a site where the majority of links are promotional is not very likely at all to provide really good information.


Formula 2: Navigational Complexity


Navigational links may all be visible at one time or they may not. The more navigational links that you don’t see on the entry page of a Web site (not including HTML sitemaps and search results pages), the larger that site tends to be. A large content site needs to be well-organized, however, and if its navigational footprint is small on deep pages that’s a good sign that the site is probably not being well-crawled.

People will disagree for a variety of good reasons, but we’re doing quick analyses here. You should start with your own Web site. Pick one of your deep pages at random and count the number of navigational links you include on that page. How many other pages on your site can you reach within 2 clicks?

If the answer is less than 50% you’ve either done something wrong or else your site is absolutely massive. For large content sites the answer should be, “At least as many major sections as there are on the site PLUS the minimum number of links on any random HTML sitemap page”.
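If you would rather not count clicks by hand, the 2-click question can be sketched as a breadth-first search over your own link graph. This assumes you already have the graph as a dictionary that maps each page to the pages it links to; the function names and the tiny example site are made up for illustration.

```python
from collections import deque


def reachable_within(graph, start, max_clicks):
    """Pages reachable from `start` in at most `max_clicks` clicks."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if depth[page] == max_clicks:
            continue  # do not follow links beyond the click budget
        for target in graph.get(page, ()):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return set(depth) - {start}


def reach_fraction(graph, start, max_clicks):
    """Share of the site's other pages reachable within the click budget."""
    all_pages = set(graph) | {t for targets in graph.values() for t in targets}
    others = all_pages - {start}
    if not others:
        return 0.0
    return len(reachable_within(graph, start, max_clicks) & others) / len(others)


# Hypothetical seven-page site, starting from one deep page.
site = {
    "/a/deep": ["/", "/a/"],
    "/": ["/a/", "/b/"],
    "/a/": ["/a/deep", "/a/other"],
    "/b/": ["/b/x"],
    "/b/x": ["/b/y"],
}
two_click_share = reach_fraction(site, "/a/deep", 2)
```

In this toy site the deep page reaches four of the six other pages within two clicks, so it clears the 50% mark.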

Every page on your site should be linking to at least one HTML sitemap page. The more HTML sitemap pages your deep content links to, the better.

A more challenging question to ask yourself is, “How many deep-content pages can I reach in 3 clicks without using the HTML sitemap?” On a relatively small site (say, fewer than 100 pages) your answer should be upwards of 80%. On a moderate-sized site (between 100 and 1000 pages) your answer should be upwards of 50%. On a massive site (more than 1000 pages) your answer should be, “All of them.”
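The three size bands above can be written down as a small lookup so you can check your own numbers. The function names are hypothetical, and I am assuming that "between 100 and 1000 pages" includes both endpoints and that "upwards of" means "at least".

```python
def expected_reach(page_count):
    """Minimum share of deep pages reachable in 3 clicks for a site size."""
    if page_count < 100:
        return 0.8   # relatively small site
    if page_count <= 1000:
        return 0.5   # moderate-sized site
    return 1.0       # massive site: "All of them."


def meets_rule_of_thumb(page_count, reachable_deep_pages, total_deep_pages):
    """Compare a measured 3-click reach against the rule of thumb."""
    if total_deep_pages == 0:
        return True  # nothing to reach, so nothing can fall short
    share = reachable_deep_pages / total_deep_pages
    return share >= expected_reach(page_count)
```

For example, a 500-page site where 300 of 450 deep pages are reachable in 3 clicks passes, while a 5,000-page site where 4,000 of 4,500 are reachable does not.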

Now, I don’t always make it easy to reach every page on my large sites but I am constantly experimenting with new tools and features. On Xenite, for example, many of my pages now link to our custom search engine for the Xenite network. The problem with the CSE, however, is that it doesn’t do a very good job for site search (as an aside, if you have a large content site, you need to buy a serious site indexing tool and not rely on the free tools provided by the search engines).

BTW — site search counts as a single navigational link. Yes, I could have told you that up front, but if you were not caught by surprise by my last test answer, then you understand the value of site search.


Formula 3: Body Complexity


Assuming you can count the number of internal versus external body links on your site, the ratio will tell you something about the quality of your site’s content. That doesn’t mean having either a high or a low count of internal body links indicates you have poor quality content. It just indicates that your content may or may not be of certain types.

For example, directory sites tend to have relatively few internal body links compared to their external body links. Some news sites, on the other hand, tend to have many internal body links when they cross-link to related stories. Some blogs are also very good about cross-linking to related posts.
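Splitting body links into internal and external can be sketched with the standard library. This assumes you have already picked out which links on the page are body links, and it uses a deliberately crude test: a link is internal when its hostname matches the linking page's hostname exactly (so `www.example.com` and `example.com` would count as different sites). It does not check the "deep content outside the linking page's directory" part of my definition. The function name and URLs are hypothetical.

```python
from urllib.parse import urljoin, urlsplit


def split_body_links(page_url, body_hrefs):
    """Split a page's body links into (internal, external) lists."""
    page_host = urlsplit(page_url).hostname
    internal, external = [], []
    for href in body_hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        host = urlsplit(absolute).hostname
        if host == page_host:
            internal.append(absolute)
        else:
            external.append(absolute)
    return internal, external


internal, external = split_body_links(
    "https://example.com/news/story.html",
    [
        "/archive/2007/old.html",
        "https://other.example.org/page",
        "../features/x.html",
    ],
)
```

The ratio of `len(internal)` to `len(external)` across a sample of pages is the number this formula is after.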

What if a site has a low promotional link count and about equal body and navigational link counts? What would that tell you about the types of sites that fit that profile?

What if a site has a low promotional link count, a low navigational link count, and a high body link count? What can you guess about the type of content that site would have?

You can also look at body links to paragraphs. A paragraph is any block of text regardless of whether the text is used as anchors. That is, if you have a bullet list of links, that counts as one paragraph even though there is no unanchored text (text outside of links). If you have a bullet list of text that contains no links then you still have one paragraph of text.

Body content can be found in any zone. Body content is not used for navigation or disclaimers or to frame promotional links. If it seems easier for you to think in terms of “body text blocks”, then that is okay.

What does the ratio of body text blocks to body links say about a site’s function and purpose?

What does the ratio of body text blocks to promotional links say about the site’s function and purpose? You might be surprised to learn (or not) that some news sites have higher body text block-to-promotional link ratios than many spam blogs. It’s something to think about, as spam blogs may actually be easier to get links from (usually you don’t have to do anything but let your content be scraped).
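Counting text blocks against links, as discussed in this section, can be sketched with Python's built-in HTML parser. The sketch assumes you feed it only the body content of a page (navigation and promotional framing already stripped out), that the markup closes its tags properly, and that `p`, `ul`, `ol`, and `blockquote` are the elements that count as a block. That tag list and the names below are my assumptions for the example. A whole bullet list counts as one block, in line with the paragraph definition above.

```python
from html.parser import HTMLParser

BLOCK_TAGS = {"p", "ul", "ol", "blockquote"}


class BlockLinkCounter(HTMLParser):
    """Count top-level text blocks and links in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.blocks = 0
        self.links = 0
        self._depth = 0  # nesting depth inside block elements

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            if self._depth == 0:
                self.blocks += 1  # nested blocks do not count again
            self._depth += 1
        elif tag == "a" and any(name == "href" for name, _ in attrs):
            self.links += 1

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS and self._depth > 0:
            self._depth -= 1


def blocks_per_link(html):
    """Return (blocks, links, blocks/links or None when there are no links)."""
    counter = BlockLinkCounter()
    counter.feed(html)
    counter.close()
    ratio = counter.blocks / counter.links if counter.links else None
    return counter.blocks, counter.links, ratio


fragment = (
    '<p>Intro text with <a href="/deep/a.html">a link</a>.</p>'
    '<ul><li><a href="/x">X</a></li><li><a href="/y">Y</a></li></ul>'
    "<p>No links here.</p>"
)
result = blocks_per_link(fragment)
```

To compare against promotional links instead, you would run the same count over the promotional portions of the page and divide the body block count by that link count.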


Counting links and body content


Now, before you fire up your link scraping software and hit all your competitors and enemies’ Web sites with potentially server-crashing loads of page requests, stop and think about how you would want someone to analyze your own site. Sometimes random sampling will tell you a great deal more about a Web site than grabbing hundreds or thousands of pages.

In fact, you should randomly sample your own site’s pages for a while and practice your link analysis on your own work. You need to know how crawlable your site is. You need to know how much of a linking resource your site is. You need to know how much promotional content your site has.
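Random sampling needs nothing more than a list of your own URLs. A minimal sketch, assuming you can export that list (from your CMS or an HTML sitemap, for example); the function name and the `seed` parameter are mine, added so a sample can be repeated later.

```python
import random


def sample_pages(urls, k, seed=None):
    """Pick up to k distinct pages at random from a list of URLs."""
    pool = list(urls)
    rng = random.Random(seed)  # a fixed seed makes the sample repeatable
    return rng.sample(pool, min(k, len(pool)))


# Hypothetical URL list for a small site.
all_urls = [f"https://example.com/page-{n}.html" for n in range(1, 41)]
picked = sample_pages(all_urls, 5, seed=2007)
```

Five or ten pages per pass is usually enough to practice on, and it keeps you from hammering anyone's server.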

Some sites will naturally attract a great deal of promotional content because of their own great content. Matt Cutts’ blog is a perfect example. The Cuttletts are in there promoting themselves all the time. So the fact that he has a lot of promotional content and links in his comments doesn’t make his site a bad resource. I said it’s low-quality content but in reality most of us pay more attention to what the blogger says on his blog than we pay to the majority of the comments.

Which is not meant to take anything away from people who comment in blogs. As many of you know, I occasionally post comments on other blogs, sometimes offering contrary opinions, sometimes offering agreement, sometimes offering clarification or additional information. A heavily commented blog post can be a gold-mine of useful information.

But when you have practiced analyzing your own linking structures for a while, you’ll begin to see patterns that may help to explain some of the nagging questions you have struggled with. Why do some pages get more traffic than others, even when they are deep pages? Why do some pages get more links than others? Why do some pages seem to be buried in search results? Why do some pages seem to show up in queries all over the place?

Your content helps to answer some of those questions, but because search engine indexing is heavily (though not totally) dependent upon your linking structure you really need to understand just what your linking structure is doing. I’ve had to tell more than one client or friend that their Web site linking structure is not as strong as it could be.

Read other articles in this series:

  1. Fundamental Principles for Link Analysis
  2. What every good SEO should know about link analysis
  3. Linking analysis: Who is linking to you and why
  4. Link Analysis: Crunchy numbers and soft tallies

8 Comments on Link Analysis: Crunchy numbers and soft tallies

By dodito on September 27, 2007 at 1:39 am

Michael,

very nice article, and it is nice to somehow see “things fall into place”. I can definitely apply your structure to our pages and understand their role. What bothers me a bit is the “promotional links”. I understand their role in comments etc., also banners/icons etc.. ok clear.

But take nyt.com and focus on the body (so ignore the navigational panels for a moment). Lots of links, very little text, lots of small paragraphs. Would you consider these types of links “body links” or “promotional links”.. personally I would say promotional.. unless you’d create more of a paragraph with unlinked text around it.. (in other words increase the text/link ratio). But I find the “promotional link” part a bit confusing.

And so is this good or bad ? nyt.com is a general news home page. What if you have a page dealing with only one topic but which has a similar structure…. such a page may be very focused and therefore useful for the visitor.. so does it really matter at some point what name you give to the links ? It’s clear it’s an “access point” to “more information”.. ?

By Michael Martinez on September 27, 2007 at 9:20 am

If you’re referring to the front page of the New York Times’ Web site, all the links within the body zones are navigational links. News Web sites are special cases that I did not include in my working definition. The front page links to deep content are navigational in purpose. Xenite.Org also places navigational links in the body zones of its front page.

I should have noted that type of exception.

By tinkerbellchime on September 27, 2007 at 7:40 pm

Michael — What about links that lead to pdf, powerpoint, and wmv files that are part of the same website? I’m assuming that they are navigational, but how are they treated by the search engines as regards SERPS? I know firsthand that ppt presentations are indexed by Google and so are pdf files, but would that same content count more if it were posted on a regular web page? Or are they probably given the same weight? Also, since a wmv (ex. MS PhotoStory presentation) file can’t be crawled by the search engines for text, I figure that the only way the search engines can determine what it’s about is by the file name and links pointing to it. Now, if text is written over the photos that make up the presentation, can the search engines read this text?

Thanks.

By Michael Martinez on September 28, 2007 at 12:00 am

What about links that lead to pdf, powerpoint, and wmv files that are part of the same website? I’m assuming that they are navigational, but how are they treated by the search engines as regards SERPS?

Links to media files ensure that they are found and indexed as much as the search engines can index media files. .PDF files, .DOC files, and .TXT files are almost treated the same as HTML content these days.

I feel strongly that other media content should be used to complement and enhance standard HTML pages rather than made into freestanding objects. A lot of Flash Web sites, for example, have trouble gaining traction in search results.

If text is embedded inside a media object (not simply rendered as part of the media) a search engine may attempt to extract that text but the technology is still not what we want it to be. I would still want to optimize a page where the media is embedded rather than try to optimize the media directly.

By tinkerbellchime on September 28, 2007 at 6:51 am

Thanks, Michael. This is great information for all of us. I think you are right. I’m wondering if adding text to the ‘notes’ section of a ppt presentation, rather than just the slides themselves, would help optimize the file even further because then the search engines would have even more text to read. Have you tried this? I mean adding ‘speaker notes’ to a ppt presentation before putting it online. I never use the ‘speaker’s notes’ part, so I don’t know if search engines read them.

By dodito on September 28, 2007 at 7:23 am

Michael,

News sites aside, and I see you also use these type of pages on subdomains as well. Still I am confused what a promotional link is, if it were a normal text link. You say it depends on the context, and I can imagine some.. say a links page on any ecommerce site with dozens of links.. versus a page with a lot of text and some links in there.. still I would appreciate what *exactly* constitutes a promotional link (in the sense that search engines would make that difference).

Thanks

By Michael Martinez on September 28, 2007 at 8:07 am

tinkerbellchime, I have not yet had a need to optimize PowerPoint files. I have optimized .PDF files though. Other people have shared some great tips on how to optimize .PDF files.

dodito, if a link’s purpose is to help promote another Web site then it’s a promotional link. I embed promotional links in copy all the time. They are not paid links, they are not advertisements, they are just “promotional” in nature.

The link I embedded in the first paragraph of this comment is NOT promotional because it is informative. Every link in my blogroll is a promotional link. If I were to write a blog article titled “10 SEO Web sites I visit” and included links to the 10 sites, those links would be both promotional and informative.

A good rule of thumb might be, “does the copy make sense without the link?” If you can honestly say “yes”, then the link is probably promotional. I could write an article about 10 SEO sites without having to link to them.

By dodito on September 30, 2007 at 5:46 am

Got it. The same criteria can be applied to my internal links. Thanks very much for clarifying this point.

About the Author

Michael Martinez is the Director of Search Strategies for Visible Technologies, Inc. A former moderator at SEO forums such as JimWorld and Spider-food, Michael has been active in search engine optimization since 1998 and Web site design and promotion since 1996. Michael was a regular contributor to Suite101 (1998-2003) and SEOmoz (2006).
