Volume SEO for large content Web sites

Posted by Michael Martinez on July 25, 2007 in SEO Theory

Most people look at search engine optimization as a means of improving rankings for a single keyword and maybe a few related expressions. That is the first mistake that beginners usually trip over.

Search engine optimization is not about keywords.

Search engine spam is about keywords. Search engine optimization is about Web content. Remember the definition for search engine optimization: “The art of designing or modifying Web pages to rank well in search engines.”

That is a very broad definition, not a very narrow one. You want your Web pages to perform optimally in search engines. You want them to rank well not for one, not for two, not for three keyword expressions but for as many keyword expressions as each page can possibly rank well for. There is no reason why your Web content cannot rank well for up to 100 keyword expressions per page.

I’ve discussed how to do long tail SEO but I haven’t really addressed the limitations you run into with the long tail. Let me share some insights from my personal experience with Xenite.Org.

I founded Xenite in 1997 by consolidating four Web sites I had created on various commercial domains. For the first 2 years of my Web promotion experience I relied upon links from other Web sites and announcements on news groups and mailing lists. I did okay. By the middle of 1998 Xenite was being visited by several thousand people per week. By that time I had organized or joined over a dozen Webrings, joined a couple of banner exchange networks (and launched my own), and had even taken out some magazine advertisements.

Everything I did improved Xenite’s visibility and traffic by a little but not by much. When I saw other fan sites spring up and almost overnight draw a lot of traffic, I realized there had to be more to Web site promotion than what I was doing. So I found Virtual Promote’s newsletter and began learning about how to take on the world of search.

By the end of 1998 Xenite.Org already had over 20,000 pages of content. I had resources I never even dreamed of. Problem was, the search engines (Altavista, Northern Light, Infoseek, and Inktomi to name a few) sucked at crawling and indexing the Web. My content was largely invisible to them chiefly because I didn’t understand how to organize my Web sites. My internal linkage was weak.

It took me about a year to change everything, maybe a little longer. Xenite didn’t stop building content in that time but the content became more optimized each month. My search referral statistics began to reflect significant increases in traffic. I learned about optimizing for long tail expressions from other people who had already been doing it for several years.

So optimizing for the long tail of search goes all the way back to the early days of search engines and Web server logs. As soon as people were able to see referral data in their server logs, they began optimizing links and content. My early Web sites didn’t give me access to server logs. I had no idea what they were when I got my first virtual hosting account. Fortunately for me, server log analysis was a popular topic 9 years ago. Hardly anyone discusses it today but in 1998 people explained the basic concepts to me. I’ve never stopped looking at server logs since.

As many of you know, I don’t much like third-party analytics software. Google Analytics, for example, is about the worst-written analytics software I’ve ever used. It has a relatively low capture rate (probably no better than 70-75%), it misses huge chunks of traffic for high volume pages, and its interface is about as awful as an analytic interface can be. There are no efficiencies in their reporting.

Still, because everyone was talking about Google Analytics by the end of 2005 I decided I’d better start working with it so I could at least stay involved in a discussion with clients and prospects. I’ll never install Google Analytics on all of Xenite’s content, and certainly not across the entire Xenite network. So Google Analytics can only capture a small (if significant) fraction of our visitor data.

Since the beginning of 2006 I have gradually added Google Analytics to more and more pages at Xenite. I know that at least several hundred pages, maybe 1,000 or more, now have the code embedded in them. So even with Google Analytics I now have a lot of data to look at.

The report tells me that, since January 1, 2006, there have been 321,687 visits to Xenite.Org. Since I have been adding the code gradually to those pages, it is impossible to know how many visitors Google never had an opportunity to record. In a typical month Xenite sees over 100,000 unique visits. The network sees over 200,000 visits per month.

Overall, Google drives about 20-24% of Xenite’s traffic every month. Most of our traffic comes from other sites or from direct clicks. Many people come back to Xenite to look at our news pages, our forum discussions, and whatever new feature articles we publish. Our poster store pages are also fairly popular (they get the vast majority of their traffic from internal links). Xenite is, in fact, its own largest referrer.

Nonetheless, trying to analyze long tail patterns from traditional server log analysis is tedious, time-consuming hard work. I don’t do it as much as I used to because there are just too many expressions to evaluate. And according to Google Analytics, which has captured only a small fraction of our visitor data, search sent over 200,000 visitors to Xenite through more than 56,000 keywords.
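For anyone who still wants to pull search phrases out of raw logs, here is a minimal Python sketch of the first step: extracting the query from a referrer URL and counting how often each phrase shows up. The parameter names in QUERY_PARAMS, the function names, and the example.com URLs are all assumptions made for illustration. Check your own logs to see which parameters your referring search engines actually use.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

# Assumed names of the URL parameter that carries the search phrase.
# These are illustrative guesses, not a complete or authoritative list.
QUERY_PARAMS = ("q", "p", "query")


def search_phrase(referrer):
    """Return the search phrase found in a referrer URL, or None."""
    params = parse_qs(urlparse(referrer).query)  # decodes '+' and %xx
    for name in QUERY_PARAMS:
        values = params.get(name)
        if values:
            return values[0].strip().lower()
    return None


def count_phrases(referrers):
    """Count visits per search phrase across many referrer URLs."""
    counts = Counter()
    for referrer in referrers:
        phrase = search_phrase(referrer)
        if phrase:
            counts[phrase] += 1
    return counts


# Hypothetical referrer strings, as they might appear in a log file.
sample = [
    "http://search.example.com/search?q=zombie+films+essays",
    "http://search.example.com/search?q=Zombie%20Films%20Essays",
    "http://www.example.com/links.html",
]
print(count_phrases(sample))
```

The sketch only counts phrases. Deciding which of them deserve attention is still the tedious part.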

Now here is where the first problem with long tail analysis arises: how do you look at 56,000 keywords? You can’t. It’s just not humanly possible. But Google Analytics throws its own monkey wrench into the works.

You see, Google Analytics doesn’t normalize keywords. So “keyword” is reported as distinct from “keyword ” (there is a space at the end of the second example). Expressions such as “keyword+keyword” are treated as distinct from “keyword keyword”, “keyword-keyword”, and “+keyword+keyword”. Since the plus signs and hyphens do nothing more than separate the keywords the way a space would, all of these queries are virtually identical.
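If you can get the keyword rows into a script, the normalization itself is simple. The following is a minimal Python sketch, not anything Google Analytics provides: the function names are hypothetical, and treating "+" and "-" as plain separators and folding case are assumptions that will not suit every site (a hyphen is sometimes a meaningful part of a query).

```python
import re
from collections import Counter


def normalize_keyword(raw):
    """Fold spacing, '+', '-', and case variants into one canonical form."""
    # Assumption: '+' and '-' only separate words, exactly like a space.
    words = re.split(r"[\s+\-]+", raw.strip().lower())
    return " ".join(word for word in words if word)


def merge_counts(rows):
    """Sum visits for (keyword, visits) rows that normalize to the same text."""
    merged = Counter()
    for keyword, visits in rows:
        merged[normalize_keyword(keyword)] += visits
    return merged


# The variants described above, with made-up visit counts.
rows = [
    ("keyword keyword", 3),
    ("keyword+keyword", 2),
    ("keyword-keyword", 1),
    ("+keyword+keyword", 4),
    ("keyword ", 5),
    ("keyword", 6),
]
print(merge_counts(rows))
```

With these made-up rows, the six reported expressions collapse into two: "keyword keyword" and "keyword".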

And remember those huge chunks of visitors that Google Analytics simply ignores? That missing data skews my visitor trends (not that Google Analytics could report a keyword trend if it had to). In any event, you cannot export the detail data from Google Analytics so all you can do is look at the pretty pictures on the page and wistfully ask how you’re supposed to analyze 5,600 pages of data.

If you’re producing content that gets indexed by the search engines then every page can potentially rank for hundreds, maybe thousands of random expressions. You have no way of knowing what unanticipated keyword groups will become useful queries tomorrow. The search engines report, after all, that 20-25% of each month’s queries have never been used before.

You’ll never be able to chase more than a microscopic fraction of the long tail through optimization. The more content you produce, the more variations on your best keywords you’ll rank for, the more obscure expressions you’ll rank for, the longer your referral string reports become each month. Choosing good topics for that next feature article, that next blog post, that next reference table becomes increasingly tedious and uninspiring.

You would almost be better off to just cut and paste your keyword referral report text into a new HTML document that links to your HTML sitemap. Of course, the search engines might not appreciate that.

In June 2007, Xenite.Org received search traffic from more than 8,000 keyword expressions. The most active expressions generated more than 1,000 visits each. Nearly 8,000 expressions generated fewer than 14 visits each. I have no idea why someone clicked through to Xenite for “zombie films essays”.
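To see that kind of split in your own numbers, you can bucket a month of keyword counts into a head and a tail. This is a minimal Python sketch under stated assumptions: the function name and the sample counts are invented, and the two thresholds simply mirror the "more than 1,000" and "fewer than 14" figures above. It is not a record of Xenite's data.

```python
def summarize_tail(visits_by_keyword, head_min=1000, tail_max=14):
    """Split one month of keyword visit counts into head and tail buckets."""
    head = {k: v for k, v in visits_by_keyword.items() if v > head_min}
    tail = {k: v for k, v in visits_by_keyword.items() if v < tail_max}
    total = sum(visits_by_keyword.values())
    return {
        "expressions": len(visits_by_keyword),
        "head_expressions": len(head),
        "tail_expressions": len(tail),
        # Share of all search visits that arrived through tail expressions.
        "tail_share_of_visits": sum(tail.values()) / total if total else 0.0,
    }


# Made-up counts for illustration only.
sample_month = {
    "tv show name episode summary": 1500,
    "zombie films essays": 1,
    "some obscure phrase": 13,
    "a middling phrase": 14,
}
print(summarize_tail(sample_month))
```

Run over several months, the same summary shows whether the head is shrinking while the tail keeps growing.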

The next lesson in long tail optimization is that you never have more than a few money keywords. That is, no matter how well you optimize for “TV SHOW NAME episode summary” there just may not be much interest in that show any more. Today’s expressions that generate 1,000+ referrals may, a year from now, produce no more than a dozen referrals in a month.

If you’re working with a retail inventory you’ll find much the same ebb and flow in your search referrals. A lot of items are seasonal. Many are also event-driven sales items. A few years ago AllPosters.com couldn’t keep Orlando Bloom posters in stock. The more I asked them to stock, the more we sold. But his day as a hot poster boy is done.

So the next lesson in long tail keyword analysis is that you have to compile a lot of data, at least a year’s worth. You need to see what the historical trends for keywords are. You have to see where your peaks and valleys are. You have to understand your audience and what their interests are. You need to be able to place your content on the Web (and in the search engines) about 4-6 months in advance of when it will really be needed.
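Once you have a year or more of monthly keyword counts, finding the peaks and valleys and working back to a publishing date is mostly bookkeeping. Here is a minimal Python sketch, assuming you have already gathered rows of (month, keyword, visits) with months written as "YYYY-MM". The function names, the sample rows, and the 5-month lead (picked from the middle of the 4-6 month range) are assumptions for illustration.

```python
from collections import defaultdict


def monthly_history(rows):
    """Group (month, keyword, visits) rows into {keyword: {month: visits}}."""
    history = defaultdict(dict)
    for month, keyword, visits in rows:
        history[keyword][month] = history[keyword].get(month, 0) + visits
    return history


def peak_and_valley(months):
    """Return (peak_month, valley_month); ties go to the earliest month."""
    ordered = sorted(months)  # "YYYY-MM" strings sort chronologically
    return max(ordered, key=months.get), min(ordered, key=months.get)


def months_before(month, lead):
    """Return the "YYYY-MM" month that falls `lead` months before `month`."""
    year, mon = (int(part) for part in month.split("-"))
    year, mon_index = divmod(year * 12 + (mon - 1) - lead, 12)
    return f"{year:04d}-{mon_index + 1:02d}"


# Made-up rows for a hypothetical seasonal expression.
rows = [
    ("2006-10", "holiday posters", 5),
    ("2006-12", "holiday posters", 40),
    ("2006-12", "holiday posters", 10),
    ("2007-03", "holiday posters", 2),
]
history = monthly_history(rows)
peak, valley = peak_and_valley(history["holiday posters"])
print(peak, valley, months_before(peak, 5))
```

The last value printed is the month by which, under the 5-month assumption, the content should already be on the Web for next year's peak.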

Which leads me to my last point: your long tail research will never end. If you want to build traffic through the long tail, you have to stay on top of it. It’s easy, when you’re producing a lot of content every week, to just keep rolling out new keyword expressions, but you have to look at your search referrals to see which expressions are driving new traffic and representing new interest. You can’t just walk away and say, “Well, I’m done.”

Done never happens when you keep adding content.

1 Comment on Volume SEO for large content Web sites

By Gids on July 25, 2007 at 8:24 am

Michael - thank you for another great post.

About the Author

Michael Martinez is the Director of Search Strategies for 1st Query, an Internet Marketing firm offering organic SEO and PPC services.