When site search comes up short
Posted by Michael Martinez on October 23, 2007 in Search Engine Optimization
An increasing number of voices in the SEO community are advising people to use “rel=’nofollow’” on their ancillary pages (examples include “About Us”, “Contact Us”, “Locations”, “Terms of Service”, etc.). This misguided advice doesn’t benefit anyone, and on most sites it won’t have much impact. Where it will hurt you is if you’re relying on site search (or considering site search) to enhance your users’ navigation experience.
In April 2007 Jakob Nielsen pointed out that “site visitors mainly use the primary menus and the search box” to move around a Web site. Nofollowing your internal links is not just stupid; it makes it harder for people to find content they may in fact be looking for. If you’re concerned about how your internal PageRank flows through your site (and that is only an issue that affects Google), then all you need to do is place links on your ancillary pages to point to your most important pages (something you should be doing anyway).
If you implement site search you want people to be able to find every page on your site. You should not have content on your site that you don’t want people to find (not if they can get to it through on-site navigation). Isolating pages from an active site is one thing. Pretending that you can hide pages from search engines but make them available to users without degrading the user experience is quite another thing.
Site search is not the most reliable resource you can provide your visitors, but if you offer it they will use it. It behooves you to provide a site search that is robust and digs deep into your content. That means you have to select a tool that follows every link and indexes every page it finds.
Now, there are certainly resources that can manage their own navigation. Calendar pages are one example. If you get users to the calendar that should, hopefully, be good enough. But keep in mind that forcing users to manually search through hundreds or thousands of calendar entries to find something they have seen before is not going to give them warm and fuzzy feelings. If you have a full calendar database, you need to make sure people can search it one way or another.
Forums also come with built-in search tools, but active forums produce a lot of new content and their search functions should be bundled with primary Web content search functions. Besides which, I’ve never yet used a built-in forum search tool that actually helped me find what I was looking for. I don’t know what those tools index but they don’t seem to index much.
So you can’t afford to ignore your forum posts in a comprehensive site search tool.
Nor, for that matter, can you afford to ignore your blog posts. WordPress may be a nice blogging tool, but if I search for 100 seo questions I want to find an article with answers to nearly 100 SEO questions, not a mish-mash of unrelated posts. Blog search tools generally suck as badly as forum search tools.
Which leaves us with the search engines: Ask, Google, Live, and Yahoo!. Each has its own strengths and weaknesses, but they all have one weakness in common: they don’t index pages they haven’t crawled.
Uncrawled content is most often the very newest content your site offers. You add a page, a post, a new section — no search engine knows about it. That site search tool comes up dry, empty, looking foolish. It’s just not a universal solution to user navigation needs.
Now, if you use your own custom site search tool where you control the indexing, can you afford to rebuild the index every time you upload new content or is the process so massive and time-consuming you can only afford to do it once a week or once a month? That’s a real issue because on a very active site it’s easy for content to outstrip the custom site search solution. A dynamic search index where you can add pages on the fly is ideal, but how many sites can afford to build one with unlimited storage capacity?
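To make the idea of adding pages on the fly a little more concrete, here is a minimal sketch of an in-memory inverted index that can take one page at a time without a full rebuild. The class and method names (IncrementalIndex, add_page, remove_page, search) are made up for illustration, the tokenizer is deliberately crude, and a real tool would persist the index to disk and rank its results. It does not address the storage question at all.

```python
import re
from collections import defaultdict


class IncrementalIndex:
    """Tiny in-memory inverted index that accepts pages one at a time.

    Hypothetical illustration only: no ranking, no stemming, no persistence.
    """

    def __init__(self):
        self.postings = defaultdict(set)   # term -> set of URLs containing it
        self.terms_by_url = {}             # URL -> terms currently indexed for it

    @staticmethod
    def tokenize(text):
        # Lowercase the text and keep runs of letters and digits as terms.
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def add_page(self, url, text):
        """Index a new page, or re-index a changed one, without a rebuild."""
        self.remove_page(url)              # drop stale terms if the page existed
        terms = self.tokenize(text)
        for term in terms:
            self.postings[term].add(url)
        self.terms_by_url[url] = terms

    def remove_page(self, url):
        """Forget a page; a no-op if the URL was never indexed."""
        for term in self.terms_by_url.pop(url, set()):
            self.postings[term].discard(url)
            if not self.postings[term]:
                del self.postings[term]

    def search(self, query):
        """Return the URLs that contain every query term (simple AND search)."""
        terms = self.tokenize(query)
        if not terms:
            return set()
        results = None
        for term in terms:
            urls = self.postings.get(term, set())
            results = set(urls) if results is None else results & urls
        return results
```

With something like this, publishing a page and calling add_page on it makes the page searchable immediately, and only the entries for that one page are touched.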
We don’t often stop to consider that it takes more disk space to index a Web page than it does just to hold the page. If your site covers about 1 gigabyte of disk space from end to end, indexing all that content requires many more gigabytes of disk space. The day is not far off when many large content sites will have to have their own dedicated site search servers, and large eCommerce companies like Amazon and Hewlett-Packard may already be co-locating servers in multiple data centers just to handle the tens of millions of queries they process every month (you can read an interview with HP search manager Laura Dansbury to get some insight into how they look at site search).
So when you’re looking at site search, you have to look at how to manage its limitations as well as to leverage its advantages. You need to take new content into consideration. How will you tell your visitors about new content until it appears in your site search? How will you tell the search indexes about your new pages? How will you use site search to stimulate interest in new content?
I make sure I cross-promote heavily on Xenite.Org’s network precisely because I don’t know which pages will turn up in a site search for any given keyword. While the cross-promotional links make their pages more relevant (through anchor text) and thus cloud the searches a little bit, over time the search engines figure out where the most important content is — although they may not always honor my link data and show the most relevant pages first.
I also use my HTML sitemap to help people find stuff. Now, Xenite’s sitemap page is a mess and it badly needs to be cleaned up. I’m planning that, but in the meantime it helps people find new content.
Any site that has used XML sitemaps (particularly with the auto-discovery option in robots.txt) needs to ensure that the XML sitemaps are updated as new content is added to the site. On-site RSS feeds should be updated automagically, too (and blog and forum software takes care of this, but there are other types of RSS feeds).
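As a rough sketch of what keeping the XML sitemap current could look like, the following regenerates a sitemap from a list of pages and produces the robots.txt line used for auto-discovery. The function names, the example.com URLs, and the idea of passing in (URL, date) pairs are assumptions for illustration; how you collect the list of pages depends entirely on your own site.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML for an iterable of (url, last_modified) pairs.

    last_modified is expected as a date string such as "2007-10-23".
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, last_modified in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        ET.SubElement(entry, "lastmod").text = last_modified
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


def robots_sitemap_line(sitemap_url):
    """Return the robots.txt directive that points crawlers at the sitemap."""
    return "Sitemap: " + sitemap_url


if __name__ == "__main__":
    # Hypothetical pages; in practice this list would come from your CMS
    # or from a walk of the published files.
    pages = [
        ("http://www.example.com/", "2007-10-23"),
        ("http://www.example.com/new-article.html", "2007-10-23"),
    ]
    print(build_sitemap(pages))
    print(robots_sitemap_line("http://www.example.com/sitemap.xml"))
```

Running something like this as part of whatever process publishes new pages keeps the sitemap from falling behind the site.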
Finally, you can look into the practice of keyword clustering. You create a keyword hub page where you link out to the most relevant pages from the cluster/hub. The page acts like a miniature directory for a growing volume of similar content. Once the keyword cluster/hub is indexed in your site search it should stand a pretty good chance of showing up in site search results. Use this resource to help people navigate toward valuable relevant content that they cannot find through normal, on-site navigation and site search.
Site search has been stuck in its infancy for years. Some good advances have been made in basic technologies but the fact remains that effective, efficient site search is beyond most Webmasters’ abilities to implement. You have to offer it for a large content site but you have to understand that it doesn’t replace on-site navigation, it doesn’t replace HTML sitemaps, and it probably won’t ever be perfect.
But most importantly of all you have to remember that people are searching your site whether you make it easy for them to do so or not: site search has been important to users for years. Now, with people in the SEO field giving out bad advice about using “rel=’nofollow’” on internal links, it’s more important than ever to maintain your competitive advantage by optimizing for site search.
A LOT of people are going to go down the garden path to nofollow disaster. Now is the time to build your Web site’s value by making as much content available to your visitors as possible. Choose your site search resources carefully and nurture them. Show people you want to help them find the best, most relevant content you have to offer.
7 Comments on When site search comes up short
By deInternetMarketeer on October 23, 2007 at 10:38 am
“Now, if you use your own custom site search tool where you control the indexing, can you afford to rebuild the index every time you upload new content or is the process so massive and time-consuming you can only afford to do it once a week or once a month?”
Most of those search engines can be used with a crontab, I think.
Then you can set how long it may take before pages have to be respidered, how frequently it checks whether there are pages that need respidering, etc.
If you spider the index or the sitemap your job is done.
“That’s a real issue because on a very active site it’s easy for content to outstrip the custom site search solution. A dynamic search index where you can add pages on the fly is ideal, but how many sites can afford to build one with unlimited storage capacity?”
Storage capacity is indeed an issue that has to be considered.
Some custom site search tools :
http://www.xav.com/scripts/search/
http://lucene.apache.org/java/docs/
http://www.htdig.org/
By Michael Martinez on October 23, 2007 at 11:44 am
Thanks for the suggestions!
By Mark on October 23, 2007 at 8:28 pm
Not sure, but I believe HP still uses the Ultraseek search engine (now) owned by Autonomy.
By Mark on October 23, 2007 at 8:41 pm
The products kindly identified by 1deIM offer incremental indexing, so the index is not actually rebuilt each time. The spider figures out a schedule for itself, over time, so that directories with rapidly changing content get spidered more often than those that stay fairly static.
For index size - we generally allow 40KB per document, or 40GB of disk per million pages. Realistically HTML-only sites are way smaller and sites with .doc or .pdf pages are way larger. Either way, disk size these days is pretty much a non-issue for site search.
In Laura’s interview she identified ‘quick links’ as a major bonus for site search. Can’t stress this enough. Our clients use this capability to return defined pages against defined search queries, bypassing the relevancy algo altogether for these high-importance pages. Very effective in presenting the desired sales message. Typically only done on a handful of pages, leaving the relevancy algo alone for people who actually know what they are looking for.
The defined search queries, of course, come from the search logs.
By Michael Martinez on October 23, 2007 at 9:09 pm
Great follow up, Mark! Thanks.
Site search has been around for a long time and I’ve used many different options. I used to rely on WhatUSeek and had them rebuild my index once a week. For the time it was a pretty good tool but they folded and Mamma bought them or something and I had to move on.
I’ve always wanted something better than everything I’ve been able to live with.
By Mark on October 24, 2007 at 2:35 pm
No probs Michael.
If you’d like to experience site search that’s a little bit different from the web search derivatives, pay a visit to http://www.atomz.com (disclaimer: I used to represent these guys here in Australia before WebSideStory bought them out a few years ago. Now I have no relationship with them).
The free account gives you up to 750 pages for your index. It’s a hosted search, just like the others, dead easy to set up, and has most of the fruit. However they cover their costs by serving ads within the SERPs, so it’s not truly agnostic.
By Michael Martinez on October 24, 2007 at 6:19 pm
I’m afraid Atomz is a little too limited for my personal needs but I appreciate the suggestions.
For now, Microsoft’s Live is doing just fine.