Intrasite searching with Google

February 08, 2005

I doubt many of you have noticed (especially the regular readers of this site), but I reworked the search section about a month ago so that it now uses Google for intrasite searches. I never mentioned it here because I didn't really see the point, but a couple of people have asked me how to do it, so I thought I'd explain it a bit here.

I handed over searching to Google in the first place because of the way Movable Type returns search results: not only does it refuse to remove the trailing filename from each URI, it also likes to tack an anchor onto the end of it. Technically, there is nothing wrong with this, as it still gets people to the page they were looking for, but it just "looks" ugly and people end up linking to me with malformed URIs. This never really bothered me until I implemented future-proofed URIs here (and wrote up a HOW-TO on how to set this up in Movable Type). My URIs are "clean" and I wanted my search results to follow suit. Yes, I could have just hacked up the MT code, but this way was much quicker and also lets us search using a syntax we're quite familiar with.

Setting Up Your Site

This certainly isn't rocket science and anyone with a rudimentary understanding of HTML forms and the Google syntax could figure it out, but for those among us who are a little less motivated, I've put my form below.

<form method="get" action="">
<input type="text" name="q" maxlength="255" class="search">
<input type="hidden" name="sitesearch" value="">
<input type="hidden" name="q" value="-intitle:archives">
<input type="hidden" name="q" value="-intitle:referrers">
<input type="hidden" name="q" value="-intitle:links">
<input type="submit" name="sa" value="Search">
</form>

I'm going to assume that most of this is pretty self-explanatory, but for the sake of completeness, I'll elaborate a bit. The "sitesearch" input element tells Google to search only within the given site. You'll notice that I go a bit further and include a few hidden "q" inputs carrying "-intitle" restrictions, each of which excludes any page whose title contains the given word; if I didn't do this, the user would get the same document multiple times (because of the index page and the monthly/individual archives). I recommend playing around with the search on your site to decide which input tags you will or won't need. This will obviously vary greatly from site to site and will be based in large part on your site's structure.
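If it helps to see what the browser actually sends when the form is submitted, here is a minimal Python sketch that builds the same query string. The endpoint URL and the example.com domain are my assumptions for illustration (the form above leaves "action" and "sitesearch" blank), and build_search_url is a hypothetical helper name, not anything Google provides. Note that each "q" input goes out as its own parameter; the form relies on Google treating them together as one query.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumptions for illustration only: the form in the post leaves both blank.
SEARCH_ENDPOINT = "https://www.google.com/search"  # assumed form action
SITE = "example.com"                               # placeholder domain


def build_search_url(user_query, site=SITE,
                     excluded_titles=("archives", "referrers", "links")):
    """Return the GET URL a browser would request for the form above."""
    # Fields are listed in the same order as the inputs in the form.
    fields = [("q", user_query), ("sitesearch", site)]
    # One extra "q" parameter per hidden -intitle restriction.
    fields += [("q", f"-intitle:{word}") for word in excluded_titles]
    fields.append(("sa", "Search"))
    return f"{SEARCH_ENDPOINT}?{urlencode(fields)}"


if __name__ == "__main__":
    url = build_search_url("gadgets")
    print(url)
    # Decode the query string again to list every "q" value that was sent.
    print(parse_qs(urlsplit(url).query)["q"])
```

Changing the excluded_titles tuple is the equivalent of adding or removing hidden inputs in the form, which makes it easy to try different restrictions before touching your templates.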

This method isn't exactly perfect and there will be certain circumstances where the results, without further tweaking, are completely useless. For example, if you search for "gadgets" on this site, you will see that Google returns nearly every page under my domain. This is because I have the word "gadgets" in the menu on the right. There isn't much that can be done about this except maybe narrowing your search. Perhaps in the future Google will allow you to restrict searches to divs within a document's markup.   *shrug*

When time permits, I think I'm going to use the Google API to do intrasite searches so that I can style the results to look like the rest of my site (i.e., the user won't realize that the search was done off-site).
