del.icio.us-driven Google custom search

This is an account of how and why I wrote a Google custom search engine to search sites that I had bookmarked on del.icio.us.

I’ve liked the Google custom search engine since I first played with it shortly after it came out. If you don’t know about Google CSE, it lets an individual or group create a search form that performs a full-text search using the Google search engine, but limited to sites they choose. This search form, and the results page, can be embedded in any website. I think it is the obvious way to build a single search across all the centres in an organization like the HE Academy (this was one of my first custom search engines). Better, for teaching and learning you can set up a reading list of recommended sites for a course and let students do a full Google search that prioritizes those sites (for a sort of generic variation on this see Tony Hirst’s Open Educational Resources search). Better still, let the students as a group decide which sites they want on their course reading list.

Building and editing a Google custom search using the interface on the Google site is by no means difficult, but over the past year or so some tools to make it even easier have come out.

Setting Canonical Domain with Apache

An experiment in search engine optimization:
My work site, www.economicsnetwork.ac.uk (or economicsnetwork.ac.uk if you’re intimate) is also known by three other domain names, because of past re-branding. My problem? How to tell search engines that these are the exact same site, so they know that an external link to, say, www.economics.ltsn.ac.uk is to count as a link to www.economicsnetwork.ac.uk (and boost my site’s Google ranking, goddamit!). Establishing a canonical domain name like this should also help consistency of brand (i.e. helping the user know what site they are on and what to call it).

For a long time I had a <base href="…"> tag in my home page to set the canonical domain. This is dumb. It only ensures that a user sees the canonical domain once they’ve come to the home page and then clicked a link; it does nothing for requests that arrive directly on any other page. A check shows that www.economics.ltsn.ac.uk, www.economics.heacademy.ac.uk and econltsn.ilrt.bris.ac.uk still exist in the Google index as separate sites. A serious fix requires a few lines of Apache config:

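# Assumes server or virtual-host context, where the matched path keeps its leading slash.
RewriteEngine on
# Match any request whose Host header contains "ltsn" or "heacademy" (the old domains).
RewriteCond %{HTTP_HOST} (ltsn|heacademy)
# Permanently (301) redirect to the same path on the canonical domain.
RewriteRule (.*) http://www.economicsnetwork.ac.uk$1 [R=301,L]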
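To make the logic of those rules easier to check, here is a minimal Python sketch of the decision they encode. It is only a model of the rewrite, not something Apache runs: the function name canonical_redirect and the constant names are my own, and it assumes the path passed in starts with a slash, as it would in server or virtual-host context.

```python
import re

# Canonical domain that every legacy host should redirect to.
CANONICAL = "http://www.economicsnetwork.ac.uk"

# Same unanchored pattern as the RewriteCond: any host containing
# "ltsn" or "heacademy" counts as a legacy domain.
LEGACY_HOST = re.compile(r"(ltsn|heacademy)")


def canonical_redirect(host, path):
    """Return the 301 target for a request, or None if no redirect applies.

    Hypothetical helper mirroring the RewriteCond/RewriteRule pair:
    host is the Host header, path is the request path with leading slash.
    """
    if LEGACY_HOST.search(host):
        # Equivalent of substituting $1 (the whole path) after the canonical domain.
        return CANONICAL + path
    return None


if __name__ == "__main__":
    for host in (
        "www.economics.ltsn.ac.uk",
        "www.economics.heacademy.ac.uk",
        "econltsn.ilrt.bris.ac.uk",
        "www.economicsnetwork.ac.uk",
    ):
        print(host, "->", canonical_redirect(host, "/handbook/"))
```

Note that the third old domain, econltsn.ilrt.bris.ac.uk, is caught only because "econltsn" happens to contain "ltsn"; the pattern is not anchored to any part of the hostname.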
