Embedding repository searches

For a long time now we have wanted to embed searches of other people’s collections of stuff into the Engineering Subject Centre’s website. The general idea is to let people look at those collections without leaving our site, which may or may not be sensible. In one case doing so would allow us to use a remote repository service (the Jorum) to host resources that we want to help make available, saving us the cost and risk of managing the repository while showing the close relationship that we have with this material. With the help of our friends at the Jorum we have been experimenting with a light-weight approach to using the SRU (search and retrieve by URL) standard to achieve this.

SRU is a bona fide library standard for remote query, drawing on the venerable Z39.50 but updated for the modern world of the web and XML. Perhaps surprisingly given its pedigree, SRU is pretty straightforward. You encode your query, stick it at the end of a URL and GET an XML encoded result set (or error message). A typical URL will look something like

http://z3950.loc.gov:7090/voyager
?operation=searchRetrieve&version=1.1
&query=dinosaur&maximumRecords=1&recordSchema=dc

You can cut-n-paste this into your browser URL bar (without line breaks or spaces!), hit return and see what some typical results look like, in this case from the Library of Congress Voyager service.
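If you would rather build such a URL in code than by hand, a minimal sketch in Python might look like the following. The function name `build_sru_url` and its defaults are my own, for illustration only; the parameter names are the ones in the example URL above.

```python
from urllib.parse import urlencode


def build_sru_url(base_url, query, maximum_records=1, record_schema="dc"):
    """Return a searchRetrieve URL for an SRU 1.1 endpoint."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": query,
        "maximumRecords": maximum_records,
        "recordSchema": record_schema,
    }
    # urlencode percent-encodes every value, so spaces or quotes
    # in the query cannot break the URL.
    return base_url + "?" + urlencode(params)


url = build_sru_url("http://z3950.loc.gov:7090/voyager", "dinosaur")
print(url)
```

The printed URL is the same as the one above, on a single line.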

A while back I saw a presentation about an open source light-weight SRU client that had been developed by Intrallect Ltd. for the JISC-funded CDLOR project. The approach taken, briefly, was to use a form on a webpage to allow the user to specify their search terms and the repository they wanted to search. Pressing the submit button would, with help from a little JavaScript to make sure the query syntax was right, send a GET request to the repository’s SRU interface. What is returned to the browser is raw XML, but a feature of SRU allows you to specify the URL of an XSL Transformation that gets put at the head of the XML file so that it is applied by the browser to transform the XML into XHTML.
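To make that concrete, here is a rough sketch of the request side in Python (the Intrallect client did this in JavaScript; I have not reproduced its code). It assumes the stylesheet feature is the optional `stylesheet` request parameter of SRU 1.1, and it uses the quoting rule mentioned in the reflections below: put quotes around the search terms when there is more than one. The endpoint and stylesheet URLs are made up.

```python
from urllib.parse import urlencode


def quote_terms(terms):
    """Wrap a multi-word search in double quotes; leave single words alone."""
    terms = " ".join(terms.split())  # trim and collapse runs of whitespace
    already_quoted = terms.startswith('"') and terms.endswith('"')
    if " " in terms and not already_quoted:
        return '"' + terms + '"'
    return terms


def search_url(base_url, terms, stylesheet_url):
    """Build a searchRetrieve URL that also asks for a stylesheet reference."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": quote_terms(terms),
        # Assumption: the server supports the optional 'stylesheet' parameter
        # and writes this URL into the head of the XML it returns.
        "stylesheet": stylesheet_url,
    }
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and stylesheet location, for illustration only.
print(search_url("http://repository.example/sru", "fluid mechanics",
                 "http://www.example.org/sru/results.xsl"))
```

This does nothing clever about quotes typed in the middle of the search terms; a real form handler would probably want to.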

You can see my effort at adapting this for something like the task we want to do at my SRU Test Search page. Do take note that it is pointing to a Jorum server that is used for testing and development, not the production server, so sometimes it is deliberately broken. If you’re interested, my source code is all collected in one zip file. You’ll soon see why I don’t earn a living by programming.

Some reflections on doing this:–

  1. SRU is as simple to implement as you could hope.
  2. Since we only do simple searches of a single repository, the work the JavaScript has to do is minimal (all it does is put quotes around the search terms when there is more than one).
  3. Writing an XSLT to get the returned list of LOM records into XHTML wasn’t so simple. Actually, it’s not the LOM stuff that’s difficult, or the SRU response XML (I don’t think RSS-formatted responses would help); it’s doing stuff like pagination, where getting the XSLT to test whether there is a next page to display is mind-bending. Perhaps that’s my inexperience with XSLT showing.
  4. There is something of a gotcha in the Intrallect approach: the XSLT file has to be served from the same domain as the XML file it acts on. This is a security measure implemented by browsers because of phishing scams and the like. Clearly, you’re not always going to be in the position where the repository manager will play host for you in this way. The way around this was to write a simple Perl CGI script that takes the query from the web form, sends it on to the SRU service and, when it gets the reply, echoes it to the browser. That way both XML and XSLT files come from our server. Fortunately the Perl cgi-lib and LWP libraries make this pretty simple.
  5. One gotcha left in my XSLT is that I assume that the repository echoes the query you sent back in the XML response. This is optional in SRU, and at least one other repository that we would like to search doesn’t implement the <echoedSearchRetrieveRequest> feature.
  6. Once you’re using a CGI script (or similar) there’s the whole question of how you factor the work between the JavaScript, CGI, and XSLT. You could do it all in Perl (or PHP, or ASP) … but that’s another story.
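The relay described in point 4 can be sketched roughly as follows. This is Python rather than Perl, and it is not my actual script: the endpoint, the stylesheet URL and the function names are all invented for illustration. The idea is only that the server-side code forwards the query, then hands the XML back unchanged so that the browser sees XML and XSLT coming from the same place.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Both values are hypothetical placeholders.
SRU_BASE = "http://repository.example/sru"
STYLESHEET = "http://www.example.org/sru/results.xsl"  # hosted on our own server


def fetch(url):
    """Default fetcher: GET the URL and return the response body as bytes."""
    with urlopen(url, timeout=10) as response:
        return response.read()


def relay_search(terms, fetcher=fetch):
    """Forward a search to the SRU service; return (headers, body) to echo back."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": terms,
        # Assumes the service honours the optional 'stylesheet' parameter.
        "stylesheet": STYLESHEET,
    }
    body = fetcher(SRU_BASE + "?" + urlencode(params))
    # Serve the reply as XML so the browser applies the stylesheet it names.
    headers = [("Content-Type", "text/xml")]
    return headers, body
```

A CGI or WSGI wrapper would read the search terms from the form submission, call `relay_search`, and write the headers and body to the response. Passing the fetcher in as an argument is just a convenience so the function can be tried out without touching the network.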

In conclusion, SRU offers a standard, professional yet simple means of embedding query of a remote repository or catalogue into a website (e.g. a VLE), though you might find some of the result-handling needs thinking about.
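On the result-handling point, the "is there a next page?" test that I found so awkward in XSLT is short when written in a general-purpose language. The sketch below is Python, not the XSLT I actually used, and the function name is my own. It assumes an SRU 1.1 response in the usual `http://www.loc.gov/zing/srw/` namespace, uses `nextRecordPosition` when the server supplies it, and otherwise works the answer out from `numberOfRecords` and the last `recordPosition`. It does not rely on the echoed request, so the caller has to remember the query itself.

```python
import xml.etree.ElementTree as ET

SRW = "{http://www.loc.gov/zing/srw/}"  # SRU 1.1 response namespace


def next_start_record(response_xml):
    """Return the startRecord for the next page, or None on the last page."""
    root = ET.fromstring(response_xml)
    total = int(root.findtext(SRW + "numberOfRecords", default="0"))

    # Preferred: the server says where the next page starts.
    explicit = (root.findtext(SRW + "nextRecordPosition") or "").strip()
    if explicit:
        return int(explicit)

    # Fallback: work it out from the position of the last record returned.
    positions = [int(p.text) for p in root.iter(SRW + "recordPosition")
                 if p.text and p.text.strip()]
    if positions and max(positions) < total:
        return max(positions) + 1
    return None
```

The value returned would go into the `startRecord` parameter of the next request, alongside the same query.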


One Response to “Embedding repository searches”

  1. Idea: extension to previous literature « Bayesian Feed Filter Says:

    […] I did a while back on transforming SRU responses to HTML might be a starting point (though I swore off ever again trying to do anything like that with […]

