Getting information about UK HE from Wikipedia

At IWMW 2010 last week, a lot of discussion centred on how, in an increasingly austere climate, we can make more use of free stuff. One category of free stuff is linked data. In particular, I was intrigued by a presentation from Thom Bunting (UKOLN) about extracting information from Wikipedia. It has inspired me to start experimenting with data about UK universities.

Let’s get some terminology out of the way. DBpedia is a service that extracts machine-readable data from Wikipedia articles. You can look at, for example, everything DBpedia knows about the University of Bristol. SPARQL is an SQL-like language for querying triples: effectively, all the data is in a single table with three columns. SNORQL is a front-end to DBpedia that allows you to enter SPARQL queries directly. It’s possible to ask SNORQL for “all soccer players who played as goalkeeper for a club that has a stadium with more than 40,000 seats and who were born in a country with more than 10 million inhabitants” and get results in a variety of machine-readable formats.

Sadly, when you look for ways to use DBpedia data, some of the links are broken, which was initially off-putting. SNORQL is great fun, though. SPARQL is something I’m only just learning, but to anyone familiar with SQL and the basics of RDF it’s straightforward.

List the members of the 1994 Group of universities

SELECT ?uni
WHERE {
  ?uni rdf:type <http://dbpedia.org/ontology/University> .
  ?uni skos:subject <http://dbpedia.org/resource/Category:1994_Group>
}
ORDER BY ?uni
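If you want the results in a script rather than through the SNORQL front-end, you can send the same query straight to DBpedia’s SPARQL endpoint. Here is a minimal sketch in Python, assuming the third-party SPARQLWrapper package is installed (the prefixes are spelled out because, outside SNORQL, they are not predeclared):

from SPARQLWrapper import SPARQLWrapper, JSON

# DBpedia's public SPARQL endpoint
sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    SELECT ?uni
    WHERE {
      ?uni rdf:type <http://dbpedia.org/ontology/University> .
      ?uni skos:subject <http://dbpedia.org/resource/Category:1994_Group>
    }
    ORDER BY ?uni
""")
sparql.setReturnFormat(JSON)  # ask for SPARQL results as JSON

results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["uni"]["value"])  # one DBpedia URI per 1994 Group member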


Yahoo Query Language

When I’m explaining the semantic web to people, I start by saying that I think of the present web as one big global document, made by linking together pages on different servers. Similarly, the semantic web would link data from many different servers to make a global database.

That vision just got a step closer with Yahoo’s YQL, a kind of super-API which allows you to perform SQL-like queries across data from multiple sites. The tutorial on Net-tuts uses the example of taking the latest tweets from a group of Twitter accounts. You could substitute RSS for Twitter to make a news aggregator (not a hugely imaginative application, but one on my mind recently).
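To give a flavour of it: YQL queries go to a public REST endpoint as a URL parameter, and results come back as XML or JSON. The sketch below, in Python, uses YQL’s built-in rss table; the endpoint URL and table name are my understanding of Yahoo’s service, and the feed URL is a placeholder:

import json
import urllib.parse
import urllib.request

# An SQL-like YQL query over an RSS feed; the feed URL is a placeholder.
query = "select title, link from rss where url='http://example.com/feed.rss'"
params = urllib.parse.urlencode({"q": query, "format": "json"})
url = "https://query.yahooapis.com/v1/public/yql?" + params

with urllib.request.urlopen(url) as response:
    data = json.load(response)

# Each matching feed entry comes back as an item with the selected fields.
for item in data["query"]["results"]["item"]:
    print(item["title"], "-", item["link"])

Substituting a different table (or a different set of feeds) into the query string is all it takes to turn this into the news aggregator mentioned above.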


IWMW reflections / Hug a Developer

This year’s Institutional Web Management Workshop (IWMW 2009) was, as last year, a very friendly, useful, forward-looking conference. I suspect that some organisations didn’t send people this year because of the economic climate, which is a pity because the mindset of the conference was very much focused on coping with future changes. There was again a lot of discussion of what the commercial sector can provide, and whether Google will conquer all. A phrase that got some use was “80/20 solutions”, i.e. 80% of the functionality at 20% of the effort.

For me the most interesting contribution was Prof. Derek Law’s opening keynote. He warned that the HE library sector may be too focused on responding to changing economic conditions, when the cultural changes happening now are arguably more significant.

Calendar thoughts

Having been allocated the shared-calendars subgroup of the Gateway Project, I would appreciate my network colleagues’ thoughts on the project. As you may have guessed, it was a surprise to both Rob and me that we were given overlapping projects but not put in touch with each other at the outset. I expect we will be able to clarify our combined approach at the Awayday in December.

So far I have been impressed by what we can get out of Mozilla Sunbird and how useful a tool it will be for browsing across all the network events. However, I am disappointed at the lack of enthusiasm for iCal from some corners. Anyhow, we have a small budget for enthusing our network contacts.

My view is that we should require all SC network sites to a) issue an iCal feed and b) redisplay their calendar on their own site using that same iCal feed – Google Calendar is one solution that springs to mind, but if the same database that generates the iCal feed also generates a dynamic page of events, that should be quite sufficient. So far, we have Gateway Project agreement to declare iCal as the standard feed format.
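For sites without an existing calendar system, producing a feed is little work. Here is a rough sketch in Python using the third-party icalendar package (an assumption on my part, as is the made-up event):

from datetime import datetime
from icalendar import Calendar, Event

cal = Calendar()
cal.add("prodid", "-//SC network events//example.ac.uk//")  # placeholder identifier
cal.add("version", "2.0")

event = Event()
event.add("summary", "Gateway Project Awayday")      # hypothetical event
event.add("dtstart", datetime(2008, 12, 10, 9, 30))  # date made up for illustration
event.add("dtend", datetime(2008, 12, 10, 17, 0))
cal.add_component(event)

# Serve this output as text/calendar; the same data source can also
# drive the dynamic events page mentioned above.
print(cal.to_ical().decode("utf-8"))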

CalDAV seems to be the next avenue to explore, so if anyone knows of a suitable CalDAV server we could publish to, then by all means post it on this thread. We do not strictly need one, but it would be useful.

XTech 2008, Dublin, Ireland – The Web on the Move

I attended the XTech conference at the beginning of May, and have just blogged my thoughts and notes on my own blog, so rather than reproduce those meanderings here, I will make like the traditional web and link to the posts there.

A cracking conference, very engaging and stimulating, and from reading the posts on here about the Eduserv Symposium, it appears many themes cropped up in both conferences. I look forward to keeping an eye on this landscape to see what comes to pass, and what disappears along the way, and what gets bought by Microsoft.

Eduserv Symposium 2008

This event took place in London on 8th May and its theme was “What do current Web trends tell us about the future of ICT provision for learners and researchers?”

My colleague Ale Fernandez has already blogged at length about the symposium. I disagree with his downbeat assessment of the Guardian and BBC speakers, and also with his (poetically expressed) negative assessment of the live use of electronic discussion. I was also interested to read some reflections by Mary Burslem at Intute. Here are the points that stuck out in my mind from the event.

1 May is RSS Awareness Day

Get involved

Edit by Martin: and for anyone who still doesn’t know what the fuss is about, watch this video from Commoncraft.com: “RSS in Plain English”.

Resource Sharing in Academic Support

Slides from a talk I gave this week about how an academic support “site” is increasingly a content provider using multiple external services, both commercial and academic (e.g. Google Video, SlideShare, Amazon, Intute).

I had been asked to talk about what will happen with resource sharing technology over the next five years, and there’s a little bit of technical stuff, but I thought the main point to get across was that resource sharing is central to what we do, and that we need to engage with the current environment of web services and embeddable content. This embedded presentation has active links:

The original presentation had ten slides, nine of which I used in the talk. Some explanatory text slides have been added. The lack of the “Web 2.0” buzzword is deliberate.

Paul Downey on “Web APIs”

Via Danny Ayers via planetrdf, a very nice presentation from Paul, on the simple but massively under-appreciated theme: Web APIs Are Just Web Sites.

I’ve said the same about “Web Services” before. The SOAP and WS-* industry ignored what we already had — the Web — and shoehorned something alien into use instead. We can go a nice long way simply using the good old Web. Paul gives a short example, but his example “protocol” uses completely application-specific markup: a “weatherml” and some SIP/call markup.

This is the point at which both XLink and RDF people step in and say, “hey, what do these markups have in common? At least give us a cross-domain way of knowing which portion of each document is a hyperlink.” If Web APIs are Just Web Sites, you’d expect it to be easy to find the links between the pages, at least. Well, RDF people don’t shut up at that point (just as well, since you can figure that much out by looking at an XML Schema, in theory at least). We then start banging on about cross-domain classes and properties: e.g. if the weather markup wanted to talk about cities and locations, … or the call markup wanted to mention people … or the Atom feed wanted a bit of each of those, … why not just mix together domain-specific element names using some shared structural conventions? Which is exactly what RDF does.
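To make that concrete, here is a rough sketch (mine, not Paul’s) in Python using rdflib, mixing the FOAF and W3C geo vocabularies in a single graph; the URIs and data are invented:

from rdflib import Graph, Literal, Namespace, URIRef

FOAF = Namespace("http://xmlns.com/foaf/0.1/")
GEO = Namespace("http://www.w3.org/2003/01/geo/wgs84_pos#")

g = Graph()
g.bind("foaf", FOAF)
g.bind("geo", GEO)

person = URIRef("http://example.org/people/alice")  # hypothetical URIs
city = URIRef("http://example.org/places/bristol")

# Two independent vocabularies, held together only by RDF's
# shared subject-predicate-object structure.
g.add((person, FOAF.name, Literal("Alice")))
g.add((person, FOAF.based_near, city))
g.add((city, GEO.lat, Literal("51.4545")))
g.add((city, GEO.long, Literal("-2.5879")))

print(g.serialize(format="turtle"))

No new markup language had to be invented to say that a person is based near a place with a latitude and longitude; the two vocabularies simply coexist in one document.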

How would this change Paul’s story? Well, on the one hand the markup examples are less fragmented: you don’t have to understand an entirely new XML markup language for each application or domain. On the other, it opts us out of some of the usefulness of HTTP, since the granularity switches from document-typing to the level of individual properties and statements, meaning that saying things like “Accept: application/weatherml+xml” isn’t so easy to do, since the same bunch of markup might have bits of weatherml, bits of RSS/Atom, bits of Geo markup, bits of FOAF, etc.

Perhaps we need some convention for sending HTTP Accept headers for application/rdf+xml where we can also optionally mention some specific RDF vocabularies, or indirectly mention a bundle of them to be used together (an ‘application profile’ in Dublin Core-speak). More on which maybe another time.