Getting information about UK HE from Wikipedia

At IWMW 2010 last week, a lot of discussion centred on how, in an increasingly austere atmosphere, we can make more use of free stuff. One category of free stuff is linked data. In particular, I was intrigued by a presentation from Thom Bunting (UKOLN) about extracting information from Wikipedia. It has inspired me to start experimenting with data about UK universities.

Let’s get some terminology out of the way. DBpedia is a service that extracts machine-readable data from Wikipedia articles. You can look at, for example, everything DBpedia knows about the University of Bristol. SPARQL is an SQL-like language for querying triples: effectively, all the data is in a single table with three columns (subject, predicate and object). SNORQL is a front-end to DBpedia that allows you to enter SPARQL queries directly. It’s possible to ask SNORQL for “All soccer players, who played as goalkeeper for a club that has a stadium with more than 40.000 seats and who are born in a country with more than 10 million inhabitants” and get results in a variety of machine-readable formats.
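To make the "single table with three columns" idea concrete, here is a minimal Python sketch of a toy triple store. Everything in it is an assumption for illustration: the identifiers (`UniA`, `GroupX` and so on) are invented rather than real DBpedia resources, and the helper names `match` and `subjects_matching_all` are hypothetical. A `None` in a pattern plays roughly the role of a `?variable` in SPARQL.

```python
# Toy triple store: every fact is one (subject, predicate, object) row.
# All identifiers are invented for illustration; none are real DBpedia URIs.
TRIPLES = [
    ("UniA", "type", "University"),
    ("UniA", "city", "TownA"),
    ("UniB", "type", "University"),
    ("UniB", "memberOf", "GroupX"),
    ("UniC", "type", "University"),
    ("UniC", "memberOf", "GroupX"),
    ("ClubD", "memberOf", "GroupX"),
]


def match(pattern, triples=TRIPLES):
    """Return the rows that fit a pattern; None is a wildcard."""
    return [
        row
        for row in triples
        if all(want is None or want == got for want, got in zip(pattern, row))
    ]


def subjects_matching_all(patterns, triples=TRIPLES):
    """Subjects satisfying every (predicate, object) pair: a join on the subject."""
    result = None
    for predicate, obj in patterns:
        found = {s for s, _, _ in match((None, predicate, obj), triples)}
        result = found if result is None else result & found
    return sorted(result or [])


# "Which universities are members of GroupX?"
members = subjects_matching_all(
    [("type", "University"), ("memberOf", "GroupX")]
)
print(members)
```

A real SPARQL engine does far more than this (indexes, joins on any position, optional patterns), but the shape of the data is the same.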

Sadly, when you look for ways to use DBpedia data, some of the links are broken, which was initially off-putting. SNORQL is great fun though. SPARQL is something I’m only just learning, but to anyone familiar with SQL and the basics of RDF it’s straightforward.

List the members of the 1994 Group of universities

SELECT ?uni
WHERE {
  ?uni rdf:type <http://dbpedia.org/ontology/University> .
  ?uni skos:subject <http://dbpedia.org/resource/Category:1994_Group>
}
ORDER BY ?uni
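
The same query can in principle be sent from a script rather than typed into SNORQL. The Python sketch below rests on several assumptions: that the DBpedia endpoint lives at `http://dbpedia.org/sparql`, that it accepts the standard SPARQL protocol `query` parameter, and that it honours an `Accept` header asking for JSON results. The `rdf:` and `skos:` prefixes are declared explicitly because a script cannot rely on SNORQL's predefined ones. The function names `build_request` and `run_query` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed location of the public DBpedia SPARQL endpoint.
ENDPOINT = "http://dbpedia.org/sparql"

QUERY = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?uni
WHERE {
  ?uni rdf:type <http://dbpedia.org/ontology/University> .
  ?uni skos:subject <http://dbpedia.org/resource/Category:1994_Group>
}
ORDER BY ?uni
"""


def build_request(query, endpoint=ENDPOINT):
    """Build a GET request that carries the query in the `query` parameter."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(query, endpoint=ENDPOINT):
    """Send the query and return the values bound to ?uni (needs network access)."""
    with urllib.request.urlopen(build_request(query, endpoint), timeout=30) as resp:
        data = json.load(resp)
    # SPARQL JSON results keep each row under results -> bindings.
    return [row["uni"]["value"] for row in data["results"]["bindings"]]
```

Calling `run_query(QUERY)` should return a list of resource URIs if the endpoint behaves as assumed. The result depends entirely on what DBpedia holds at the time, so no expected output is shown.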

Yahoo Query Language

When I’m explaining the semantic web to people, I start by saying that I think of the present web as one big global document, made by linking together pages on different servers. Similarly, the semantic web would link data from many different servers to make a global database.

That vision just got a step closer with Yahoo’s YQL, a kind of super-API which allows you to perform SQL-like queries across data from multiple sites. The tutorial on Net-tuts uses the example of taking the latest tweets from a group of Twitter accounts. You could substitute RSS for Twitter to make a news aggregator (not a hugely imaginative application, but one on my mind recently).


Designing for Big Data

This 20-minute talk by Jeff Veen, formerly of Google, is worth blogging not just for its reflections on user interaction with data, but also for a quick look at how far technology has come in the last 25 years. Show it to the non-ancient geeks and tell them what it was like!

Faster database searches with inverted index

Ancient Geeks contributor Tom Gidden has long been obsessed with making database text searches fast and scalable. Your RDBMS may already have full text searching, but that can’t necessarily cope with serious load.

Tom has described his version of an Inverted Index technique in an article in php|Architect magazine, but now he and a colleague have implemented the idea with MySQL stored procedures rather than PHP, and have released an open source project through Google Code. I’ll likely be using this on one of my work projects.
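
For anyone who hasn't met the idea, here is a minimal Python sketch of a generic inverted index. It is not Tom's implementation, which uses MySQL stored procedures; it only illustrates the general technique of mapping each word to the documents that contain it, so a search becomes a set intersection rather than a scan of every row. The function names and sample documents are invented for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    # One posting set per query word; a missing word yields an empty set.
    postings = [index.get(word, set()) for word in words]
    return sorted(set.intersection(*postings))


# Invented sample documents, keyed by id.
docs = {
    1: "Fast full text search in MySQL",
    2: "Stored procedures in MySQL",
    3: "Full text search with PHP",
}
index = build_index(docs)
print(search(index, "full text"))
```

The cost of a lookup here depends on the number of query words and the size of their posting sets, not on the total number of documents, which is why the approach tends to scale better than pattern matching over every row.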