Getting information about UK HE from Wikipedia

At IWMW 2010 last week, a lot of discussion centred on how, in an increasingly austere atmosphere, we can make more use of free stuff. One category of free stuff is linked data. In particular, I was intrigued by the presentation by Thom Bunting (UKOLN) about extracting information from Wikipedia. It has inspired me to start experimenting with data about UK universities.

Let’s get some terminology out of the way. DBpedia is a service that extracts machine-readable data from Wikipedia articles. You can look at, for example, everything DBpedia knows about the University of Bristol. SPARQL is an SQL-like language for querying triples: effectively, all the data is in a single table with three columns (subject, predicate and object). SNORQL is a front-end to DBpedia that allows you to enter SPARQL queries directly. It’s possible to ask SNORQL for “All soccer players, who played as goalkeeper for a club that has a stadium with more than 40.000 seats and who are born in a country with more than 10 million inhabitants” and get results in a variety of machine-readable formats.
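
To make the “single table with three columns” idea concrete, here is a minimal Python sketch. It is a toy, not how DBpedia is implemented: the match helper is hypothetical, the prefixed names are just strings, and the third row is made up. A field left as None plays the role of a SPARQL variable.

```python
# A toy triple store: every fact is one row of (subject, predicate, object).
TRIPLES = [
    ("dbpedia:University_of_York", "rdf:type", "dbpedia-owl:University"),
    ("dbpedia:University_of_Bristol", "rdf:type", "dbpedia-owl:University"),
    ("ex:Some_Person", "rdf:type", "dbpedia-owl:Scientist"),  # made-up row
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the rows that agree with every non-None field of the pattern."""
    pattern = (subject, predicate, obj)
    return [
        row for row in triples
        if all(want is None or want == got for want, got in zip(pattern, row))
    ]


# Roughly what the pattern "?uni rdf:type dbpedia-owl:University" asks for.
universities = [
    s for s, _, _ in match(TRIPLES, predicate="rdf:type", obj="dbpedia-owl:University")
]
print(universities)
```

A real SPARQL query joins several such patterns on their shared variables, which is what the multi-line WHERE blocks in the examples are doing.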

Sadly, when you look for ways to use DBpedia data, some of the links are broken, which was initially off-putting. SNORQL is great fun, though. SPARQL is something I’m only just learning, but to anyone familiar with SQL and the basics of RDF it’s straightforward.

List the members of the 1994 Group of universities

SELECT ?uni
WHERE {
?uni rdf:type <http://dbpedia.org/ontology/University> .
?uni skos:subject <http://dbpedia.org/resource/Category:1994_Group>
}
ORDER BY ?uni

Results
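
If you want to run a query like this from a script rather than from SNORQL, one option is to send it to a SPARQL endpoint over HTTP. The sketch below only builds the request URL, so it runs offline. It assumes the endpoint lives at http://dbpedia.org/sparql and accepts a "format" parameter; "query" is the parameter name from the SPARQL protocol. The build_query_url helper is hypothetical, and the PREFIX lines are there because SNORQL's predefined prefixes are not available outside it.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: the public DBpedia SPARQL endpoint.
ENDPOINT = "http://dbpedia.org/sparql"

QUERY = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?uni
WHERE {
  ?uni rdf:type <http://dbpedia.org/ontology/University> .
  ?uni skos:subject <http://dbpedia.org/resource/Category:1994_Group>
}
ORDER BY ?uni
"""


def build_query_url(query, endpoint=ENDPOINT,
                    result_format="application/sparql-results+json"):
    """Encode a SPARQL query as a GET URL for the given endpoint."""
    # "format" is an assumption about what this particular endpoint accepts.
    params = {"query": query.strip(), "format": result_format}
    return endpoint + "?" + urlencode(params)


url = build_query_url(QUERY)

# Decoding the URL again recovers the original query text.
decoded = parse_qs(urlsplit(url).query)
print(decoded["query"][0] == QUERY.strip())
```

Fetching the URL (for example with urllib.request.urlopen) is left out so that the sketch does not depend on the service being up.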

Get the Longitude and Latitude of the University of York

SELECT ?lat, ?long
WHERE {
:University_of_York geo:lat ?lat .
:University_of_York geo:long ?long
}

Results
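
One of the machine-readable formats on offer is JSON. Here is a minimal sketch of reading a result in the W3C SPARQL JSON results layout, where each row is a "binding" from variable name to value. The sample response is hand-written and its coordinates are placeholders, not real DBpedia output; rows_from_results is a hypothetical helper.

```python
import json

# Hand-written sample in the SPARQL JSON results layout.
# The numbers are placeholders, not the real coordinates.
SAMPLE = """
{
  "head": {"vars": ["lat", "long"]},
  "results": {"bindings": [
    {"lat": {"type": "literal", "value": "1.5"},
     "long": {"type": "literal", "value": "-2.5"}}
  ]}
}
"""


def rows_from_results(text):
    """Flatten each binding into a plain {variable: value string} dict."""
    data = json.loads(text)
    variables = data["head"]["vars"]
    return [
        # A variable can be missing from a binding (e.g. an unmatched OPTIONAL).
        {var: binding[var]["value"] for var in variables if var in binding}
        for binding in data["results"]["bindings"]
    ]


rows = rows_from_results(SAMPLE)
lat, long_ = float(rows[0]["lat"]), float(rows[0]["long"])
print(lat, long_)
```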

List universities in the United Kingdom, with their cities, types, web sites, and numbers of Undergraduate and Postgraduate students

SELECT DISTINCT ?uni, ?city, ?type, ?ug, ?pg, ?web
WHERE {
?uni rdf:type <http://dbpedia.org/ontology/University> .
?uni dbpedia2:country ?uk .
?uni dbpedia2:city ?city .
?uni dbpedia-owl:numberOfPostgraduateStudents ?pg .
?uni dbpedia-owl:numberOfUndergraduateStudents ?ug .
OPTIONAL { ?uni dbpedia2:type ?type } .
OPTIONAL { ?uni dbpedia2:website ?web }
FILTER (?uk = :United_Kingdom || ?uk = :England || ?uk = :Wales || ?uk = :Scotland || ?uk = :Northern_Ireland)
}
ORDER BY ?uni

Note that in this implementation, “:” is an abbreviation for “http://dbpedia.org/resource/”, so “:United_Kingdom” is just a shorter way of saying “http://dbpedia.org/resource/United_Kingdom”.
Results
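
The “:” abbreviation is plain string substitution, which a few lines of Python can imitate. This is only a sketch: expand is a hypothetical helper, and the table holds just the one prefix described above; the other prefixes used in these queries (dbpedia2:, dbpedia-owl: and so on) would have to be copied from the SNORQL page.

```python
# Prefix table: only the empty ":" prefix described in the post is filled in.
PREFIXES = {"": "http://dbpedia.org/resource/"}


def expand(name, prefixes=PREFIXES):
    """Turn a prefixed name such as ':United_Kingdom' into a full URI."""
    prefix, sep, local = name.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {name!r}")
    return prefixes[prefix] + local


print(expand(":United_Kingdom"))
```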

The data in these examples is sometimes patchy, as you would expect. Glasgow presently appears twice in the list because it is listed as both a “public university” and an “ancient university”. That query could do with some tidying up. The HESA data on which the student and staff numbers are based is often a few years old rather than up to date. Web site URLs are formatted in different ways in different infoboxes, leading to slight inconsistencies (which could be fixed by an extra line of code). Then again, given that it’s drawn from Wikipedia, I’m impressed at the completeness (and of course it’s easy to correct or update the figures).
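
Both blemishes can also be tidied after the results come back, rather than in the query. The sketch below is one possible clean-up, not something I have run against the live data: merge_rows and normalise_url are hypothetical helpers, the rows are hand-written in the shape of the query output, and the web addresses are invented.

```python
def normalise_url(raw):
    """Give a web address a scheme and drop any trailing slash."""
    url = raw.strip()
    if not url.lower().startswith(("http://", "https://")):
        url = "http://" + url
    return url.rstrip("/")


def merge_rows(rows):
    """Combine rows for the same ?uni, collecting every distinct ?type."""
    merged = {}
    for row in rows:
        entry = merged.setdefault(
            row["uni"], {"uni": row["uni"], "types": [], "web": None}
        )
        if row.get("type") and row["type"] not in entry["types"]:
            entry["types"].append(row["type"])
        if row.get("web") and entry["web"] is None:
            entry["web"] = normalise_url(row["web"])
    return list(merged.values())


# Hand-written rows shaped like the query output; the web addresses are invented.
rows = [
    {"uni": ":University_of_Glasgow", "type": "public university",
     "web": "www.example.ac.uk/"},
    {"uni": ":University_of_Glasgow", "type": "ancient university",
     "web": "http://www.example.ac.uk"},
]
cleaned = merge_rows(rows)
print(cleaned)
```

Keeping every type in a list avoids having to decide which of the two labels is the “right” one.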

Chains of doctoral advisors featuring four scientists

SELECT ?a, ?a_birth, ?b, ?b_birth, ?c, ?c_birth, ?d, ?d_birth {
?a rdf:type <http://dbpedia.org/ontology/Scientist> .
?b rdf:type <http://dbpedia.org/ontology/Scientist> .
?c rdf:type <http://dbpedia.org/ontology/Scientist> .
?d rdf:type <http://dbpedia.org/ontology/Scientist> .
?a dbpedia-owl:birthDate ?a_birth .
?b dbpedia-owl:birthDate ?b_birth .
?c dbpedia-owl:birthDate ?c_birth .
?d dbpedia-owl:birthDate ?d_birth .
?d dbpedia-owl:doctoralAdvisor ?c .
?c dbpedia-owl:doctoralAdvisor ?b .
?b dbpedia-owl:doctoralAdvisor ?a
}
ORDER BY ?a_birth

Results
Lots of potential here for tracking the impact of individual academics and institutions.
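
The query above fixes the chain at exactly four people. With the advisor links downloaded, chains of any length could be followed in a short script instead. A minimal sketch, assuming each person has a single recorded advisor (real data may list several); advisor_chains is a hypothetical helper and the people are made up.

```python
def advisor_chains(advisor_of, length=4):
    """Return every chain [a, b, c, ...] in which each person advised the next."""
    # Invert student -> advisor into advisor -> [students].
    students_of = {}
    for student, advisor in advisor_of.items():
        students_of.setdefault(advisor, []).append(student)

    def extend(chain):
        if len(chain) == length:
            return [chain]
        found = []
        for student in students_of.get(chain[-1], []):
            if student not in chain:  # guard against cycles in messy data
                found.extend(extend(chain + [student]))
        return found

    people = set(advisor_of) | set(advisor_of.values())
    return [chain for person in sorted(people) for chain in extend([person])]


# Made-up people; each key's doctoral advisor is its value.
ADVISOR_OF = {"D": "C", "C": "B", "B": "A", "E": "A"}
chains = advisor_chains(ADVISOR_OF)
print(chains)
```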

3 Responses to “Getting information about UK HE from Wikipedia”

  1. DBPedia and the Relationships Between Technical Articles « UK Web Focus Says:

    […] example of the potential for DBpedia has been described by Martin Poulter in  a post on Getting information about UK HE from Wikipedia which explores some of the ideas I discussed on  A Challenge To Linked Data Developers. But […]

  2. Consuming and producing linked data in a content management system | JISC IE Technical Foundations Says:

    […] filtered datasets retrieved from SPARQL queries on DBpedia (as illustrated by Martin Poulter in his follow-up blog post ‘Getting information about UK HE from Wikipedia‘) […]

  3. Linked Data for Events: the IWMW Case Study « UK Web Focus Says:

    […] a post entitled “Getting information about UK HE from Wikipedia” published in July on the Ancient Geek’s blog Martin Poulter commented that “At […]

