Health information on wikis

My colleague Rob Pearce has a thought-provoking question about the safety of medical information on Wikipedia:

I’m not having a go at Wikipedia at all – I’m a big fan – but I had a thought about the recent McMaster University and Wikimedia Canada initiative for health care content creation in the creative commons (workshops on 4 October 2011 at McMaster, introducing both professors and students to creating health care content in the creative commons).

This has come up before when I was working on Open Educational Resources: what is to stop nutters/malcontents – subtly or otherwise – altering medical information in ways that lead to somebody putting their health in jeopardy, or pushing one procedure over another, promoting one drug over another, etc.? I don’t quite know how to answer this argument yet.

If you Google search for “shark cartilage”, “laetrile therapy” or “copper bracelet” you can see the nutters and profiteers already have their own web sites, which of course you and I are unable to edit.

For that reason I’m very glad … Read the rest of this entry »

Getting information about UK HE from Wikipedia

At IWMW 2010 last week, a lot of discussion centred on how, in an increasingly austere climate, we can make more use of free stuff. One category of free stuff is linked data. In particular, I was intrigued by a presentation from Thom Bunting (UKOLN) about extracting information from Wikipedia. It has inspired me to start experimenting with data about UK universities.

Let’s get some terminology out of the way. DBpedia is a service that extracts machine-readable data from Wikipedia articles. You can look at, for example, everything DBpedia knows about the University of Bristol. SPARQL is an SQL-like language for querying triples: effectively, all the data sits in a single table with three columns – subject, property and value (the University of Bristol, its country, the United Kingdom, say). SNORQL is a front-end to DBpedia that allows you to enter SPARQL queries directly. It’s possible to ask SNORQL for “All soccer players, who played as goalkeeper for a club that has a stadium with more than 40.000 seats and who are born in a country with more than 10 million inhabitants” and get results in a variety of machine-readable formats.

Sadly, when you look for ways to use DBpedia data, some of the links are broken, which was initially off-putting. SNORQL is great fun, though. SPARQL is something I’m only just learning, but to anyone familiar with SQL and the basics of RDF it’s straightforward.

List the members of the 1994 Group of universities

SELECT ?uni
WHERE {
?uni rdf:type <http://dbpedia.org/ontology/University> .
?uni skos:subject <http://dbpedia.org/resource/Category:1994_Group>
}
ORDER BY ?uni
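If you’d rather run that query from a script than through the SNORQL form, here is a minimal sketch in Python, using only the standard library, against DBpedia’s public SPARQL endpoint. The explicit prefixes are needed because, unlike SNORQL, the raw endpoint doesn’t supply them; the skos:subject property simply follows the query above.

import json
import urllib.parse
import urllib.request

# Minimal sketch: run the query above against DBpedia's public SPARQL
# endpoint and read the standard SPARQL JSON results format.
query = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?uni
WHERE {
  ?uni rdf:type <http://dbpedia.org/ontology/University> .
  ?uni skos:subject <http://dbpedia.org/resource/Category:1994_Group>
}
ORDER BY ?uni
"""

url = "http://dbpedia.org/sparql?" + urllib.parse.urlencode(
    {"query": query, "format": "application/sparql-results+json"})

with urllib.request.urlopen(url) as response:
    results = json.load(response)

# Each binding maps the variable name to a dict with a "value" key.
for binding in results["results"]["bindings"]:
    print(binding["uni"]["value"])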
Read the rest of this entry »

Web page design from natural language

This is a cute online toy from James Wilkes: it constructs HTML+CSS pages from natural language commands such as “set div leftnav background-color to lightblue”. Not sure what the application would be – accessibility for users with motor impairments, perhaps?
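I haven’t seen the code behind the toy, but the basic trick is presumably a small command grammar mapped onto CSS. A minimal sketch in Python, assuming (my assumption, not James Wilkes’s) commands of the form “set <tag> <id> <property> to <value>”:

import re

# Hypothetical sketch (not the actual toy): turn a command like
# "set div leftnav background-color to lightblue" into a CSS rule.
COMMAND = re.compile(r"^set (\w+) (\w+) ([\w-]+) to (\S+)$")

def command_to_css(command):
    match = COMMAND.match(command)
    if match is None:
        raise ValueError("unrecognised command: " + command)
    tag, element_id, prop, value = match.groups()
    return "%s#%s { %s: %s; }" % (tag, element_id, prop, value)

print(command_to_css("set div leftnav background-color to lightblue"))
# prints: div#leftnav { background-color: lightblue; }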

The dark side of aggregating tags

An infographic on Flickr recounts the cautionary tale of the Conservative Party’s experiment in social media. They aggregated the #cashgordon hashtag, so that messages from Twitter with this tag would appear on their own site. The disaster that resulted was made possible by three technical errors:

  1. They didn’t filter content: anyone could use Twitter and the hashtag to write whatever text they wanted on the Conservative site.
  2. They didn’t filter out markup: users could style the content of messages however they wanted (e.g. 48 points high) and embed images of their choice (including spoofs of the Conservative poster campaign).
  3. They didn’t filter out JavaScript: users could insert a command redirecting the whole site to the Labour Party, a Rickroll or porn, which they promptly did.

Code injection is something any developer should consider when building one of these services, and surely most do, but it’s nice to have a periodic reminder of what can go wrong when you leave out the necessary one or two lines of code.
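For the record, the missing line is essentially HTML-escaping. A minimal sketch in Python (the surrounding markup is invented for the example); escaping defuses the second and third problems above, while the first would still need moderation or a whitelist:

import html

def render_tweet(tweet_text):
    # html.escape turns <, >, & and quotes into entities, so user-supplied
    # text is displayed as text rather than interpreted as markup or script.
    # The <li class="tweet"> wrapper is made up for illustration.
    return '<li class="tweet">%s</li>' % html.escape(tweet_text)

print(render_tweet("<script>window.location='http://example.org/';</script>"))
# the injected script tag comes out as inert &lt;script&gt;... text, not code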

Secrets of the Google Algorithm

Wired magazine has a feature article which gives about as much detail as outsiders can expect on the core of Google’s business, its search algorithm. I was surprised to see that philosopher Ludwig Wittgenstein was an influence. Hundreds of different pieces of information (or “signals”) are used to rank the results, and some of these are contextual to the user: for example, geographical information is used to prioritise results from near your location.

One of the signals which is increasing in importance is page speed: the time it takes the page to load and render. Hence it’s worth reading up on Google’s performance optimisation tips.

Yahoo Query Language

When I’m explaining the semantic web to people, I start by saying that I think of the present web as one big global document, made by linking together pages on different servers. Similarly, the semantic web would link data from many different servers to make a global database.

That vision just got a step closer with Yahoo’s YQL, a kind of super-API which allows you to perform SQL-like queries across data from multiple sites. The tutorial on Net-tuts uses the example of taking the latest tweets from a group of Twitter accounts. You could substitute RSS for Twitter to make a news aggregator (not a hugely imaginative application, but one on my mind recently).
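As a flavour of what that aggregator might look like: YQL exposes an rss table that can be queried over several feed URLs at once, and the public endpoint returns JSON. A rough sketch in Python, with placeholder feed URLs and the endpoint and response shape as YQL documents them (my reading of the docs, not tested against every feed):

import json
import urllib.parse
import urllib.request

# Rough sketch of the RSS-aggregator idea. The feed URLs are placeholders.
yql = """
select title, link from rss
where url in ('http://example.com/feed1.xml', 'http://example.com/feed2.xml')
limit 10
"""

url = "http://query.yahooapis.com/v1/public/yql?" + urllib.parse.urlencode(
    {"q": yql, "format": "json"})

with urllib.request.urlopen(url) as response:
    data = json.load(response)

for item in data["query"]["results"]["item"]:
    print(item["title"], "-", item["link"])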

Intute advent calendar blog

This December, Intute is once again running an “Advent Calendar” on its blog, with the theme of user-created content. It started on Tuesday with a post about the independent film Born of Hope, set in Tolkien’s Middle-earth. My own post, “Voluntary work for an obscure educational charity”, discusses contributing academic material to Wikipedia. Paul Meehan’s post today discusses augmenting a human-maintained web catalogue with a Google Custom Search Engine. There’s more to come through the month on Web 2.0/community themes, and as usual the Intute blog has that bit more depth than the rest.

IWMW reflections / Hug a Developer

This year’s Institutional Web Management Workshop (IWMW 2009) was, like last year’s, a very friendly, useful, forward-looking conference. I suspect that some organisations didn’t send people this year because of the economic climate, which is a pity, because the mindset of the conference was very much focused on coping with future changes. There was, as last year, a lot of discussion of what the commercial sector can provide, and whether Google will conquer all. A phrase that got some use was “80/20 solutions”, i.e. 80% of the functionality at 20% of the effort.

For me the most interesting contribution was Prof. Derek Law’s opening keynote. He warned that the HE library sector may be too focused on responding to changing economic conditions, when the cultural changes happening now are arguably more significant. Read the rest of this entry »

JavaScript rises to a whole new level

These don’t work in all browsers yet, and they run faster in Google Chrome than in other browsers, but the Chrome Experiments show how far JavaScript has come thanks to things like the canvas tag and powerful libraries. Witness a faithful recreation of the Amiga operating system and desktop, including the command line manager; a replication of the MilkDrop music visualisation plugin; games; 3D effects; and a version of cartoon physics.