Health information on wikis

My colleague Rob Pearce has a thought-provoking question about the safety of medical information on Wikipedia:

I’m not having a go at Wikipedia at all – I’m a big fan – but I had a thought about the recent McMaster University and Wikimedia Canada initiative for health care content creation in the creative commons (workshops on 4 October 2011 at McMaster, introducing both professors and students to creating health care content in the creative commons).

This has come up before when I was working on Open Educational Resources: what is to stop nutters/malcontents – subtly or otherwise – altering medical information in ways that lead to somebody putting their health in jeopardy, or pushing one procedure over another, promoting one drug over another, etc.? I don’t quite know how to answer this argument yet.

If you Google search for “shark cartilage”, “laetrile therapy” or “copper bracelet” you can see the nutters and profiteers already have their own web sites, which of course you and I are unable to edit.

For that reason I’m very glad … Read the rest of this entry »

Getting information about UK HE from Wikipedia

At IWMW 2010 last week, a lot of discussion centred on how, in an increasingly austere climate, we can make more use of free stuff. One category of free stuff is linked data. In particular, I was intrigued by a presentation from Thom Bunting (UKOLN) about extracting information from Wikipedia. It has inspired me to start experimenting with data about UK universities.

Let’s get some terminology out of the way. DBpedia is a service that extracts machine-readable data from Wikipedia articles. You can look at, for example, everything DBpedia knows about the University of Bristol. SPARQL is an SQL-like language for querying triples: effectively, all the data sits in a single table with three columns – subject, property and value (the University of Bristol, its country, the United Kingdom, say). SNORQL is a front-end to DBpedia that allows you to enter SPARQL queries directly. It’s possible to ask SNORQL for “All soccer players, who played as goalkeeper for a club that has a stadium with more than 40.000 seats and who are born in a country with more than 10 million inhabitants” and get results in a variety of machine-readable formats.

Sadly, when you look for ways to use DBpedia data, some of the links are broken, which was initially off-putting. SNORQL is great fun, though. SPARQL is something I’m only just learning, but to anyone familiar with SQL and the basics of RDF it’s straightforward.

List the members of the 1994 Group of universities

SELECT ?uni
WHERE {
?uni rdf:type <http://dbpedia.org/ontology/University> .
?uni skos:subject <http://dbpedia.org/resource/Category:1994_Group>
}
ORDER BY ?uni
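If you’d rather run that query from a script than through the SNORQL form, here is a minimal sketch in Python, using only the standard library, against DBpedia’s public SPARQL endpoint. The explicit prefixes are needed because, unlike SNORQL, the raw endpoint doesn’t supply them; the skos:subject property simply follows the query above.

import json
import urllib.parse
import urllib.request

# Minimal sketch: run the query above against DBpedia's public SPARQL
# endpoint and read the standard SPARQL JSON results format.
query = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?uni
WHERE {
  ?uni rdf:type <http://dbpedia.org/ontology/University> .
  ?uni skos:subject <http://dbpedia.org/resource/Category:1994_Group>
}
ORDER BY ?uni
"""

url = "http://dbpedia.org/sparql?" + urllib.parse.urlencode(
    {"query": query, "format": "application/sparql-results+json"})

with urllib.request.urlopen(url) as response:
    results = json.load(response)

# Each binding maps the variable name to a dict with a "value" key.
for binding in results["results"]["bindings"]:
    print(binding["uni"]["value"])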
Read the rest of this entry »

Web page design from natural language

This is a cute online toy from James Wilkes: it constructs HTML+CSS pages from natural language commands such as “set div leftnav background-color to lightblue”. Not sure what the application would be – accessibility for users with motor impairments, perhaps?
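I haven’t seen the code behind the toy, but the basic trick is presumably a small command grammar mapped onto CSS. A minimal sketch in Python, assuming (my assumption, not James Wilkes’s) commands of the form “set <tag> <id> <property> to <value>”:

import re

# Hypothetical sketch (not the actual toy): turn a command like
# "set div leftnav background-color to lightblue" into a CSS rule.
COMMAND = re.compile(r"^set (\w+) (\w+) ([\w-]+) to (\S+)$")

def command_to_css(command):
    match = COMMAND.match(command)
    if match is None:
        raise ValueError("unrecognised command: " + command)
    tag, element_id, prop, value = match.groups()
    return "%s#%s { %s: %s; }" % (tag, element_id, prop, value)

print(command_to_css("set div leftnav background-color to lightblue"))
# prints: div#leftnav { background-color: lightblue; }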

The dark side of aggregating tags

An infographic on Flickr recounts the cautionary tale of the Conservative Party’s experiment in social media. They aggregated the #cashgordon hashtag, so that messages from Twitter with this tag would appear on their own site. The disaster that resulted was made possible by three technical errors:

  1. They didn’t filter content: anyone could use Twitter and the hashtag to write whatever text they wanted on the Conservative site.
  2. They didn’t filter out markup: users could style the content of messages however they wanted (e.g. 48 points high) and embed images of their choice (including spoofs of the Conservative poster campaign).
  3. They didn’t filter out JavaScript: users could insert a command redirecting the whole site to the Labour Party, a Rickroll or porn, which they promptly did.

Code injection is something any developer should consider when building one of these services, and surely most do, but it’s nice to have a periodic reminder of what can go wrong when you leave out the necessary one or two lines of code.
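For the record, the missing line is essentially HTML-escaping. A minimal sketch in Python (the surrounding markup is invented for the example); escaping defuses the second and third problems above, while the first would still need moderation or a whitelist:

import html

def render_tweet(tweet_text):
    # html.escape turns <, >, & and quotes into entities, so user-supplied
    # text is displayed as text rather than interpreted as markup or script.
    # The <li class="tweet"> wrapper is made up for illustration.
    return '<li class="tweet">%s</li>' % html.escape(tweet_text)

print(render_tweet("<script>window.location='http://example.org/';</script>"))
# the injected script tag comes out as inert &lt;script&gt;... text, not code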

Secrets of the Google Algorithm

Wired magazine has a feature article which gives about as much detail as outsiders can expect on the core of Google’s business, its search algorithm. I was surprised to see that philosopher Ludwig Wittgenstein was an influence. Hundreds of different pieces of information (or “signals”) are used to rank the results, and some of these are contextual to the user: for example, geographical information is used to prioritise results from near your location.

One of the signals which is increasing in importance is page speed: the time it takes the page to load and render. Hence it’s worth reading up on Google’s performance optimisation tips.

Yahoo Query Language

When I’m explaining the semantic web to people, I start by saying that I think of the present web as one big global document, made by linking together pages on different servers. Similarly, the semantic web would link data from many different servers to make a global database.

That vision just got a step closer with Yahoo’s YQL, a kind of super-API which allows you to perform SQL-like queries across data from multiple sites. The tutorial on Net-tuts uses the example of taking the latest tweets from a group of Twitter accounts. You could substitute RSS for Twitter to make a news aggregator (not a hugely imaginative application, but one on my mind recently).
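As a flavour of what that aggregator might look like: YQL exposes an rss table that can be queried over several feed URLs at once, and the public endpoint returns JSON. A rough sketch in Python, with placeholder feed URLs and the endpoint and response shape as YQL documents them (my reading of the docs, not tested against every feed):

import json
import urllib.parse
import urllib.request

# Rough sketch of the RSS-aggregator idea. The feed URLs are placeholders.
yql = """
select title, link from rss
where url in ('http://example.com/feed1.xml', 'http://example.com/feed2.xml')
limit 10
"""

url = "http://query.yahooapis.com/v1/public/yql?" + urllib.parse.urlencode(
    {"q": yql, "format": "json"})

with urllib.request.urlopen(url) as response:
    data = json.load(response)

for item in data["query"]["results"]["item"]:
    print(item["title"], "-", item["link"])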

Intute advent calendar blog

This December, Intute is once again running an “Advent Calendar” on its blog, with the theme of user-created content. It started on Tuesday with a post about the independent film Born of Hope, set in Tolkien’s Middle-earth. My own post, “Voluntary work for an obscure educational charity”, discusses contributing academic material to Wikipedia. Paul Meehan’s post today discusses augmenting a human-maintained web catalogue with a Google Custom Search Engine. There’s more to come through the month on Web 2.0/community themes, and as usual the Intute blog has that bit more depth than the rest.

IWMW reflections / Hug a Developer

This year’s Institutional Web Management Workshop (IWMW 2009) was, like last year’s, a very friendly, useful, forward-looking conference. I suspect that some organisations didn’t send people this year because of the economic climate, which is a pity, because the mindset of the conference was very much focused on coping with future changes. There was, as last year, a lot of discussion of what the commercial sector can provide, and whether Google will conquer all. A phrase that got some use was “80/20 solutions”, i.e. 80% of the functionality at 20% of the effort.

For me the most interesting contribution was Prof. Derek Law’s opening keynote. He warned that the HE library sector may be too focused on responding to changing economic conditions, when the cultural changes happening now are arguably more significant. Read the rest of this entry »

JavaScript rises to a whole new level

These don’t work in all browsers yet, and they run faster in Google Chrome than in other browsers, but the Chrome Experiments show how far JavaScript has come thanks to things like the canvas tag and powerful libraries. Witness a faithful recreation of the Amiga operating system and desktop, including the command line manager; a replication of the MilkDrop music visualisation plugin; games; 3D effects; and a version of cartoon physics.