Semantic Interoperability for e-Research in the Sciences, Arts and Humanities
I went along to a one-day workshop at Imperial College, London, on 2006 March 30, organised by the Imperial College Internet Institute.
The title was `Semantic Interoperability for e-Research in the Sciences, Arts and Humanities'; unfortunately it may have been too broad-ranging to be hugely useful to me, or indeed to us, with the talks either too high-level or too detailed. The morning was mostly bioinformatics talks and the afternoon mostly digital humanities, but it was the humanities themes that seemed to dominate (most unusually), in ways which probably aren't very useful to us.
I got the feeling, and I'm possibly being uncharitable here, that a couple of the talks were rehashed keynote speeches.
CIDOC CRM
The big theme of the day was the CIDOC CRM (Conceptual Reference Model). This is an elaborate upper ontology (though it predates the current interest in ontologies as such) which seems very general in scope, but which was developed by, and serves the interests of, the cultural heritage community (i.e., museums). Thus it has terms such as `participated in', `has created' and `took place at'. This has obvious utility for describing museum holdings, and for providing very rich views of those holdings for education or research purposes. One of the few demos on the day was a very impressive one by Matthew Addis of Sculpteur and its partner project Artiste. It showed how you could add lots of value to a catalogue (in our terms, a registry) using the sort of reasoning which semantically rich markup makes possible. However, we already know we can do that, and the demo didn't go into enough detail to usefully suggest to us what to do next.
Catton: big and little ontologies
Chris Catton, of the Zoology department in Oxford, gave a very interesting talk on sharing ontologies, and on the demerits of big ontologies. He pointed out that big ontologies are hard to reason over, hard to visualise or comprehend, and hard to agree with, in the sense of building consensus around (don't we just know this...). Small is beautiful for a variety of reasons, including that small ontologies are subject to natural selection (I heartily agree with this, as it's more or less my hobbyhorse as regards the VO DM effort). The Gene Ontology (GO) is very big. However, because it uses only two relationships between its terms (is_a and part_of), it is in fact very simple, and easy to reason over.
Although it sounds like a way through the problem, simply pointing into a large ontology – in the sense of using concepts within an ontology without importing the whole thing – doesn't work. By doing this you can ignore other semantics of the terms you're using. Catton's somewhat synthetic example was using an ontology's `author' concept without realising that it's a subproperty of `unreliableAuthor'. In this case you can end up implying things you didn't intend, or unwittingly create contradictions. These are only noticed when someone else does import the whole ontology.
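To make the `author'/`unreliableAuthor' trap concrete, here's a minimal Python sketch of the RDFS subproperty rules. If p is a subproperty of q, then every `x p y' also entails `x q y', and subproperty is transitive. The sketch uses plain tuples as triples rather than a real RDF toolkit. The `ex:' names are hypothetical, following Catton's example. The point is that the extra inference only appears once the big ontology's subproperty declaration is actually loaded.

```python
SUBPROP = "rdfs:subPropertyOf"

def entail_subproperties(triples):
    """Close a set of (subject, predicate, object) triples under two RDFS rules:
    rdfs7: (s p o), (p subPropertyOf q)   =>  (s q o)
    rdfs5: (s subPropertyOf p), (p subPropertyOf q)  =>  (s subPropertyOf q)
    """
    closed = set(triples)
    while True:
        # Current subproperty declarations, including ones inferred so far.
        subprops = [(p, q) for (p, pred, q) in closed if pred == SUBPROP]
        new = set()
        for p, q in subprops:
            for s, pred, o in closed:
                if pred == p:
                    new.add((s, q, o))          # rdfs7
                if pred == SUBPROP and o == p:
                    new.add((s, SUBPROP, q))    # rdfs5
        if new <= closed:                        # nothing new: fixed point
            return closed
        closed |= new

# What my application asserts, `pointing into' the big ontology's ex:author:
mine = {("ex:paper1", "ex:author", "ex:alice")}
# The part of the big ontology I never imported (hypothetical):
big = {("ex:author", SUBPROP, "ex:unreliableAuthor")}

unintended = ("ex:paper1", "ex:unreliableAuthor", "ex:alice")
print(unintended in entail_subproperties(mine))        # False: looks harmless
print(unintended in entail_subproperties(mine | big))  # True: once someone imports it all
```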
There's some work on patterns for ontology creation. Catton mentioned Guarino and Welty's ontological normalisation, and Alan Rector's implementation normalisation (having an asserted ontology consisting purely of disjoint subtrees).
Catton's solution is to have `application-level ontologies' – you agree on very small shared ontologies, and then specialise them on an application-by-application basis. Yes, it's obvious (he says), so why isn't it more common? FOAF (Friend of a Friend) and DC (Dublin Core) are the only real examples of this so far.
Demos and assorted insights
Doug Tudhope talked about FACET, which does thesaurus-based query expansion. There was a very quick demo, based on trying to find `carver chairs' (a particular style of chair) from an initial query string of `victorian mahogany armchairs'. When I tried to reproduce it just now, nothing worked (I suspect it might be IE-specific).
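I don't know the details of FACET's algorithm, but the general idea of thesaurus-based query expansion can be sketched as follows. Each query term is expanded by walking a thesaurus's synonym (UF), narrower-term (NT) and related-term (RT) links. The weight decays at each hop. The toy thesaurus, the weights and the two-hop limit are all my own hypothetical assumptions, not FACET's.

```python
from collections import deque

# Hypothetical toy thesaurus: term -> [(relation, term)].
# UF = used-for (synonym), NT = narrower term, RT = related term.
THESAURUS = {
    "armchairs": [("NT", "carver chairs"), ("NT", "wing chairs"), ("UF", "easy chairs")],
    "carver chairs": [("RT", "windsor chairs")],
    "mahogany": [("RT", "hardwood")],
}

# Hypothetical weights: how much a match on an expanded term should count,
# relative to a match on the original term.
WEIGHTS = {"UF": 1.0, "NT": 0.8, "RT": 0.5}

def expand(term, max_hops=2):
    """Breadth-first expansion of `term`, returning {term: weight}, where each
    weight is the product of relation weights along the best path found."""
    scores = {term: 1.0}
    queue = deque([(term, 1.0, 0)])
    while queue:
        current, weight, hops = queue.popleft()
        if hops == max_hops:
            continue
        for relation, neighbour in THESAURUS.get(current, []):
            w = weight * WEIGHTS[relation]
            if w > scores.get(neighbour, 0.0):   # keep only improvements
                scores[neighbour] = w
                queue.append((neighbour, w, hops + 1))
    return scores

def expand_query(query, max_hops=2):
    """Expand each whitespace-separated word of the query independently."""
    return {word: expand(word, max_hops) for word in query.split()}

# `armchairs' picks up `carver chairs' (weight 0.8) and, two hops out,
# `windsor chairs' (weight 0.4); `victorian' has no thesaurus entry here.
print(expand_query("victorian mahogany armchairs"))
```

A real system would then rank documents by summing the weights of whichever expanded terms they match.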
Catton distinguished between `domain' and `operational' ontologies (I don't know if this distinction is his, or a standard one). It's easy to say that `a publication has an author', but BibTeX, BioPAX, Atom, ... all implement this in substantially different ways, which breaks interop. This feels like a very useful and/or important idea, though I'm not quite sure where it fits into the scheme of things.
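As a rough illustration of the domain/operational distinction, here are two hypothetical Python helpers. They recover the same domain-level fact (the list of a publication's authors) from two different operational encodings. BibTeX packs every author into a single string separated by `and'. Atom gives each author its own `author' element with a `name' child. The helper names and sample data are made up, and the BibTeX splitting is deliberately naive.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"

def authors_from_bibtex(author_field):
    # BibTeX: all authors in one string field, separated by " and ".
    # (Naive: real BibTeX parsing also has to respect braces and name forms.)
    return [name.strip() for name in author_field.split(" and ")]

def authors_from_atom(entry_xml):
    # Atom: one <author> element per author, each with a <name> child.
    entry = ET.fromstring(entry_xml)
    return [author.findtext(ATOM_NS + "name", default="").strip()
            for author in entry.findall(ATOM_NS + "author")]

# Hypothetical sample data.
bibtex_author = "Jane Smith and John Doe"
atom_entry = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title>An example paper</title>
  <author><name>Jane Smith</name></author>
  <author><name>John Doe</name></author>
</entry>"""

# Same domain-level statement, two incompatible operational encodings.
print(authors_from_bibtex(bibtex_author))  # ['Jane Smith', 'John Doe']
print(authors_from_atom(atom_entry))       # ['Jane Smith', 'John Doe']
```

Each encoding needs its own adapter before the two can be compared. That adapter layer is exactly where interop tends to break.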
Martin Doerr: Categorisation processes don't have to be completely automatic to be useful–Wikipedia shows that there's a lot of manual work that folk are willing to put in if the rewards are right, and semiautomatic processes could build on this. There were frequent mentions of folksonomies and social tagging.
Matthew Addis: `Google doesn't work well for long-tail content'. The PageRank algorithm is fundamentally based on citations from high-ranking pages, so it privileges the most popular items of any type. By contrast, Amazon makes its millions from the `long tail' of items which very few people want. It has occurred to me that Google is the Semantic Web's dirty little secret – it really shouldn't be that successful when all it's doing is keyword searching. However, if Google really does miss the long tail, it's here that you need semantic searching.
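To see why link-based ranking favours the popular head over the long tail, here's a minimal sketch of textbook PageRank by power iteration, with the conventional damping factor of 0.85. The five-page link graph is hypothetical, and Google's production ranking certainly involves much more than this. The sketch only shows that a page which most others link to ends up with most of the rank, while a `niche' page with a single inbound link ends up well below it.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.
    `links` maps each page to the list of pages it links to."""
    nodes = sorted(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every page gets the `random jump' share...
        new = {node: (1.0 - damping) / n for node in nodes}
        for node, outs in links.items():
            if outs:
                # ...plus an equal slice of the rank of each page linking to it.
                share = damping * rank[node] / len(outs)
                for target in outs:
                    new[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for target in nodes:
                    new[target] += damping * rank[node] / n
        rank = new
    return rank

# Hypothetical web: three pages all cite `popular'; only one also cites `niche'.
LINKS = {
    "a": ["popular"],
    "b": ["popular"],
    "c": ["popular", "niche"],
    "popular": ["a", "b", "c"],
    "niche": ["c"],
}

for page, score in sorted(pagerank(LINKS).items(), key=lambda kv: -kv[1]):
    print(f"{page:8s} {score:.3f}")
```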
--
NormanGray - 31 Mar 2006