European Semantic Web Conference 2006

Budva, Montenegro
June 11-15 2006

Introduction

These are notes from the sessions I attended at ESWC'06 last week. For more detailed (and accurate) information, check out the following:

I (Norman) have also interleaved some notes, though only for some talks which seem slightly more generally useful.

W5: S4: Querying

Xcerpt

  • Rule based query language
  • XML vs RDF data
    • RDF inherently graph structured
  • Augment data terms by edges
  • Vs Sparql
    • More breadth
  • Allows ordering
  • Versatile access
  • Query construct using conjuncts from XML & RDF simultaneously
    • Use xml query to find terms on which RDF query is based
      • All in one query statement

XAMaXoS

  • Need efficient implementation of Xcerpt
  • Abstract Machine
    • Hw virtualisation
    • Wine
    • JVM, CLI
  • Instruction set + machine model
    • cF algebra = operators + data model
  • Precise query semantics
  • ==> optimizability
  • Aims
    • Language neutral
      • But bias to Xcerpt
    • Focus on in-memory processing of distrib data
      • Initially
      • Ad-hoc index creation
    • Distributed evaluation
  • Split into base relations -> conjunctive queries
    • Optimization: move 'tough' decisions to compile-time
  • Data model
    • Basic type: node with properties
    • Memory model: memoization matrix
  • Compute spanning tree from graph
    • Use tree to compute matrix
  • Three phase algorithm
    • Matrix population
    • Expansion of non-tree joins
    • Matrix consumption
  • First version end of the year

More precise typing rules for Xcerpt

  • Descriptive typing for Xcerpt
  • Type inference
  • Type checking
  • => locating errors in programs
  • Formalisms
    • Data terms
    • Type definitions

W5: S5: Reasoning II

Extending OWL web node with Reactive behaviour

  • Triggers
  • ON event WHEN rule DO action
  • Events on OWL level can be derived
    • No distinction between base & derived relations
  • ON DELETE OF hasHusband DO
    • DELETE hasWife
  • If (dept, hasProfessor, …) rule added
    • ON INSERTION OF hasEmployee OF department
      • RAISE EVENT (new_employee(…))
      • => reacts on change in model
  • Pre-reasoning triggers
  • Post-reasoning triggers
    • React on changes to model
  • See example (CREATE TRIGGER test …)
  • Architecture
    • Web service
    • Jena based module with active functionality
    • PostgreSQL db + RDF facts
    • DIG tell&ask i/f
  • Algorithm
    • Verify direct triggers: rollback if not consistent
  • Jena RDF-framework
  • See alternatives
  • Conclusion
    • Simple solution
    • Ready-to-use
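
The ON event WHEN rule DO action pattern above can be sketched in a few lines of Python. This is only an illustration over an in-memory set of triples, not the Jena-based module from the talk; `TripleStore`, its `on` method and the example individuals are hypothetical names.

```python
from collections import defaultdict

class TripleStore:
    """Toy in-memory triple store with insert/delete triggers (illustrative only)."""

    def __init__(self):
        self.triples = set()
        self.triggers = defaultdict(list)  # (event, predicate) -> list of actions

    def on(self, event, predicate, action):
        """Register action(store, s, p, o) for event 'insert' or 'delete'."""
        self.triggers[(event, predicate)].append(action)

    def insert(self, s, p, o):
        if (s, p, o) in self.triples:
            return
        self.triples.add((s, p, o))
        self._fire("insert", s, p, o)

    def delete(self, s, p, o):
        if (s, p, o) not in self.triples:
            return  # nothing deleted, so no event; this also stops trigger loops
        self.triples.remove((s, p, o))
        self._fire("delete", s, p, o)

    def _fire(self, event, s, p, o):
        for action in self.triggers[(event, p)]:
            action(self, s, p, o)

store = TripleStore()
# ON DELETE OF hasHusband DO DELETE hasWife
store.on("delete", "hasHusband", lambda st, s, p, o: st.delete(o, "hasWife", s))
store.insert("anna", "hasHusband", "ben")
store.insert("ben", "hasWife", "anna")
store.delete("anna", "hasHusband", "ben")  # also removes (ben, hasWife, anna)
```

The sketch fires triggers immediately after each change; the pre-reasoning and post-reasoning distinction and the rollback on inconsistency mentioned in the talk are not modelled.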

Open & Closed-World reasoning

  • Desirable to apply closed world reasoning to subset of knowledge
    • Where knowledge complete
    • E.g. list of EU countries
  • Extended logic programming
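
A minimal sketch of applying closed-world reasoning to only part of the knowledge, assuming facts are simple (subject, predicate) pairs. The predicate names and the `holds` helper are made up for illustration and are not taken from the talk.

```python
# Predicates for which our knowledge is assumed to be complete (hypothetical).
CLOSED_PREDICATES = {"isEUMember"}

FACTS = {
    ("France", "isEUMember"),
    ("Germany", "isEUMember"),
    ("France", "hasCoastline"),
}

def holds(subject, predicate):
    """Return True, False, or None (unknown) for a simple fact."""
    if (subject, predicate) in FACTS:
        return True
    if predicate in CLOSED_PREDICATES:
        return False  # closed world: a missing fact is taken to be false
    return None       # open world: a missing fact is merely unknown
```

With this, `holds("Norway", "isEUMember")` is `False` because the list of members is treated as complete, whereas `holds("Germany", "hasCoastline")` is `None` because nothing can be concluded from a missing statement.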

Reasoning with Temporal constraints

  • Temporal RDF
    • TRDF + intervals
    • TRDF + temporal constraints
  • Create entailment of intervals

T5: Application development using Sesame

Introduction

Tutorial

Sesame framework

  • Sesame
    • For storage, querying and inferencing
    • Java library
    • Repository
    • Reasoning support
    • Custom rule engine (coming)
  • Various backends
  • Rio toolkit
  • Architecture (see slide)
    • Http server (superset of SPARQL)
    • Repository access api
    • SeRQL , SPARQL (declarative querying)
    • SAIL api (storage and inference)
    • Rio (RDF io)
  • Application either direct to rep api or via Http
  • SAIL api
    • Abstraction from physical
    • System level api (not dev level)
  • Repository access api (dev level)
  • Installation
    • Java 5.0 + Tomcat 5.x
    • Deploy sesame.war
    • Configure server
      • Edit config file
  • Reasoning

Sesame includes an HTTP server, which allows you to add data to a knowledgebase through the web and (I'm pretty sure) query the result. Sesame will support 10^7 triples on desktop hardware (I think this means that you can load that many, and query the result, but not necessarily inference with them, in the sense of query the implied triples). Sesame's reasoning support includes RDFS and OWLIM (see below), and it allows you to plug in your own (domain-specific) custom rule engine, presuming you have one handy.

Differences to Jena: Sesame started out with a focus on declarative querying and added the API (I imagine this is somewhat simplified), whereas Jena started out as an RDF API and added querying later. They're converging, though the Sesame download is about 1MB against Jena's 8MB, which the Sesame folk are rather pleased about. Relative performance depends heavily on the case. [NG]

Querying

  • http://www.openrdf.org/
  • SeRQL vs SPARQL
    • Similar
    • SeRQL (pronounced `circle')
      • Nested queries (IN, EXISTS operators)
      • Efficient Sesame impl
    • SPARQL
      • W3C std
        • Tool interop: Jena, Redland, 3Store, Sesame
    • See examples slide
    • Can they be transformed
      • Most queries
      • Internally parsing to same query object model
  • Path expression: {X} P {Y}
    • Chaining
    • Branching
    • Comparison operators: string, boolean
  • Query composition
    • Overlay query graph onto actual
  • Optional path expressions
    • RDF is semi-structured
    • […]: if matched then return that data
  • CONSTRUCT queries
    • Return RDF statements
  • Graph transformations
    • E.g. playsInMovies
    • Create new graph
    • Can use api to feed back into repository
  • Nested query
    • SeRQL has IN, ANY & ALL, EXISTS
    • See examples
    • Speed
      • Can apply optimization
      • Rules of thumb
      • SAIL allows 'prepare' to test for optimization
        • For some SAILs
          • Equality for RDBMS backend
  • SPARQL
    • Allows functions to be defined (sorts of properties)
  • Query returns graph
    • So can populate new in-memory store

SeRQL allows nested queries, for example all movies which don't have a rating: that is, all movies where a query for the rating fails. Or find the highest rating for each user: that is, the film for which the rating is bigger than any other match for a given user (the implication was that this is difficult or impossible in SPARQL). The two languages don't look massively different on the page; SPARQL possibly looks a bit more SQL-like. [NG]
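
The two nested-query examples can be mimicked in plain Python to show the logic involved. This is not SeRQL syntax, and the data is invented: the first query keeps a movie when an inner query for its rating finds nothing (the EXISTS case), and the second keeps a row when its rating is at least as large as ALL other ratings by the same user.

```python
ratings = [  # (user, movie, rating): hypothetical data
    ("ann", "Alien", 4),
    ("ann", "Brazil", 5),
    ("bob", "Alien", 3),
]
movies = {"Alien", "Brazil", "Casablanca"}

# Movies with no rating: the inner query for a rating must come back empty.
unrated = {m for m in movies if not any(movie == m for _, movie, _ in ratings)}

# Highest-rated movie per user: rating >= ALL ratings by the same user.
best = {
    (user, movie)
    for user, movie, r in ratings
    if all(r >= r2 for u2, _, r2 in ratings if u2 == user)
}
```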

Using the Sesame API

  • Using Sesame as library
    • E.g. (see slide) create repository object
    • Querying
      • Execute query and iterate over results
  • Transactions
    • Commit & Rollback
  • Context support
    • May come from separate RDF files
    • => provenance tracking, versioning & time-tracking
  • Default vs named context
  • Querying: FROM CONTEXT
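
A rough sketch of how transactions and contexts fit together, assuming each statement carries the context it came from. The `Repository` class below is a toy written for these notes and is not the Sesame repository API.

```python
_ANY = object()  # sentinel meaning "do not filter by context"

class Repository:
    """Toy quad store: each statement is (subject, predicate, object, context)."""

    def __init__(self):
        self._committed = set()
        self._pending = None  # working copy while a transaction is open

    def begin(self):
        self._pending = set(self._committed)

    def add(self, s, p, o, context=None):
        if self._pending is None:
            raise RuntimeError("no open transaction")
        self._pending.add((s, p, o, context))

    def commit(self):
        if self._pending is None:
            raise RuntimeError("no open transaction")
        self._committed, self._pending = self._pending, None

    def rollback(self):
        self._pending = None  # discard the working copy

    def statements(self, context=_ANY):
        """All committed statements, or only those from one context."""
        return {q for q in self._committed if context is _ANY or q[3] == context}

repo = Repository()
repo.begin()
repo.add("Alien", "director", "Ridley Scott", context="movies.rdf")
repo.commit()
repo.begin()
repo.add("Alien", "director", "Unknown", context="scratch.rdf")
repo.rollback()  # the second statement never becomes visible
```

Filtering on the fourth element is what makes provenance tracking possible: `repo.statements(context="movies.rdf")` returns only what was loaded from that (hypothetical) file.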

Elmo

  • http://www.workbrain.com/
  • Maps sesame repository to java beans
  • Uses query expansion & caching
    • Query ahead
  • AugurRepository method
    • Can be used outside of Elmo
  • See Architecture on slide
  • Other tools
    • Scutter: crawls distrib RDF networks
    • Smusher: find duplicates

OWLIM

  • http://www.ontotext.com/owlim
  • Supports RDF(S) & limited OWL-Lite
  • Semantics
    • Reasoning customizable
      • : empty, RDFS, owl-horst, owl-max
  • OWL support
    • Almost all primitives
    • Close to OWL-Lite
  • RDFS support
  • In-memory reasoning & Reliable persistence
  • Forward-reasoning avoids need for query optimization
  • Configurable SAIL for Sesame
  • Almost ready with Sesame 2.0
  • OWLIM & Sesame serve WSML
  • WSMO infrastructure
  • Loads LUBM(50,0) in 15 mins
    • Only other one able to load it takes 12 hours
  • BigOWLIM
    • Passed LUBM(8000,0)
      • 1.06 billion explicit statements
      • 69 hours to load
  • (The original OWLIM is now referred to as SwiftOWLIM) [NG]
  • Only need to use Sesame
    • And use OWLIM as SAIL
    • Can also go via Elmo
  • OWLIM is open source
    • But TRREE is not
      • Free for in-memory version
      • €1000 per cpu for big version

OWLIM is described as a `scalable semantic repository', using a `pragmatic subset of OWL'. In ascending order of complexity/expressivity, the various languages are:

  1. RDFS
  2. OWL DLP (Description Logic Programming: the intersection of OWL and Prolog, more or less)
  3. OWLIM (includes some overlap with OWL Lite)
  4. OWL Lite (all the easy bits of OWL)
  5. OWL DL
  6. SWRL (OWL plus some rules; heading towards standardisation)
  7. OWL Full

Most scalable RDF repositories tend, it seems, to be somewhere between RDFS and OWL Lite.

OWLIM supports almost all the OWL primitives, apart from Thing, Nothing, differentFrom, and complementOf, though some primitives are only partially supported. However it doesn't have the DL-specific constraints, so that meta-classes (classes of classes), and properties linking classes to instances, are OK. [NG]
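
To illustrate what forward reasoning means here, the following sketch materialises all implied triples up front, so that queries afterwards are plain lookups. It applies only two RDFS rules (rdfs9 and rdfs11) in a naive loop; it says nothing about how OWLIM or TRREE actually implement this, and the function name is made up.

```python
def materialise(triples):
    """Naive forward chaining over two RDFS rules until no new triples appear."""
    closure = set(triples)
    while True:
        new = set()
        for s, p, o in closure:
            if p == "rdfs:subClassOf":
                # rdfs11: subClassOf is transitive
                new |= {(s, p, o2) for s2, p2, o2 in closure
                        if p2 == "rdfs:subClassOf" and s2 == o}
                # rdfs9: instances of a subclass are instances of the superclass
                new |= {(x, "rdf:type", o) for x, p2, c in closure
                        if p2 == "rdf:type" and c == s}
        if new <= closure:
            return closure
        closure |= new

explicit = {
    ("Penguin", "rdfs:subClassOf", "Bird"),
    ("Bird", "rdfs:subClassOf", "Animal"),
    ("pingu", "rdf:type", "Penguin"),
}
closure = materialise(explicit)
```

The cost is paid at load time (three explicit statements become six here), which matches the long load times and fast queries reported above.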

Tutorial: Semantic web policies

[NG] This was a half-day tutorial, though Piero Bonatti also gave a talk on the issues later in the meeting.

Part of the REWERSE network, which is concerned with Rules on the Web

Policies aren't just about security but include business rules, workflow management and so on. It's risky and costly to encode such policies in code, and then to change them.

Current technologies include XACML (built around 'rules', but very simple, without chaining) and P3P (rudimentary ontology, ill-specified).

Open systems authorisation

  • Example: getting wireless service in an airport lounge via a frequent flier card, pre-pay card, credit card, airline employee status
  • Illustrates privacy issues. Why should you disclose sensitive information to this server?
  • Self-regulation: SEAL program (TRUSTe, BBBOnLine, WebTrust), following best practices, subject to audit

Expressiveness requirements

  • 'Policy' means security, business rules, QoS, etc, but all make decisions based on attributes, such as age, possession of ID, and so on
  • Policies can be active as well as passive: for example triggering manual registration procedures, or logging
  • Evidence: Strong (easy to reason about: ID, credit cards, subscriptions), soft (inc PGP, possibly strong but hard to reason about web of trust, eBay reputations), lightweight.

They talk of `co-operative policy enforcement', by which they mean never just saying no, but instead explaining why not (you don't have a valid card) and what the user might do about it (ask X for an account), and supporting what-if and query scenarios.

Requirements

  • Well-defined semantics. No surprises: conclusions should be the same for any reasoner
  • Monotonic: no decrease in access on disclosure ('grant access if requester is not a student'). This is because you can't reliably find all the properties a requester has, so making deductions based on the absence of properties is unreliable.
    • Some exceptions: might check with VISA for the absence of a revocation
    • Doesn't apply to time-based policies: might have access before 5, but not after
  • Delegation is necessary in many cases, both for privacy and efficiency reasons (the predicate has a valid VISA card is evaluated by VISA, not by the policy owner). In this context, loops are a problem, but aren't errors
  • Policies can be sensitive: 'access only Sun&Microsoft' (so they're cooperating), 'records available only to psychiatrist/parole officer', 'my pictures only to my friends' (but I can't see them, so I'm not a friend!)
  • Must be able to reason in situations involving loops. Alice will let her friends see her photos, and decides that her friends include Bob's friends; Bob decides that his friends include Alice's friends; hence a loop. There's another standard scenario: two CIA agents meet, and each one says I'll show you my CIA credentials if you show me yours first.
  • Usability is hard: `too often only the PhD student who designed a policy language can use it effectively' [Seaham, ESWC'02?]
  • Conflict resolution
    • different policies may apply
    • one policy permits, another denies
    • one policy obliges, another denies
  • Might need a proof that a policy has been satisfied, to pass on to a third party
  • Many policy languages lack (non-prototype) implementation
  • And then you need tools

Current policy languages

  • Languages differ in whether they have well-defined (formal) semantics, and in whether evaluation is centralised or distributed
  • XACML: distributed policies, but centralised evaluation, no formal semantics
    • procedural semantics defined in Haskell, and incompletely, so that extensions can be made in an ad hoc way (XACML is at v2.0, but there's only a draft semantics document for v1.0)
    • not declarative
    • No negation, so monotonic
    • no variables, so not rules in the rules-community sense, and no chaining
    • no delegation
    • all policies are public (since they're evaluated centrally)
    • conflict resolution, with deny/permit overrides
    • there are multiple implementations and tools
  • P3P: Platform for Privacy Preferences
    • schema, not a language
    • informational, and doesn't enforce compliance
    • policies may be ambiguous
  • Kaos
    • policies brought to central point for evaluation
    • uses OWL ontologies
    • 4 types of policy: positive/negative authorization/obligation
    • Uses DL subsumption reasoning to reason over policies: check for applicability of policies
    • Provides a static conflict resolution algorithm, using policy precedences, and punts to user if the algorithm fails
    • Well-defined semantics; no negation, so monotonic
  • PeerTrust
    • guarded distributed logic; distributed/delegated evaluation
    • policy protection, so negotiation
    • has an implementation in a jar file (works, but not production quality)
    • there are tools; policies can be specified using Protégé
  • Protune
    • general provisional-style actions: actions are performed (and are true if they succeed?)
    • big language, by the look of it, with bells, whistles and gongs
    • supports negotiation
    • implementation ongoing, built on PeerTrust; there are a few tools
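
The deny/permit overrides idea mentioned under XACML can be sketched as a function that combines the decisions of several policies. This is a simplification for illustration: it ignores XACML's Indeterminate results, and the `combine` function and its string values are assumptions of this sketch rather than anything from the specification.

```python
def combine(decisions, algorithm="deny-overrides"):
    """Combine per-policy decisions ('permit', 'deny' or 'not-applicable')."""
    applicable = [d for d in decisions if d != "not-applicable"]
    if not applicable:
        return "not-applicable"
    # The overriding decision wins as soon as any policy returns it.
    winner = "deny" if algorithm == "deny-overrides" else "permit"
    loser = "permit" if winner == "deny" else "deny"
    return winner if winner in applicable else loser
```

So when one policy permits and another denies, `combine(["permit", "deny"])` gives `"deny"`, while the same input under `"permit-overrides"` gives `"permit"`.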

W7: S1: Semantic wiki

GraphingWiki

  • http://graphingwiki.virtues.fi/
  • Background in computer security
    • Black box testing for sw vulnerabilities
    • Traditional protocol views (see slide)
      • Early viz attempts
      • Trying to understand linkages
  • Protocol data stored in MoinMoin
    • Then extract into views
    • Began with extra markup
      • Commonalities with RDF and semantic wikis
      • Exported to RDF

Reusing Ontological Background Knowledge

  • http://wiki.ontoworld.org
  • MediaWiki
    • PHP/MySQL
      • Not many SemWeb tools for this
  • Semantic MediaWiki (SMW)
  • Installations
    • Ontoworld
    • WWW2006
    • ESWC2006
    • Bible wiki
    • Esoteric knowledge wiki
  • Mapping of OWL to SMW
    • But how to use?
  • Ontology import
    • Upload using mapping
    • Kickstart wiki
    • Can enrich an existing wiki
    • Only works for simple parts: mapped stuff
  • OWL more expressive
    • Subproperties
    • Inverse, functional, transitive props
    • Number constraints
    • Class constructors: negation, conjunction, disjunction
    • But how to add to wiki?
      • Extend wiki syntax
      • But usable?
  • KISS
    • But can we keep the sophisticated knowledge
  • Architecture
    • KAON2 loaded with knowledge
    • SMW
    • Edit on SMW, check for consistency on KAON2
      • Warn user if not
  • So, use background knowledge to check SMW consistency
  • Can also categorise page from statements
    • ==> inferred categories
  • Map existing URIs
  • More powerful queries
    • Bg knowledge & reasoner
    • Query interface? SPARQL?
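
A toy version of the two uses of background knowledge described above: inferring a page's categories from the properties it uses, and warning when the result is inconsistent. The property names, categories and both helper functions are invented; the actual system delegates this to KAON2.

```python
# Background knowledge (hypothetical): property -> category of any page using it.
DOMAIN = {"capital of": "City", "born in": "Person"}
# Pairs of categories that may not both apply to one page.
DISJOINT = {frozenset({"City", "Person"})}

def inferred_categories(page_statements, asserted=()):
    """Categories asserted on the page plus those implied by its properties."""
    return set(asserted) | {DOMAIN[p] for p, _ in page_statements if p in DOMAIN}

def inconsistencies(categories):
    """Disjoint pairs that are both present, i.e. what the user is warned about."""
    return [pair for pair in DISJOINT if pair <= categories]

berlin = inferred_categories([("capital of", "Germany")])
clash = inconsistencies(inferred_categories([("capital of", "Germany")], asserted=["Person"]))
```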

Kaukolu

  • http://kaukoluwiki.opendfki.de/cgi-bin/trac.cgi
  • Corporate memory via intranet wiki
    • How to map intranet information to RDF?
    • Wiki pages do not represent ontological resources
  • Want to construct RDF data that complies with existing ontologies
    • Shallow vs deep ontologies
  • Kaukolu
    • Based on JSPWiki
    • Sesame 2 as RDF repository & inference engine
  • Supports importing ontologies
    • RDFS ontologies
    • RDF data
  • Ontology-driven autocompletion in editor
  • Import RDFS, author RDF, reuse info in other apps
  • Future
    • UI improvements
    • Text-to-RDF wizard
      • Have NLP module already
    • Customized plugins
      • Essential for rendering and entering RDF
  • Discuss
    • Is mapping RDF resources to wiki pages way to go?
    • Are RDF triples sufficient for 'everyday knowledge'?
    • Shouldn't real basis of wiki semantics be a foundational ontology instead of basic RDF triples?

W7: S2: Lightning panel

Semantic wiki engines

  • Makna
    • http://makna.ag-nbi.de
    • Create & manage semantic info using wikis
    • Collab ontology engineering
    • Main features (see slide)
      • JspWiki
        • Semantic additions
        • Jena
      • User + Admin
    • Implementation
      • Semantic content authoring
        • Extended wiki syntax
      • Context-based presentation/navigation
      • Content- & structure-based retrieval
    • Future
      • Ontology engineering support
      • Multimedia extension
      • Evaluate usability
  • Annotation, Representation and Navigation
    • Common theme?
    • SemperWiki
    • Annotation
      • Dimensions: Attribution, granularity, representation distinction, terminology reuse, object type, context (provenance & scope)
    • Representation
      • Annotations -> both docs & concepts
      • Allow annotation of both
    • Navigation
  • SweetWiki

Future of Semantic Wikis

  • iMapping Wikis
  • ABCDE format
    • Semantic conf proceedings
    • Think in stories
    • Take Latex and add new lines
      • -> metadata
  • Learning with semantic wikis
    • http://ikewiki.salzburgresearch.at/
    • Recursive self-referential process
    • Self-directed learning
      • Challenge reader to contribute
    • Educational env
      • Collaborative features
        • Story writing
    • Wikis as ePortfolios
    • Benefits of SWs
      • Annotations allow reflection
      • Share models between learners
      • Reasoning
      • Reusability

From Wikipedia to Ontology

  • Harvesting wiki consensus
    • URIs in wikipedia identify ontology concepts
    • Ontology tools & languages barrier to users
    • Wikis
      • Collab ontology creation
      • Use of multimedia elements
        • Richness of informal concept definitions
    • Results
      • URIs v reliable
  • From wikipedia to semantic relationships
    • Semi-automated annotation approach
    • Aim
      • Identify relations in free text
    • Resources
      • No training
      • Extract relations automatically
        • Minimal manual intervention
      • Use NLP
    • Use wikipedia
      • And annotate it
    • Method
      • Extract pairs in NL
      • Extract patterns
      • Apply patterns
      • Produce wikipedia list pages
    • More than 23000 related pairs for 20000 wikipedia pages
    • Good precision on some pages
  • Extracting semantic relationships
    • Q: how to do complex structured queries on wikipedia
      • E.g. find countries which had non-violent revolutions
    • Connectivity ratio
      • Correlation with semantic connection strength
        • Inset better than outset
    • COUNTRY too broad a category
    • Future
      • Test more factors
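
The extract pairs, extract patterns, apply patterns loop from the semi-automated annotation talk can be illustrated with plain regular expressions. This is a surface-text stand-in for the NLP the speakers actually use, and the sentences, seed pair and function names are all invented.

```python
import re

def learn_patterns(sentences, seed_pairs):
    """Collect the text found between the two terms of each known pair."""
    patterns = set()
    for a, b in seed_pairs:
        for s in sentences:
            m = re.search(re.escape(a) + r"(.+?)" + re.escape(b), s)
            if m:
                patterns.add(m.group(1))
    return patterns

def apply_patterns(sentences, patterns):
    """Find (left, right) pairs of capitalised words joined by a learned pattern."""
    found = set()
    for p in patterns:
        rx = re.compile(r"([A-Z]\w+)" + re.escape(p) + r"([A-Z]\w+)")
        for s in sentences:
            found.update(rx.findall(s))
    return found

sentences = ["Paris is the capital of France.", "Rome is the capital of Italy."]
patterns = learn_patterns(sentences, [("Paris", "France")])
pairs = apply_patterns(sentences, patterns)
```

One seed pair yields the pattern " is the capital of ", which then recovers the second pair without further manual input.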

From semantics to wikis

W3: Semantic Network Analysis (SNA)

Representing Social & Cognitive Networks

  • Graph is basic entity
    • SNA, RCA (Relational content analysis)
  • Content analysis
    • Social science discipline
    • Who influences whom
  • Relational content analysis
    • Extract relationships between actors, issues, values, facts from texts
      • Issue position of actors
      • Causal relationships
  • RCA's challenges for SW
    • Actors are nodes in network
    • Complex relations: n-ary relations
      • Not binary
      • Vectors of real values
      • Ambiguity
    • Meta-info about documents with triplets
  • Example 1: Islam issue
    • Info
      • Document sets: web sites, newspapers
      • Doc info
        • Medium, contributor, date
      • Object info
        • Simple object ontology
    • Data analysis
    • Who influences whom
      • Stats modelling
    • ==> Action/reaction feedback
  • Example 2: Pim Fortuyn
    • Increases his standing after negative comments about immigrants
  • Towards SW-RDF solutions
    • RDFS
      • Object info
      • Predicate info
        • Subtype scheme of limited use
    • RDF reification
      • Of little use to add metadata to triplets
        • Ambiguous
      • Poorly implemented
    • Dummy nodes ==> n-ary predicates
      • Also ambiguous
      • Redundant to have > 2-ary
    • Named Graphs
      • Express explicitly whether RDF enrichment deals with doc info, triplet info or predicate info
  • SW solution: RDFS + Named Graphs

From Semantic to Social

  • How to introduce human and social perspectives into KM
    • Exploit info from users
  • Integrated process
    • Heterogeneous data
    • Generate exploitable data from rough data
    • Can manipulate great volume
    • Used graphs
  • See slide: process based on unified graph
    • Text mining
    • Graph mining
    • SNA
  • Steps
    • Docs & users interconnected
    • Links
    • File analysis
    • Semantic propagation
    • Profile similarity
  • #1: File Analysis
    • Colocation matrix (HAL-like techniques)
    • Link words with weights
    • Graph clustering
    • => clusters with ordered words
  • #2: Semantic Propagation
    • Select most frequent concepts
    • Normalize
    • Apply to docs & users
    • => graph connecting users, docs & concepts
  • #3: Profile Similarity
    • Concept graph
    • Compute similar docs & k-nearest neighbours
    • => browsable graph structure
      • Communities of docs and people
  • Unsupervised & integrated process
    • => enhanced KM
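
As a rough idea of what the colocation matrix in step #1 contains, the sketch below counts weighted co-occurrences of words inside a sliding window, with closer words weighted more heavily as in HAL. Unlike HAL it does not distinguish left from right context, and the function name and window size are arbitrary choices.

```python
from collections import Counter

def cooccurrence(tokens, window=2):
    """Weighted counts of word pairs appearing within `window` positions."""
    counts = Counter()
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            pair = tuple(sorted((w, tokens[j])))
            # Distance 1 gets weight `window`, the farthest position gets 1.
            counts[pair] += window - (j - i) + 1
    return counts

weights = cooccurrence("semantic web mining semantic web".split())
```

The resulting weighted word links are what the graph clustering step would then group into clusters.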

Exploring Social-Topic networks

  • Using Author-Topic model
  • Allow researcher to explore community
    • How are people organised
  • Support for scientific community
  • ?Information needs
  • Motivation:
    • DBLP: topic links
    • Flink: people links
  • Goal: social networks with topic communities
  • Topic extraction + community identification
  • Unsupervised techniques
  • Process
    • Corpus as bag-of-words with known authors
    • => learned topic model
      • People + topic distribution
      • Topic + keyword distribution
  • Topic similarity
  • See prototype slide
    • Identify topic communities

Measuring Semantic Centrality

  • Semantic social network (SSN)
    • People (or actors)
    • Personal ontologies
    • Concepts (or classes)
  • 3 layered arch (see slides)
    • Social
    • Ontology
    • Concept
  • Centrality
    • What does this mean?
    • On SSN, who has most powerful interop between heterogeneous users
    • Measures
  • Semantic Centrality
    • Power of structural position on social network
  • Use: find shortest path between 2 users for them to communicate
    • Can predict who can help whom
  • Consensual ontology
    • Extract most freq and common classes
      • Substructure mining
  • Two kinds of semantic centrality
    • Local
      • Within same subgroup
    • Global
      • Bridging power between subgroups
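
The "find shortest path between 2 users" use can be sketched with a breadth-first search, assuming the network is given as an adjacency list of users who can interoperate directly. The users on the returned path are the ones who could mediate; the graph and the function name are invented.

```python
from collections import deque

def shortest_path(links, start, goal):
    """Breadth-first search; returns the list of users from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in links.get(path[-1], ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

links = {
    "ann": ["bob"],
    "bob": ["ann", "cat"],
    "cat": ["bob", "dan"],
    "dan": ["cat"],
}
path = shortest_path(links, "ann", "dan")
```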

Emergent social networks

  • Multi-layered model to cluster users' preferences & find semantic relations between them
  • Apps: Group Profiles, Recommender systems
  • Ontology based user profiles
  • Emergent SSNs
    • #1: semantic preference extension
      • Based on Constrained Spreading Activation (CSA)
      • Propagate user pref weights through ontology concepts
    • #2: semantic concept clustering
      • Classical hierarchical clustering strategy
      • Find groups of prefs shared by users
    • #3: semantic user clustering
      • Assign users to concept clusters
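
Step #1 can be pictured as follows: a user's preference weights spread from concept to related concept, losing strength at each hop, and stop when they become too weak. The decay factor and threshold stand in for the constraints in CSA; their values, the data and the `spread` function are assumptions of this sketch, not the authors' parameters.

```python
def spread(preferences, related, decay=0.5, threshold=0.15):
    """Propagate preference weights along ontology relations.

    preferences: concept -> initial weight
    related:     concept -> list of neighbouring concepts
    """
    weights = dict(preferences)
    frontier = list(preferences.items())
    while frontier:
        concept, weight = frontier.pop()
        passed = weight * decay
        if passed < threshold:
            continue  # constraint: too weak to propagate further
        for neighbour in related.get(concept, []):
            # Keep only the strongest activation seen for each concept.
            if passed > weights.get(neighbour, 0.0):
                weights[neighbour] = passed
                frontier.append((neighbour, passed))
    return weights

related = {"jazz": ["music"], "music": ["art"], "art": ["culture"]}
extended = spread({"jazz": 0.8}, related)
```

Here the preference for "jazz" reaches "music" and "art" but is cut off before "culture".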

Topic communities in P2P networks

  • http://www.aifb.uni-karlsruhe.de/Projekte/viewProjektenglish?id_db=30
  • SWAP project & TAGORA project
  • Opposite challenges
    • Analysis
      • What is happening in network
    • Construction
      • Nodes/agents: 'peers'
      • (see slide)
  • Use cases
    • Bibster network
      • Bibliography
    • Virtual organization
      • Balearic Islands tourism
  • Basic idea: Shortcut Creation
    • Based on SN Metaphors
    • Query-dependent vs Query-independent
    • Ask question:
      • Content provider (has answered question in past)
      • Recommender (has asked question in past)
      • Bootstrapping network (has good links)
    • =>
      • Content shortcut
      • Recommender shortcut
      • Bootstrapping shortcut
  • INGA motivation: Social Expert Network
  • Implement in p2p network
    • So query network
  • So, Build content shortcut index
    • #1: Send query using most promising layer of semantic overlay topology
    • #2: Evaluate result of query
    • #3: Update shortcut index
  • Active vs Passive
    • Active: based on last but one person in query answer
    • Passive: listen to incoming queries
      • Register person interested in a topic so that he asks question
  • Simulation environment
    • Used ODP
  • Semantic Similarity leads to strong clustering
    • But does not give good rates of recall
  • Revisiting construction
    • Peers
    • Query forwarding
    • LRU
    • Interest-based locality
    • Index update
  • Success criteria
    • Effectiveness: recall
    • Efficiency: nr messages
    • Robustness: reaction to change
  • Toolset?
    • Construct SN with help from SN Analysis

Keynote: Frank van Harmelen

This was a rather good keynote, on why the Semantic Web isn't just plain old Computer Science. The slides are at http://www.eswc2006.org/keynote-frank-van-harmelen.pdf

There was quite a lot of stuff in it, but the main point was that there are at least four basic assumptions of traditional computer science that aren't true, or at least are interestingly more subtle, on the Semantic Web. These are:

  • Traditional complexity measures are poorly applicable. Part of the staple diet of first-year computer scientists is the analysis of complexity measures of algorithms, identifying linear, polynomial and exponential algorithms, and running in terror from the latter. But these are generally worst-case measures, and if the exptime case is in practice exponentially unlikely, then this bad behaviour doesn't matter almost all of the time.
  • Some things are hard in theory but easy in practice. For example, reasoning with inconsistent ontologies is both important (reasoning with defaults: the statements birds fly, penguins are birds and penguins don't fly are formally inconsistent, in the sense that the statements penguins fly and not (penguins fly) can both be validly deduced) and terribly complicated (see `defeasible logics', `non-monotonic logic', and whole sessions at meetings such as these). But very simple approaches to this problem -- he mentioned an algorithm consisting of simply adding statements until you find one conclusion or the other -- though they have little formal support, can actually work rather well.
  • Context-specific reasoning is important (this includes semantic search, I suppose, though I'm doubtful about how necessary semantic search really is in general)
  • And fuzzy logic, or logic with statistics, is much more important than it is in general CS.

Semantic Annotations #1

DEMO

  • http://omv.ontoware.org
  • Annotate ontologies => reuse
    • OMV: Ontology Metadata Vocabulary
      • Core + extensions
    • Tools
  • Current
    • Lots of ontologies
    • Methodologies
    • Tools
  • Approach
    • Methods & tools for:
      • Ontology sharing, discovery & usability
  • DEMO
    • Design environment for metadata ontology
    • Create framework
    • Objectives
      • Organization
      • Develop and promote core & extensions
      • Tech infrastructure
  • Components
    • Engineering
    • Evolution
    • Extensions
    • Applications
  • OMV
    • Metadata schema
      • Incl dublin core
    • Core + extensions
    • Designed as ontology
      • More controlled description
    • XML + OWL Lite
  • Omv Core
    • Conceptualisation
    • Implementation
  • Concepts
    • See slide
  • Extensions
    • Evaluation
    • Alignment
  • Tools
    • OYSTER
      • Share ontologies
    • Onthology
      • Repository
    • OntoMeta
      • Metadata generation

seMouse

  • Motivation
    • File mgt system has not kept pace with capacity of hdd
    • Classification capabilities
      • E.g. paper: Title, authors, year, conference
    • Automatic metadata extraction based on file format
      • File-centric vs User-centric view
  • seMouse features
    • Ontology aligned
    • Doc format & editor independent
    • Interface
      • Menu based, context-based
  • Operations
    • Load ontology
    • Classification
    • Annotation
    • Doc relationship
    • Authoring
    • Browsing
  • Ontology loading
    • Repository
    • File
  • Classification
  • Annotation
    • Select part of doc & apply annotation
  • Doc relationships
  • Authoring
  • Browsing
  • Current
    • Integration of seMouse with semantic desktop ontology (Gnowsis)
    • ? Ontology creation on the fly

Annotated RDF: aRDF

Semantic Annotations #2

An Environment for Semi-Automatic Annotation of Ontological Knowledge with Linguistic Content

  • OntoLing
    • Support linguistic enrichment
    • Plugin for Protégé
      • -> linguistic KB explorer
    • Access to linguistic resources (LRs): WordNet, FreeLang, Dict
  • Linguistic Watermark
    • LR access
    • Offers classification of diff LRs
    • Provides api for accessing content
  • Scenarios
    • Explicit ling enrichment
    • Produce multilingual ontologies
    • LexicoSemantic enrichment of onts
      • Sim to controlled vocab
  • Automatize LexicoSemantic enrichment of ontologies
    • Identify pointers (lexico-semantic anchors) from ontological objects to semantic indexes of a LR
  • Experimental results
    • Good precision and reasonable recall

Managing Information Quality in e-Science

  • http://www.qurator.org
  • Information and quality in e-Science
    • Reqmt on scientists to place data in public domain
    • But have to decide if data is okay
      • Variations in quality of data
      • No control over quality
      • No stds for measuring quality
  • Scenario: qualitative proteomics
  • Quality is personal
  • Reqmts for IQ ontology
    • Establish common vocab
    • Let users contribute while ensuring consistency
    • Making IQ computable in practice
  • Quality indicators
    • Hit ratio, mass coverage, ELDP
    • Need to experimentally establish correlation between indicators and probability of mismatch
    • => HitList {proteinID, HitRatio, Coverage, …}
  • QA: Quality Assertions
    • Formally capture clues as functions of indicators
    • Acceptability criteria are conditions on QAs
  • See myGrid
  • Let users add to ontology
    • Use reasoning to check consistency
    • See paper re PI-acceptability
  • Computing quality in practice
    • Need to add:
      • Annotation model
        • Rep of indicator values as semantic annotations
      • Binding model
        • Data ontology classes -> data resources
        • Functions ontology classes -> service resources
    • Can then build architecture to compute quality
  • But how is HitRatio calculated?
    • Programmatically defined currently in web services
    • Future: use rules?
  • Have an arch which allows users to compute quality criteria
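
To make the QA idea concrete: a quality assertion is a function of the indicators, and an acceptability criterion is a condition on its value. The weights, the threshold and the sample hit list below are invented purely for illustration; the talk noted that the real indicator values are computed inside web services.

```python
def quality_assertion(hit):
    """A QA as a function of indicators (weights are made up for this sketch)."""
    return 0.7 * hit["HitRatio"] + 0.3 * hit["Coverage"]

def acceptable(hit, min_score=0.6):
    """An acceptability criterion is a condition on the QA value."""
    return quality_assertion(hit) >= min_score

hit_list = [
    {"proteinID": "P1", "HitRatio": 0.9, "Coverage": 0.5},
    {"proteinID": "P2", "HitRatio": 0.2, "Coverage": 0.4},
]
accepted = [h["proteinID"] for h in hit_list if acceptable(h)]
```

Since quality is personal, a different user would plug in a different `quality_assertion` or threshold without touching the rest.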

A Lexicon Model for Multilingual/Multimedia Ontologies

  • Motivation
    • Information extraction
    • Providing lexicon for ontology-based info extraction
  • General:
    • Semiotic triangle
      • De Saussure
      • Adopted in KR (Sowa, 1984)
  • Features: Interacting layers
    • Images, text etc
    • Content : Features : Feature associations : Ontology
  • Feature associations (to ontology)
  • LingInfo is an RDFS ontology
    • (not possible in OWL-DL)
  • Comparisons
  • Applications
    • LingInfo developed in SmartWeb project
      • Upper model DOLCE
      • Domain indep model SUMO
      • Other domain ontologies
        • Sports events
        • Navigation
        • Discourse
        • Multimedia
      • German & English
  • Ontology-based Info Extraction
    • TDL: type description lang: representation lang used by SProUT
    • SProUT extraction patterns can be triggered by lexical types
  • KB generation
    • Duplicate detection & redundancy removal
  • Apps: image2text
    • Extract features
    • Look up ontology class using dictionary associations
    • Extract features from surrounding text
    • Link text features to class
  • Apps: text2text
    • English -> German
    • -> german classifiers
  • Other apps
    • Dialog processing
    • Ontology learning
  • WiP
    • Lexical acquisition
    • Predicate-argument structure

Semantic Web Mining and Personalisation

Semantic Network Analysis (SNA) of Ontologies

  • Centrality measures & Eigensystem analysis
  • Semantic Network Analysis (SemNA)
  • Test cases
    • SWRC: sem web for research communities
    • SUMO: considered in paper
  • Preprocessing
    • Each concept and property is node in graph
    • Directed edges
      • Concept and property hierarchy
      • Domain & range of properties
  • Centrality measures
    • Degree centrality
      • Counts in/out connections per node
    • Betweenness centrality
      • Normalized number of shortest paths between any two nodes that pass through the given node
        • Is on many communication paths
    • Eigenvector centrality
      • Related to other relevant nodes
      • E.g. page rank
  • Eigensystem analysis
    • Math tool: structural analysis of graphs
    • Allows for directional information and for 'zooming' into substructures
    • Complex Hermitian Matrix
      • Well behaved system
    • Subspaces used to describe patterns
      • Sum of all patterns is original eigenvalue matrix
  • Eigenspectrum of SWRC
    • Point symmetric -> star structure
    • => concept hierarchy is predominant structure in SWRC
    • 14 eigenvectors -> 70% relevance
  • Representation
    • Colour : relevance
    • Saturation
  • Identify two patterns (brightest red)
    • Publication
    • Organization
  • Analysis of SWRC
    • Relevance: academic staff > person > employee (mostly irrelevant) : see slides
  • Projectors
    • BibTeX part most prominent but non-hierarchical
      • Non-star structure
    • Structure: five stars centred around:
      • Organization
      • Academic staff
      • Project
      • Event
      • Person
  • Conclusion
    • Comparison of centrality measures
    • Eigensystem analysis shows same as degree centrality & betweenness centrality but much more
      • Shows that certain concepts can be removed (e.g. employee)
  • Open issues
    • Compare with OntoClean
    • Needs tuning for search, navigation, browse ontologies
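
The preprocessing and two of the centrality measures above can be sketched as follows. The toy graph, its node names and the edge directions (subclass to superclass, property to domain/range) are assumptions for illustration and are not taken from SWRC; eigenvector centrality and the eigensystem analysis are left out.

```python
from collections import deque

# Toy graph: every concept and property is a node (names are illustrative).
# Edge direction (subclass -> superclass, property -> domain/range) is assumed.
edges = [
    ("Employee", "Person"),
    ("AcademicStaff", "Employee"),
    ("author", "Publication"),
    ("author", "Person"),
]
nodes = sorted({n for e in edges for n in e})
succ = {n: [] for n in nodes}
for a, b in edges:
    succ[a].append(b)


def degree_centrality(node):
    """Count incoming plus outgoing connections of a node."""
    return sum(1 for a, b in edges if node in (a, b))


def bfs(source):
    """Return (distances, shortest-path counts) from source to reachable nodes."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in succ[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma


def betweenness(node):
    """Share of shortest s->t paths through node, normalised by (n-1)(n-2)."""
    info = {n: bfs(n) for n in nodes}
    d_v, sig_v = info[node]
    total = 0.0
    for s in nodes:
        d_s, sig_s = info[s]
        for t in nodes:
            if len({s, t, node}) < 3 or t not in d_s:
                continue
            if node in d_s and t in d_v and d_s[node] + d_v[t] == d_s[t]:
                total += sig_s[node] * sig_v[t] / sig_s[t]
    n = len(nodes)
    return total / ((n - 1) * (n - 2))


print({n: degree_centrality(n) for n in nodes})
print(betweenness("Employee"))
```

In this toy graph only `Employee` lies on a shortest path between two other nodes (AcademicStaff to Person), so it is the only node with non-zero betweenness.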

Content Aggregation on Knowledge Bases Using Graph Clustering

  • Summarization of KBs
  • Semantic P2P overlays for KM
  • Metrics on ontologies
    • Length of path
    • Perceived distance reduces with depth
  • k-Modes clustering
  • Mode := element with largest closeness centrality
  • Evaluation
    • Choose peers with self-description close to query
    • Papers from DBLP & ACM DL
      • KBs > 10 topics
    • Evaluate against nr authors
    • 40k papers, 317 authors
    • Query for each of 1474 acm topics
  • Fuser concept (Hovy/Lin 1999)
    • Good summary iff subtopics have similar weights
  • To get 70% recall only need to query 10% peers
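
A small sketch of choosing the mode of a cluster as the element with the largest closeness centrality, using path length in a topic tree as the distance. The topic tree is hypothetical, and the depth-dependent scaling of perceived distance mentioned above is not modelled.

```python
# Hypothetical topic tree, child -> parent (names are illustrative only).
parent = {
    "Databases": "CS",
    "AI": "CS",
    "Query languages": "Databases",
    "Machine learning": "AI",
    "Clustering": "Machine learning",
}


def ancestors(topic):
    """Path from a topic up to the root, including the topic itself."""
    path = [topic]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def path_length(a, b):
    """Number of edges between two topics via their lowest common ancestor."""
    up_a, up_b = ancestors(a), ancestors(b)
    common = next(t for t in up_a if t in up_b)
    return up_a.index(common) + up_b.index(common)


def closeness(topic, cluster):
    """Closeness within a cluster: (n - 1) / sum of distances (needs n >= 2)."""
    total = sum(path_length(topic, other) for other in cluster if other != topic)
    return (len(cluster) - 1) / total


def mode(cluster):
    """The mode is the element with the largest closeness centrality."""
    return max(cluster, key=lambda t: closeness(t, cluster))


cluster = ["CS", "AI", "Machine learning", "Clustering", "Databases"]
print(mode(cluster))
```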

Dynamic Assembly of Personalized Learning Content on the Semantic Web

  • http://goodoldai.org.yu
  • Ontology-based approach
  • Learning paths ontology
    • Optimal learning strategy
  • User model ontology
  • TANGRAM
  • Architecture (see slide)
    • Content mgt
    • UM mgt
    • Dynamic assembly
    • Coordinator
    • UI module
  • Ontologies:
    • ALOCoM-based ontologies
      • Split into
        • Content structure ontology
        • Content type ontology
    • IIS domain ontology
    • Learning Paths ontology
      • Extension of SKOS core
    • User Model ontology
      • User modelling stds
        • IEEE: PAPI & PAPI Learner
        • IMS LIP
      • & other researchers
  • See slide of resulting ontology
  • Personalized learning
    • Functionality
      • Provision of learning content
      • Access to content
  • Future
    • More precise formal desc of IIS domain
    • Improve TANGRAM subsystem
    • Repurposing content
  • http://ariadne.fon.bg.ac.yu/TANGRAM/app ??

Interactive Ontology-Based User Knowledge Acquisition

  • SW and personalisation
    • Two-fold relationship
      • Personalisation techniques to enhance usability of SW apps
      • SW technologies to enhance user-adaptive apps
  • Focus on second point
    • SW techs to solve modelling problem
    • e-Learning domain
  • e-Learning
    • Traditional personalisation domain
  • User knowledge acquisition
    • Difficult to keep current state of user
  • Scenario
  • Problem
    • Can user's conceptual model be used to enable personalisation and adaptation of learning envs on SW
    • Approach: via dialogue
  • OntoAIMS architecture (see slides)
  • Interactive user modelling
    • Dialog agent
    • => long-term conceptual state (user model)
  • Graphical dialogue screen: OWL-OLM
  • Task recommendation / resource browsing

Ontology Alignment #1

Matching Hierarchical Classifications with Attributes

  • Using ontologies to match schemas
  • CtxMatch 1.0
    • Matching hier classifications: taxonomies
  • CtxMatch 2.0
    • Deals with richer schemas
      • Include explicit attributes & implicit roles
  • Methodology
    • Elicited schemas
    • Matching is then trivial
      • RACER reasoner
  • Classifications with attributes
    • Images … Italy … Beaches
    • But Italy is not subclass of Images!
    • Role is implicit
      • So images 'about' Italy is subclass of Images
      • And Beaches are locatedIn Italy
  • Implicit roles are often hidden in the lexical meaning of the node
  • CtxMatch 2.0
    • Construct meaning skeletons
    • Construct local meaning of nodes
    • Filter out incompatible skeletons
  • Meaning skeletons
    • WDL: lexicalized representation language
  • Local meanings
    • WordNet used
    • But any other dictionary would do
  • Filtering local meanings
    • Discard senses not found in the relations of ontology
    • May end up with several alternatives
  • Relations between local meanings
  • Matching using standard reasoning techniques
  • Compute mapping
    • using formulae for node pairs
    • Lexical + Domain knowledge => inference of equivalence
  • Peer-to-peer schema matching
    • Agents with diff schema
    • Will need either to use the same dictionary or to have a mapping between the two
  • Applications: see slide

Community-Driven Ontology Matching

  • CDOM
    • Involve end users
    • Output = annotated mappings
  • Architecture
  • Overview
    • ~50 existing matching systems/approaches
    • Hard to reuse

...so this talk was describing a system for making some subset of these matching tools generally usable. If you go to http://align.deri.org you can put in URLs for a pair of ontologies, and it'll use a few different tools/strategies to match them. It produces suggestions which you can edit, and results in a list of equivalentClass and presumably subClass assertions. [NG]

Empiric Merging of Ontologies - A Proposal of Universal Uncertainty Representation Framework

  • Background
    • Ontology Learning (OLE project)
      • Uncertain acquisition of knowledge
  • Crisp ontology acquisition
    • Preprocessing NL texts
      • Std techniques
    • Taxonomy extraction methods
      • Pattern-based
      • Clustering-based
  • Motivations
    • Precision-recall trade-off
    • Noise introduction
    • Knowledge inconsistencies from diff domains
      • Introduced integration & refinement of inconsistent kn
        • Use empirical consistency measure
    • Reflect human mental models
      • Not crisp structures
        • Vague, overlapping referential associations
  • Framework
    • Format called ANUIC
      • Adaptive Net of Universally Interrelated Concepts
    • Conviction function
  • Utilisation
    • 3000 texts from CS
      • 20M words
    • Used pattern-based for ontology
    • Merged ontologies
      • Into one ANUIC structure
        • 5k classes, 9k indivs
    • Very rough taxonomy
      • But proof of concept
    • See slide for sample
    • Improvement by ANUIC 130-200%

An iterative algorithm for ontology matching - Andreas Heß

[NG] There are a variety of underlying use-cases here: perhaps two folk annotate pictures using different ontologies, or web services are described using different ontologies. The underlying algorithm involves representing the match between two ontologies as a weighted graph.

Given two ontologies which are mapped, you can improve the mapping of a third by mapping it to both and then combining the similarities.

Evaluation

  • in some cases, lexical matching does more of the work than structural mapping does
  • minimum-similarity thresholds have an effect
  • there appears to be no overall best mapping strategy
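
The notes do not record how the algorithm computes or iterates the similarities, so the following is only a sketch of the data structure: a match held as a weighted graph between the entities of two ontologies, with a minimum-similarity threshold. The entity names, the scores and the greedy one-to-one extraction step are all assumptions for illustration.

```python
# Weighted match graph between two ontologies: (entity_a, entity_b) -> similarity.
# All names and scores are invented for illustration.
similarity = {
    ("Person", "Human"): 0.9,
    ("Person", "Agent"): 0.6,
    ("Photo", "Image"): 0.8,
    ("Photo", "Human"): 0.1,
    ("Place", "Agent"): 0.3,
}


def extract_mapping(similarity, min_similarity):
    """Greedily pick the highest-weighted edges, using each entity at most once."""
    mapping, used_a, used_b = {}, set(), set()
    for (a, b), score in sorted(similarity.items(), key=lambda kv: -kv[1]):
        if score < min_similarity:
            break  # edges are sorted, so all remaining edges are below threshold
        if a not in used_a and b not in used_b:
            mapping[a] = b
            used_a.add(a)
            used_b.add(b)
    return mapping


strict = extract_mapping(similarity, min_similarity=0.5)
loose = extract_mapping(similarity, min_similarity=0.2)
print(strict)
print(loose)
```

Lowering the threshold from 0.5 to 0.2 adds the weak `Place`/`Agent` correspondence, which is one way the threshold can change the outcome of an evaluation.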

Heterogeneous ontologies - Chiara Ghidini

Very interesting. Concerned with matching/mapping different ontologies describing the same thing, by hand, in complicated cases. They've developed a Distributed Description Logic (DDL) (this is just the stuff I'm interested in!).

Questions:

  • Are mappings between concepts the only form of mappings?
  • Do people always represent the same knowledge using the same ontological concepts? (marriage could be a concept or a relation)
  • No to both questions!

So mappings are more complex than concept-concept mappings. LatLong can be modelled as a pair of concepts, or a class with two (real) properties. A marriage can be modelled as (Man, marriedTo, Woman) or as the pair of assertions (WeddingCertificate, husband, Man) and (WeddingCertificate, wife, Woman). Their syntax distinguishes homogeneous (concept-concept) and heterogeneous (concept-role) bridge rules.
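
To make the marriage example concrete, here is a sketch that rewrites plain triples between the two modellings. This is ordinary data rewriting, not DDL bridge rules or the speakers' syntax; the function names and the generated certificate identifiers are hypothetical.

```python
def reify_marriages(triples):
    """Rewrite (man, 'marriedTo', woman) as two WeddingCertificate assertions."""
    out = []
    count = 0
    for subject, predicate, obj in triples:
        if predicate == "marriedTo":
            count += 1
            cert = f"certificate{count}"  # hypothetical id for the new individual
            out.append((cert, "husband", subject))
            out.append((cert, "wife", obj))
        else:
            out.append((subject, predicate, obj))
    return out


def flatten_marriages(triples):
    """Inverse direction: pair husband/wife assertions on the same certificate."""
    husbands = {s: o for s, p, o in triples if p == "husband"}
    wives = {s: o for s, p, o in triples if p == "wife"}
    married = [(husbands[c], "marriedTo", wives[c]) for c in husbands if c in wives]
    rest = [t for t in triples if t[1] not in ("husband", "wife")]
    return married + rest


relation_view = [("adam", "marriedTo", "eve")]
concept_view = reify_marriages(relation_view)
print(concept_view)
print(flatten_marriages(concept_view))
```

The relation becomes an individual in one direction and disappears again in the other, which is why a concept-concept mapping alone cannot express it.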

Distributed Reasoning Architecture for a Galaxy of Ontologies: <http://drago.itc.it/>

Questions:

  • Effectively use concepts in ontology B using concepts in A? Yes, precisely that.
  • Given the set of bridge rules, can you deduce one ontology from the other? Yes, in one direction, but you can't necessarily go the other way without specifying a new set of rules.
  • Is that symmetric? So, no.
  • How well do chains of bridges work in practice? They don't really have enough experience to tell

Ontology Learning

Automatic Extraction of Hierarchical Relations From Text

  • Machine learning for relation extraction
    • Many algorithms
    • SVM using rich features got good results
      • Wide data sets
      • Good at finding relevant features
  • Paper
    • Application of SVM
    • Investigate variety of NLP features
    • Evaluate kernels of SVM
  • Used ACE04 Corpus
    • Topology of entities and relations
      • 7 relation types, 23 subtypes
    • Relation hierarchy (see slide)
  • Using SVM
    • Binary classifier
    • One-against-one method
  • NLP features
    • E.g. Pair of entities in sentence
      • & neighbouring words
    • Used GATE & plugins
    • => 94 features for each possible pair
  • Simple features
    • Words
    • POS tags (part of speech)
    • Features in ACE04 corpus
    • Overlap features
      • Relative position of two mentions
  • Syntactic features
    • Chunk features
    • Dependency feature from MiniPar
    • Parse tree feature from BuChart
  • Semantic features
  • Discussion
    • Every feature has some contribution
    • Most features improve the recall
    • Complex features do not contribute as much as hoped
      • But pay off deeper in the hierarchy
  • Diff kernel types
    • Linear kernel best performance
  • http://gate.ac.uk
  • http://nlp.sheff.ac.uk
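
A sketch of the one-against-one method mentioned above, which turns binary classifiers into a multi-class decision by voting. Simple cue-word rules stand in for trained SVMs, and the labels and cue words are invented; they are not the ACE04 relation types or the GATE features from the paper.

```python
from collections import Counter
from itertools import combinations

LABELS = ["employment", "located", "social"]  # illustrative, not the ACE04 types

# Cue words standing in for trained binary SVMs (purely hypothetical).
CUES = {
    "employment": {"works", "employee", "hired"},
    "located": {"in", "at", "near"},
    "social": {"friend", "brother", "wife"},
}


def binary_classifier(label_a, label_b):
    """Return a stand-in for a binary classifier trained on label_a vs label_b."""
    def classify(words):
        score_a = len(CUES[label_a] & set(words))
        score_b = len(CUES[label_b] & set(words))
        return label_a if score_a >= score_b else label_b
    return classify


# One-against-one: one binary classifier per unordered pair of labels.
classifiers = {pair: binary_classifier(*pair) for pair in combinations(LABELS, 2)}


def predict(words):
    """Each pairwise classifier votes; the label with the most votes wins."""
    votes = Counter(clf(words) for clf in classifiers.values())
    return votes.most_common(1)[0][0]


print(len(classifiers))
print(predict(["Smith", "works", "for", "Acme"]))
```

With k classes this scheme needs k(k-1)/2 binary classifiers: 3 for the three toy labels here, and 21 for a set of 7 types.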

An Infrastructure for Acquiring High Quality Semantic Metadata

  • http://kmi.open.ac.uk/people/index.cfm?id=60
  • Quality
    • Accurately capture meaning of data object
      • Each entity maps to one and only one data object
    • Semantic metadata should be correctly populated
      • Correct URI
  • Current support
    • On-To-Knowledge
    • SCORE
    • CS AKtive Portal
    • Flink
    • Relatively weak support for quality control
      • Mainly manual
      • Some co-relation and disambiguation
  • Case study: KMi web portal
  • Requirements
    • Automated & adaptive extraction
    • Address heterogeneity
    • Minimize extraction errors
    • Proper population
    • Update from new sources
  • See ADSI framework slide
  • Layers
    • Source
    • Extraction
    • Verification
    • Application
  • Extraction
  • Verification
    • Instance classification tool: PANKOW + WordNet
    • Data querying engine
    • Verification engine

Extracting Instances of Relations From Web Documents Using Redundancy

  • Relation instantiation
    • Relations defined at the instance level
    • But which one is right for the instances of the classes
  • Existing methods on QA/I.E.
    • Sophisticated methods
  • Use redundancy to make up for loss of performance
  • Assumptions
    • Not 1-1 relation
    • Instantiated concepts
    • Must be on web
    • Seed set
  • Outline
    • Retrieve/select corpus
    • Identify instances
    • Rank candidates
  • MultimediaN E-culture project
    • Art & Architecture thes (AAT)
    • Unified List of Artist Names (ULAN)
  • Triple20 ontology browser
    • SWI-Prolog based
  • Redundancy method: CHD
    • Google
    • ULAN
    • Rank candidate artists
  • Gold standard manually created using authoritative pages:
    • 30 expressionists & 17 impressionists
  • Discussion
    • F1 promising: 0.68-0.83
      • Stable wrt seeds
    • Iterative method
      • Comparable F1: 0.71
      • Eventually recall is higher
  • Future
    • Use other domains
    • Investigate threshold for when to stop iterations
    • Add e.g. time constraints for art styles
  • Related
    • KnowItAll: Etzioni et al
    • Armadillo: Ciravegna et al
      • Also uses redundancy
    • Normalized Google Distance: Cilibrasi et al
      • Semantic distance between terms
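
The notes do not describe the ranking method in detail, so this is a generic sketch of the redundancy idea: score each candidate name by how many retrieved documents mention it, keep those above a threshold, and compare against a gold standard with F1. The documents, the name list standing in for ULAN, the gold set and the threshold of 2 are all invented.

```python
from collections import Counter

# Stand-ins for retrieved web pages about one art style (invented text).
documents = [
    "Monet and Renoir exhibited together; Monet painted water lilies.",
    "Renoir, Monet and Degas are often grouped with the movement.",
    "A page about Picasso that mentions Monet once.",
]
# Stand-in for a name list such as ULAN (a tiny hypothetical excerpt).
known_artists = ["Monet", "Renoir", "Degas", "Picasso", "Kirchner"]


def rank_candidates(documents, names):
    """Score each name by the number of documents mentioning it (redundancy)."""
    counts = Counter()
    for text in documents:
        for name in names:
            if name in text:
                counts[name] += 1
    return counts.most_common()


def f1(predicted, gold):
    """Harmonic mean of precision and recall over two sets."""
    true_pos = len(set(predicted) & set(gold))
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(set(predicted))
    recall = true_pos / len(set(gold))
    return 2 * precision * recall / (precision + recall)


ranking = rank_candidates(documents, known_artists)
selected = [name for name, score in ranking if score >= 2]
score = f1(selected, gold={"Monet", "Renoir", "Degas"})
print(ranking)
print(selected, score)
```

Counting documents rather than raw occurrences means a single page repeating a name many times does not dominate the ranking.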

Closing plenary

Usability and the Semantic Web

Anthony Jameson (DFKI and International University in Germany)

  • Challenges
    • Searching/querying
      • Minimize complexity for end user
      • Ensure minimally necessary understanding
    • Adding information to ontologies
      • Induce users to do work
      • Involve users in design
  • Focus on users
    • seMouse
    • Halo2: knowledge querying
      • The 'Digital Aristotle' vision
    • DarkMatter
  • Minimizing complexity and cognitive effort
    • Strategies
      • Recognition rather than recall
      • Domain-specific interfaces
      • System mapping from input to formal rep
      • Don’t require adherence to ontology
      • Support trial and error
    • OntoIR
      • Concept pick out too complex
      • Label parts and present to user
  • Expected benefits
    • Often little result from semantically based system
      • Strategy:
        • Allow easy refinement
        • Piggyback on methods that yield some benefit
    • http://tap.stanford.edu
  • SmartWeb: Stadium
    • How to convey understanding to user
      • Mental model of system
      • When needed
        • When something goes wrong
        • Understand unexpected behaviour
    • Distinguish design model vs user model
  • Ways of conveying mental models
    • Suggest what user can do
      • Appearance of input elements
      • Examples of possible inputs
    • Suggest what system has done
      • Indications of information used to derive responses
  • SemIPort document manager
  • Intermediate motivation
    • Mangrove: Annotation tool
      • Feedback from services
  • Long-term
    • Community Navigator (Hideaki Takeda)
  • Existing research
    • Social psychology
      • E.g.
        • Collective effort theory
        • Goal setting theory
      • Utility
        • Yield unobvious predictions
        • Often not confirmed
    • Groupware, online community
      • Tested in practical settings
      • Diff from SW apps
  • How to motivate users to contribute
    • 'Only you can do it'
    • Remind of benefits
      • May backfire
    • Publicize contributions
    • Offer money, iPods, chocolate
    • Caveat
      • Try it out in your setting first
  • Conduct user studies throughout design and development
  • How to exploit knowledge about users
    • Analysis of reqmts
      • Look at what they do now and understand it
    • i/f design
      • Design principles & guidelines, psychological knowledge
    • Iterative testing & prototypes
      • Start with mock-ups
    • Summative evaluation of final version

-- TonyLinde - 18 Jun 2006 -- NormanGray - 29 Jun 2006

Topic revision: r3 - 2006-06-29 - 11:13:29 - NormanGray
 