Notes from 'How Can Culture Really Connect? Semantic Front Line Report' at MW2008

These are my notes from the workshop on "How Can Culture Really Connect? Semantic Front Line Report" at Museums and the Web 2008. This session was expertly led by Ross Parry.

The paper, "Semantic Dissonance: Do We Need (And Do We Understand) The Semantic Web?" (written by Ross Parry, Jon Pratty and Nick Poole) and the slides are online. The blog from the original Semantic Web Think Tank (SWTT) sessions is also public.

These notes are pretty rough so apologies for any mistakes; I hope they're a bit useful to people, even though it's so late after the event. I've tried to include most of what was discussed but it's taken me a while to catch up.

There's so much to see at MW that I missed the start of this session; when we arrived Ross had the participants debating the meaning of terms like 'Web 2.0', 'Web 3.0', 'semantic web', 'Semantic Web'.

So what is the semantic web (sw) about? It's about intelligent and efficient searching; discovering resources (e.g. URIs of picture, news story, video, biographical detail, museum object) rather than pages; machine-to-machine linking and processing of data.

Discussion: how much/what level of discourse do we need to take to curators and other staff in museums?
me: we need to show people what it can do, not bother them with acronyms.
Libby Neville: believes in involving content/museum people, not sure about viewing it through the prism of technology.
[?]: decisions about where data lives have an effect.

Slide 39 shows various axes against which the Semantic Web (as formally defined) and the semantic web (the SW 'lite'?) can be assessed.
Discussion: Aaron: it's context-dependent.

'expectations increase in proportion to the work that can be done' so the work never decreases.

sw as 'webby way to link data'; 'machine processable web' saves getting hung up on semantics [slide 40 quoting Emma Tonkin in BECTA research report, ‘If it quacks like a duck…’ Developments in search technologies].

What should/must/could we (however defined) do/agree/build/try next (when)?

Discussion: Aaron: tagging, clusters. Machine tags (namespace:predicate=value).
me: let's build semantic webby things into what we're doing now to help facilitate the conversations and agreements, provide real world examples – attack the problem from the bottom up and the top down.

Slide 49 shows three possible modes: make collections machine-processable via the web; build ontologies and frameworks around added tags; develop more layered and localised meaning. [The data (the data around the data) gets smarter and richer as you move through those modes.]

I was reminded of this 'mash it' video during this session, because it does a good jargon-free job of explaining the benefits of semantic webby stuff. I also rather cynically tweeted that the semantic web will "probably happen out there while we talk about it".

Are shared data standards and shared repositories the future?

I keep having or hearing similar conversations about shared repositories and shared data standards in places like the SWTT, Antiquist, the Museums Computers Group, the mashed museum group and the HEIRNET Data Sans Frontières. The mashed museum hack day also got me excited about the infinite possibilities for mashups and new content creation that accessible and reliable feeds, web services or APIs into cultural heritage content would enable.

So this post is me thinking aloud about the possible next steps – what might be required; what might be possible; and what might be desired but would be beyond the scope of any of those groups to resolve so must be worked around. I'll probably say something stupid but I'll be interested to see where these conversations go.

I might be missing out lots of the subtleties but it seems to me that there are a few basic things we need: shared technical and semantic data standards or the ability to map between institutional standards consistently and reliably; shared data, whether in a central repository or a service/services like federated searches capable of bringing together individual repositories into a virtual shared repository. The implementation details should be hidden from the end user either way – it should Just Work.

My preference is for shared repositories (virtual or real) because the larger the group, the better the chance that it will be able to provide truly permanent and stable URIs; and because we'd gain efficiencies when introducing new partners, as well as enabling smaller museums or archaeological units who don't have the technical skills or resources to participate. One reason I think stable and permanent URIs are so important is that they're a requirement for the semantic web. They also mean that people re-using our data, whether in their bookmarks, in mashup applications built on top of our data or on a Flickr page, have a reliable link back to our content in the institutional context.

As new partners join, existing tools could often be re-used if they have a collections management system or database used by a current partner. Tools like those created for project partners to upload records to the PNDS (People's Network Discovery Service, read more at A Standards Framework For Digital Library Programmes) for Exploring 20th Century London could be adapted so that organisations could upload data extracted from their collections management, digital asset or excavation databases to a central source.

But I also think that each (digital or digitised) object should have a unique 'home' URI. This is partly because I worry about replication issues with multiple copies of the same object used in various places and projects across the internet. We've re-used the same objects in several Museum of London projects and partnerships, but the record for that object might not be updated if the original record is changed (for example, if a date was refined or location changed). Generally this only applies to older projects, but it's still an issue across the sector.

Probably more importantly for the cultural heritage sector as a whole, a central, authoritative repository or shared URL means we can publish records that should come with a certain level of trust and authority by virtue of their inclusion in the repository. It does require playing a 'gate keeper' role but there are already mechanisms for determining what counts as a museum, and there might also be something for archaeological units and other cultural heritage bodies. Unfortunately this would mean that the Framley Museum wouldn't be able to contribute records – maybe we should call the whole thing off.

If a base record is stored in a central repository, it should be easy to link every instance of its use back to the 'home' URI, or to track discoverable instances and link to them from the home URI. If each digital or digitised object has a home URI, any related content (information records, tags, images, multimedia, narrative records, blog posts, comments, microformats, etc) created inside or outside the institution or sector could link back to the home URI, which would mean the latest information and resources about an object are always available, as well as any corrections or updates which weren't replicated across every instance of the object.

Obviously the responses to Michelangelo's David are going to differ from those to a clay pipe, but I think it'd be really interesting to be able to find out how an object was described in different contexts, how it inspired user-generated content or how it was categorised in different environments.

I wonder if you could include the object URL in machine tags on sites like Flickr? [Yes, you could. Or in the description field]
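
As a rough illustration of the machine tag idea, here's a minimal Python sketch that packs an object's home URI into a Flickr-style machine tag (namespace:predicate=value) and reads it back out again. The `museum` namespace, the `object` predicate and the example.org URI are all made up for the example; they aren't an agreed convention.

```python
def make_machine_tag(namespace, predicate, value):
    """Build a machine tag in the namespace:predicate=value form."""
    return f"{namespace}:{predicate}={value}"


def parse_machine_tag(tag):
    """Split a machine tag into (namespace, predicate, value).

    Returns None for ordinary tags that don't follow the pattern.
    """
    # Split on the first '=' only, so values containing '=' survive intact.
    head, sep, value = tag.partition("=")
    if not sep:
        return None
    namespace, colon, predicate = head.partition(":")
    if not colon or not namespace or not predicate:
        return None
    return namespace, predicate, value


# Hypothetical home URI for a museum object (illustrative only).
home_uri = "http://example.org/objects/12345"
tag = make_machine_tag("museum", "object", home_uri)
parsed = parse_machine_tag(tag)
```

Anything harvesting tagged photos could then follow the value back to the object's home record, assuming the museum keeps that URI stable.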

There are obviously lots of questions about how standards would be agreed, where repositories would be hosted, how the scope of each is decided, blah blah blah, and I'm sure all these conversations have happened before, but maybe it's finally time for something to happen.

[Update – Leif has two posts on a very similar topic at HEIR tonic and News from the Ouse.

Also I found this wiki on the business case for web standards – what a great idea!]

[Update – this was written in June 2007, but recent movements for Linked Open Data outside the sector mean it's becoming more technically feasible. Institutionally, on the other hand, nothing seems to have changed in the last year.]

Semantic Web Think Tank, Cambridge, March 2007

Here's a write-up of the discussions of the Semantic Web Think Tank meeting held at the Museum of Archaeology and Anthropology, Cambridge, on March 19, 2007.

I can't promise my notes are complete, or that I've correctly attributed everyone's comments, but in the interests of having something for people to react to before the transcripts can be done, here it is. [??] means I didn't note down in time who said it, and other bits in square brackets are where I can't remember if I said it or just noted it. I've posted it as is but if you were there, let me know of any corrections, additions, etc.

Key:
Jeremy Ottevanger (JO), Frances Lloyd Baynes (FLB), Paul Shabajee (PS), Suzanne Keene (SK), Robin Boast (RB), Richard Light (RL), Mike Lowndes (ML), Dylan Edgar (DE), Alex Whitfield (AW), Brian Kelly (BK), Ross Parry (RP), Mia Ridge (MR), Nick Poole (NP), Jon Pratty (JP).

Ross Parry: Introduction and welcome.

What should the final output of the workshops be? We will need to present our findings at the June UK Museums On The Web in Leicester.

Possibly "A Netful of Jewels 2"? It could reflect on opportunities of last ten years and look at current state.

We've spent a lot of time discussing the technical problems but what about the conceptual problems?

NP would want the afternoon conversation to be embedded in real world practice, funding and technology.

RP asked everyone around the table to introduce themselves and their goals for the workshops.

MR: Achievable goals, practical recommendations for museums. Infrastructure and standards should be interoperable and reusable.
JO: high aspirations, low barriers to entry.
FLB: embedded in real world, achievable.
SK: unified view, future proof.

[??] Research labs vs real world: what's happening where?

Robin Boast

RB: interests are history and philosophy of science and knowledge. SW: problems with knowledge.

They have put 200,000 records on an online catalogue since 1998, using (or in sympathy with) SPECTRUM, but an extended version. It took 28 years to enter the data.

But: no-one cares. People are not interested in the data. Why?

RB: the SW is in opposition to Web 2.0. Agents in SW…

RB: Tim Berners-Lee (TBL) says SW: Language expresses data and rules for reasoning about the data.
RB: what if our understanding of how knowledge systems operate is contentious?

So why aren't people interested? Catalogues are useful for finding and managing objects, but they're not useful for people wanting to know about objects. Users always have their own local systems for representing the objects. These systems are diverse in content, structure, idiom and purpose. All systems are meaningful but only locally.

MR: what research have they done to support these conclusions? RB: it's based on work with communities using the data.

TBL-style rejoinder (might be): But all joined through object… RB: but… each diverse local system has a different definition of what the object is in scope, boundaries, associations and definition. Interest might be in family of objects or history of object rather than object itself.

RB: the conceptual psychology underlying TBL vision is contentious and is currently under heavy fire.

The other vision: knowledge is not a classification of the world but an ongoing negotiation within knowledge communities. Knowledge is dynamic and diverse; it's hard, takes work, commitment and engagement. Most knowledge systems cannot be directly translated to another e.g. translating between languages. The knowledge object itself is dynamic and diverse. Sharing knowledge depends, necessarily, on an ongoing conversation.

Is the SW bad? No. But it depends on where, how, for who and which part is used. It's not a universal system of knowledge but a local, situated tool.

RP: tools to improve discoverability.
RB: SW is about universalisation but that doesn't mean we can't take from it.
RL: the CRM is 'coy' at object level – it doesn't care at that level, it pays more attention in different parts.

[The conversation that followed basically edged around issue of whether we're talking SW or sw, lower-case semantic web being an informal version. The definition was discussed at the first Leicester meeting. I wonder how and to what extent the particular audience and context interpretative content was written for will affect how the interpretative content is perceived outside of that context. How much does the POV of interpretation matter when content is published more widely?]

PS: guidelines… are useful in this context.
DE: museums start from position of wanting to put catalogues online. They should start from position that they want people to find out about objects in collection then look at the best way to do it.
SK: online catalogues can be finding aids, they don't need to be more complicated. Not about mediated knowledge from authoritative POV that encapsulates everything they know about the object. For example, CHIN in Canada.

FLB: museums can't anticipate how people will make meaning with objects; museums should facilitate access.

ML: will still need to link that information with the objects. SW allows us to decouple objects from systems they came from. Once decoupled then need to find way of inferencing things.

RP: the discussion really has two parts: collections management and related requirements vs access (a la Areti), Internal / external access and requirements?

[??] Aiding discoverability?

MR: What about museums as the central 'home' for objects (URIs)? Everything can point back to it, even if it doesn't point back out with explicitly created links but uses implicit [discoverable] links (trackbacks and pings for museum objects?) How does that affect museum authority, curatorship?

RB: they are giving logins to their new CMS to people around the world so they can put their layers of meaning, multiple names, classifications. Their data is stored within their own space. [MR: how does that work with statement about different views of 'object'.] RB: internal management system.

MR: we need to resolve issue of where we stand on ontologies as well as sw/SW.

RB: ontologies are dynamic, emergent.

FLB: museums have to provide a base-level ontology for their objects, but also allow/understand that other ontologies will apply. The user will have the context of museum voice to understand ontology.

RP: difference between promoting sw standards for community of practice vs promoting sw tools for community of interest. Both sw and SW?

[break]

Richard Light
RL demonstrated Topic Maps. He used found content – digitised letters from the Wordsworth Trust. They were transcribed into TEI and summarised into ModesXML object summaries. Generated XML Topic Map XTM. Web app queries ModesXML Topic Map. Then can link back to summary data then back to source material.

Word docs saved by Open Office (Sebastian Rahtz has a plug-in) as TEI.

Can have multiple identifiers, topic maps have merging rules. Associations, assertions can be included.
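
A very simplified Python sketch of what merging on identifiers could look like. This isn't the Topic Maps merging rules as specified, or what RL actually demonstrated; it's just the basic idea that two topics sharing any identifier get treated as one topic. The identifiers and names are invented for illustration.

```python
def merge_topics(topics):
    """Merge topics that share at least one identifier.

    Each topic is a dict with a set of 'identifiers' and a set of 'names'.
    """
    merged = []
    for topic in topics:
        ids = set(topic["identifiers"])
        names = set(topic["names"])
        remaining = []
        for existing in merged:
            if existing["identifiers"] & ids:
                # Shared identifier: fold the existing topic into this one.
                ids |= existing["identifiers"]
                names |= existing["names"]
            else:
                remaining.append(existing)
        remaining.append({"identifiers": ids, "names": names})
        merged = remaining
    return merged


# Hypothetical topics from two different sources (illustrative only).
topics = [
    {"identifiers": {"http://example.org/person/wordsworth"},
     "names": {"William Wordsworth"}},
    {"identifiers": {"http://example.org/person/wordsworth",
                     "http://example.org/authority/ww"},
     "names": {"Wordsworth, William"}},
    {"identifiers": {"http://example.org/place/grasmere"},
     "names": {"Grasmere"}},
]
result = merge_topics(topics)
```

The first two topics collapse into one that carries both names and both identifiers, while the place stays separate.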

Topic maps: mirror structure of data. Need some kind of ontology for data so it can go into a topic map.

[Some discussion around] Topic maps vs relational data. Topic maps and relation to already highly structured Collection Management System data… [Paul: had whole bit but I missed it cos I was reviewing document, duffer.] Can combine multiple ontologies. Community of interest can create own indexing, combine with authoritative metadata to generate own. "Ontology of hatness."

RL: can inherit hierarchy so can query based on that even though not all of the hierarchy is visible.

RL: structure in source becomes ontology in web environment.
The CIDOC website uses a CRM Topic Map to produce a self-organising navigation panel and the visitor can filter pages e.g. related to activity.

Museums and SW: machine-processable information, usually XML. [Paul – N3 – alternative for XML to RDF?] Information re-worked in some way, value added to it. The (end) user of the information isn’t the producer of the information [back to internal/external uses, management of collections vs access]. The producer of the information has no control over or knowledge of the uses to which "their" information is put.

RL on being a web 2.0 information publisher:
Define and publish an XML application or use an existing XML format [BK: HTML is fighting back]
Provide a reliable information-delivering service at a stable URL
Accept you have no control over what uses are made of your information

[??] What about museums as consumers/users of web 2.0 information as well as producers? Collaborative future, where museums are contributors to bigger picture? Looking beyond sector, but e.g. relate to other historical content

ITIS? Taxonomic info for species. AAT (Art & Architecture Thesaurus), DNB (Dictionary of National Biography), Grove Art.

Two XML interchange formats? One for museum-to-museum, one for public information exchange. Mismatch between what public might be interested in (people, places) and what museums have to say, this object exists.

Stuff in Collections Management Systems that isn't about management of objects: events, people, places, stories.

Need standards for interchange of collection level descriptions?

Bamber Gascoigne's Timesearch website?

RL: what is our scope? Machine processable feeds. Generic web 2.0 (e.g. learning objects, mash-ups). Semantic web proper: OWL, Topic maps.

RL would exclude user-generated content (calls it web 1.0) [and concentrate on the] conversion of additional museum resources to web format (e.g. converting publications to TEI).

What's around?
XML based projects, BRICKS ('Building Resources for Integrated Cultural Knowledge Services', CRM OWL-based), English Heritage's MIDAS.

We need low level standards: when was it, where was it? GML for places, ISO time standard.
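
To make the 'when was it, where was it?' point concrete, here's a small Python sketch of an event recorded with a machine-readable date and place. I'm assuming ISO 8601 is the time standard meant, and using a plain latitude/longitude pair rather than real GML to keep it short; the field names and the approximate coordinates are made up.

```python
from datetime import date

# Hypothetical low-level record: the field names are illustrative only.
event = {
    "what": "Semantic Web Think Tank meeting",
    "when": "2007-03-19",                  # ISO 8601 calendar date
    "where": {"lat": 52.2, "lon": 0.12},   # roughly Cambridge, approximate
}


def in_year(record, year):
    """True if the record's ISO 8601 date falls in the given year."""
    return date.fromisoformat(record["when"]).year == year


when = date.fromisoformat(event["when"])
```

Because the date is in a standard form, any system can sort or filter on it without knowing anything else about the record.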

RL: need to develop one museum delivery format, in XML. Look at naming, look for as much coherence as possible with least amount of effort.

[MR: create something that sits above Collections Management Systems level, provides level of genericisation (generalisation?) i.e. made somewhat generic/interoperable? Then apply transformations to suit different external or internal requirements. What data standard(s) to use?]
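
A minimal sketch of that 'generic layer plus transformations' idea in Python, assuming a record has already been exported from a Collections Management System into a simple generic form. The field names, the example.org URI and the object details are all invented, and it deliberately doesn't answer the question of which data standard to use.

```python
import json
from xml.etree import ElementTree as ET

# Hypothetical generic record sitting above the CMS (illustrative values).
record = {
    "uri": "http://example.org/objects/12345",
    "title": "Clay pipe",
    "date": "1700",
    "place": "London",
}


def to_json(rec):
    """Transformation for external developers: a JSON feed entry."""
    return json.dumps(rec, sort_keys=True)


def to_xml(rec):
    """Transformation for museum-to-museum interchange: simple XML."""
    root = ET.Element("object", attrib={"uri": rec["uri"]})
    for field in ("title", "date", "place"):
        ET.SubElement(root, field).text = rec[field]
    return ET.tostring(root, encoding="unicode")


def to_label(rec):
    """Transformation for the public: a one-line caption."""
    return f"{rec['title']}, {rec['place']}, {rec['date']}"


outputs = {"json": to_json(record), "xml": to_xml(record), "label": to_label(record)}
```

The point is that the generic record is written once and each audience gets its own view of it, so adding a new output doesn't mean touching the source system.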

[post lunch]

Dylan Edgar
RP: internal SW/external sw. Is the idea getting towards something that could serve as a seed for both? Start with location of museum.

BK: radius space 5 [??] Nottingham Uni mashup of cultural resources. Northumbria?

FLB: loans could be an institution- and sector-led driver for exchanging data.

NP: push for persistent URI for museum/collection/object.

[??] Competing URIs for museums. MLA Institution Server is latest instance.

[MR: institutions devolving power to other orgs – is this another way of trust/radical trust?
Can this group make a decision on where permanent object URIs should live? Is a sector-wide container possible/advisable? Does it matter if one object has more than one persistent URI?]

RL: go international.
PS: museums change.
RB: museums aren't only places that hold collections.
RB: also problem of definition of object. But leaving out on-going discussions, benefits are worth it.

RP: MDA to tweak definition to provide for URIs?

SK: German institute that is an example of it. Deals with change in holding institution [and effect on URIs].

BK: libraries sector is still arguing about unique identifiers. Got to be HTTP URIs not some new schema.

FLB: URIs should be something that can be implemented in an [iterative] approach, not require that all development stops.

RL: should the URI be required to point to anything? [MR: like a pot of gold at the end of the rainbow?]

RP: this afternoon about getting realistic, what's possible in the UK museum sector today.

DE: benefits of sw: creating better meaning for people. [And how to sell it to policy makers]. How do we get funders excited about it? Similar to documentation debate. Focus on outcomes and benefits rather than 'let's implement sw'. Tie it into wider agendas. Scale and diversity of sector is a challenge. Especially for smaller museums with less or no IT resources.

Two strands: technical and advocacy. Getting practitioner and government on board. Value for money. HLF: interested in impact on people.

RP: are there differences in challenges and opportunities in Scotland?

DE: [his organisation is] smaller, the equivalent of MLA in Scotland but only for museums.

JO: we could be the Model and the Controller, provide the V.

JO/MR: Move from application to being a service. We can still build application on top but others can also build applications too.

Nick Poole
How does it work for practitioners?

[Had diagrams on slides, notes based on those and discussion]

Chain:
Political priorities: social welfare, efficiency.
Departments: DC online [??] a car crash because of fundamental fear of ICT. SW: what you don't know… longer term impact/VFM [value for money?], more inclusive services.
Public sector bodies: MLA, SMC. Longer term strategic change. MLA has never really bought into its own IT strategy for the sector. What SW can do for Knowledge Web, inter-domain data/services. MLA pushing harmonisation across museums, libraries and archive sectors.
Infrastructural organisations: standards, best practice/Standards (non sector): standards, applications. A lot of what we need exists outside our industry. How SW is manifest in: standards, terminological practice.
System providers: client needs, standards environment. SW in the machine. Make it not a threat, and easily understood.
Museum management: Local/national priority, market need; SW: the business case, more for less, the cachet of pathfinding: other nations are doing it better so re-establish national pride. Harder case to argue.
Practitioner: stock control, service delivery; SW: argue that can do the same job, see a bigger return.

Various people including BK: users not in the chain.

Calculus of change. Y Axis: Quick wins, slow burn, tectonics/X Axis: projects, standards, advocacy, background communications, funding/performance.

[Grid from the slide: Projects, Standards, Advocacy, Background communications and Funding/performance plotted against Quick wins, Slow-burn and Tectonics.]

[Alex has the diagram of this, I think]

NP: some organisations were so burnt by NOF-digi and Culture Online they're unlikely to digitise anything else any time soon.

What will prevent us getting there? One is the nature and status of this thinktank [?]

Is this just part of what we do or is it a new national program?

JO: can we talk to PNDS about re-using data?

BK: users missing from NP's chain. Quality of experience, not quality of [data]?

NP: aggregation?

JO: dream of 'my favourite museum objects'.
[MR: if using something like Exploring 20th Century London data that's published on three sites already (PNDS, Hub and MoL site), where does the 'final' record URI live?]

NP: Standard for resource discovery in museums. Yes.

[MR: How big a piece of work is it? Can it be done iteratively so that small cases can be made to show how it would work, demonstrators, etc? Could it also be a microformat standard that defines a few basic fields and a link to a URI, something that can be implemented quickly a la JO?
Does it tie to learning objects metadata?]
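
As a sketch of how small such a microformat-style standard could be, here's some Python that wraps a couple of basic fields and a link to the object's home URI in HTML class names. The class names (`museum-object`, `home-uri`, `title`, `date`) are hypothetical and not an existing microformat; the object details are invented.

```python
from html import escape


def object_microformat(title, date, home_uri):
    """Render a few basic object fields plus a link to the home URI."""
    return (
        '<div class="museum-object">'
        f'<a class="home-uri" href="{escape(home_uri, quote=True)}">'
        f'<span class="title">{escape(title)}</span></a> '
        f'<span class="date">{escape(date)}</span>'
        '</div>'
    )


# Hypothetical object details (illustrative only).
snippet = object_microformat(
    "Clay pipe & bowl", "1700", "http://example.org/objects/12345"
)
```

Something like this could be dropped into existing web pages as a quick demonstrator without waiting for a full resource discovery standard to be agreed.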

Quick wins and slow burns.

NP: museums still reeling from separation of content and presentation.

[MR: Taking things forward: project to recommend infrastructure and data standards?]

SK: the previous Netful of Jewels report was about what users would be able to do.

BK: SEA: Strategic E-content Alliance (was CIE Common Information Environment).

RP: dissemination to discoverability. Competition to collaboration.

NP: how to get it into museum policy. MR: tie it to funding.

BK: JISC Users and Innovation program?

NP: does industry need core skills in ICT? Or in development of content, mark-up, collections or content management systems?

NP: Open Business [?]

NP: for evidence based policy. RP: research is of interest to him, SK, RB, etc as have students and sources for research funding.

BK: JISC funded open source watch thing [the one with Sebastian Rahtz, presumably]

NP: something to make it tangible and real whether blog or examples. Migration to a new business model. Need some kind of marketing strategy/overarching document that says who we'll talk to, what the strategies are.

ML: SW acts as thing to hang discussions on but afternoon discussions move away from SW. Avoid problem of original Netful that it just seemed like a whinge that could be ignored. Why do joined-up-ness?
Is semantic interoperability a solution? "Semi-mantics"

If talking to David Dawson, how do we point out where the gaps are, that aren't there in EU stuff or Minerva or whatever?

Final summaries
[MR based on something JO said:] apply transformations to the idea for presentation to different audiences just as if it was a schema. E.g. marketing.

NP: call it documentation not SW. What it can do and not what it is in its own right. Show support of sector for project.

[MR: geek stuff should be in the background, stated output should be audience and sector led. Achieve x goals, secret uncool output is in how it's done (sw or SW). Tell story of how it will be used in the end, the implementation of sw as a side effect.]

PS: is anyone a member of W3C? Could be an idea if developing standards on top of standards. What's going on in other sectors?

SK: museums facilitating users, not just providing stuff they think users want.