metrics – Open Objects

Christian Heilmann on Yahoo!'s YQL, open data tables, APIs

My notes from Christian Heilmann's talk on 'Reaching those web folk' with Yahoo!'s new-ish YQL, open data tables and APIs at the National Maritime Museum [his slides]. My notes are a bit random, but might be useful for people, especially the idea of using YQL as an easy way to prototype APIs (or implement APIs without too much work on your part).

For him it's about data on the web, not just technology.

Number of users is a crap metric, [should consider the user experience].

Stats should be what you use to discover where the problems are, not to pat yourself on the back.

People with BlackBerries have no JavaScript, no CSS. Don't have front-loaded navigation they have to scroll through – cos they won't.

If you think of your site as content, then visitors can become 'broadcasting stations' and relay your message. Information flows between readers and content. They're passing it on through distribution channels you're not even aware of.

Content on the web is validated with links and quotes from other sources e.g. Wikipedia. People mix your information with other sources to prove a point or validate it. eg. photos on maps.

How can you be part of it?
Make it easy to access. Structure your websites in a semantic manner (plain old semantic HTML). Title is important, etc. Add more semantic richness with RDF and microformats. Provide data feeds or RSS. Consider the Rolls Royce of distribution – an API. Help other machines make sense of your content – search engines will love you too.

Yahoo index via BOSS API – Yahoo do it because they know 'search engines are dying'. Catch-all search engines are stupid. Apples are not the same apples for everyone. Build a cleverer web search.

http://ask-boss.appspot.com/ – nlp analysis of search results. Try 'who is batman in the dark knight' – amazing.

BOSS provides mainstream channel for semantic web and microformats. Microformats are chicken and egg problem. Using searchmonkey technology, BOSS lists this information in the results. BOSS can return all known information about a page, structured.

Key terms parameter in BOSS – what did people enter to find a site/page? http://keywordfinder.org/ – what successful websites have for a given keyword.

Clean HTML is the most important thing, semantic and microformats are good.

If your data is interesting enough, people will try to get to it and remix it.

[Curl has grown up since I last used it! Can be any browser, do cookies, etc.]

Now the web looks like an RSS reader.

Include RSS in your stats.

Guardian – any of their content websites put out RSS through CMS. They then provided an API so end users can filter down to the data they need.

Programmable Web – excellent resource but can be overwhelming.

The more data sources you use, the more time you spend reading API documentation, cos every API is different. Terms, formats, etc. The more sources you connect to, the more chances of error. The more stuff you pull in, the slower the performance of your website.

So you need systems to aggregate sources painlessly. Yahoo Pipes. A visual interface, changes have to be made by hand.

You can't quickly use a pipe in your code and change it on the fly. e.g. change a parameter for one implementation. No version control.

So that's one of the reasons for YQL: Yahoo Query Language. SQL style interface to all yahoo data (all Yahoo APIs) and the web. Yahoo build things with APIs cos it's the only way to scale. Book: 'scalable websites', all about APIs.

Build queries to Yahoo APIs, try them out in YQL console. Provides diagnostics – which URLs, how long it took, any problems encountered. Allows nesting of API calls.

Outputs XML or JSON, consistent format so you know how to use that information.
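
[A minimal sketch of what a YQL call can look like from code, in Python. The endpoint URL, the `diagnostics` parameter and the `rss` table are assumptions taken from the public YQL documentation of the time rather than from the talk, and the feed URL is made up. It only builds the request URL and leaves the fetching to whatever HTTP client you prefer.]

```python
from urllib.parse import urlencode

# Assumption: the public YQL REST endpoint as documented at the time.
YQL_ENDPOINT = "https://query.yahooapis.com/v1/public/yql"


def build_yql_url(statement, fmt="json", diagnostics=False):
    """Return the GET URL for a YQL statement; output is 'json' or 'xml'."""
    if fmt not in ("json", "xml"):
        raise ValueError("YQL outputs XML or JSON")
    params = {"q": statement, "format": fmt}
    if diagnostics:
        # Assumption: this flag asks for the diagnostics block
        # (URLs called, how long they took, problems encountered).
        params["diagnostics"] = "true"
    return YQL_ENDPOINT + "?" + urlencode(params)


# Hypothetical feed URL, for illustration only.
statement = "select title, link from rss where url='http://example.org/exhibitions.rss'"
url = build_yql_url(statement, diagnostics=True)
print(url)
```

Because the statement is just a string, changing a parameter for one implementation is a one-line change in your own code, which is the thing Pipes made awkward.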

YQL also helped internally because of varying APIs between departments.

Gives access to all Yahoo services, any data sources on the web, including html and microformats, and can scrape any website.

Open tables
Easy way to add your own information to YQL. Tell Yahoo the end point where it can get the info.

Jim wanted to allow people to access data without building an API. All it needed was a simple XML file.

[Though you do need RSS results from a search engine to point to – I'm going to see what we can output from our Google Mini and will share any code – or would appreciate some time-saving pointers if anyone has any. Yes, hello, lazyweb, that's my coat, thanks.]

Basically it's a way of providing an API without having to develop one.
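
[To give a feel for how small that XML file is, here is a rough sketch in Python that generates one. The namespace, element names and attributes are assumptions based on the YQL open data table documentation and should be checked against it; the search URL, the sample query and the author are entirely made up.]

```python
import xml.etree.ElementTree as ET

# Assumption: schema namespace as given in the YQL open data table docs.
TABLE_NS = "http://query.yahooapis.com/v1/schema/table.xsd"


def build_open_table(search_url, sample_query, author):
    """Build a minimal open data table definition pointing at a search feed."""
    table = ET.Element("table", {"xmlns": TABLE_NS})
    meta = ET.SubElement(table, "meta")
    ET.SubElement(meta, "author").text = author
    ET.SubElement(meta, "sampleQuery").text = sample_query
    bindings = ET.SubElement(table, "bindings")
    # Assumption: each RSS item in the search results becomes one row.
    select = ET.SubElement(
        bindings, "select", {"itemPath": "rss.channel.item", "produces": "XML"}
    )
    urls = ET.SubElement(select, "urls")
    ET.SubElement(urls, "url").text = search_url
    inputs = ET.SubElement(select, "inputs")
    # The 'q' key is passed through to the search engine as a query parameter.
    ET.SubElement(
        inputs,
        "key",
        {"id": "q", "type": "xs:string", "paramType": "query", "required": "true"},
    )
    return ET.tostring(table, encoding="unicode")


xml_text = build_open_table(
    "http://example.org/search?output=rss",  # hypothetical search end point
    "select * from collection where q='astrolabe'",  # hypothetical table name
    "Example Museum",
)
print(xml_text)
```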

Concluding: you can piggyback on people's social connections with other people by making data shareable. [Then your data is shared, yay. Assuming your institution is down with that, and no copyrights or puppies were hurt in the process.]

APIs are a commitment – have to be available all the time, lot of traffic, but hard to measure traffic and benefits. Making APIs scale is a pain and you have to be clever to do it. Pointing a YQL open data table at the search engine on your site also works.

Saves documenting API? [??]

YQL handles the interface, caching and data conversion for you. Also limits the access to sensible levels – 10,000 hits/hour.
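
[For a sense of what you'd otherwise have to build yourself, this is a minimal sketch of an hourly cap in Python. The talk didn't say how YQL enforces its limit, so the fixed-window counter, the class name and the per-client keying are all assumptions made for illustration.]

```python
import time


class HourlyLimiter:
    """Fixed-window counter: at most `limit` hits per clock hour per client."""

    def __init__(self, limit=10_000, clock=time.time):
        self.limit = limit
        self.clock = clock      # injectable so the limiter can be tested
        self.window = None      # index of the hour currently being counted
        self.counts = {}        # client -> hits so far in this hour

    def allow(self, client):
        window = int(self.clock() // 3600)
        if window != self.window:
            # New hour: forget the previous hour's counts.
            self.window = window
            self.counts = {}
        hits = self.counts.get(client, 0)
        if hits >= self.limit:
            return False
        self.counts[client] = hits + 1
        return True


limiter = HourlyLimiter()
print(limiter.allow("some-api-key"))  # hypothetical client identifier
```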

Jim – 'images from collection' displayed on page as badge thing with YQL as RSS browser. Can just create an RSS feed for an exhibition, then can have a new badge for each new exhibition.

Using YQL protects against injection attacks.

Comment from audience – YQL as meta-API.

Registering is basically making the XML file. You need a Yahoo ID to use the console. [The console is cool, basically like a SQL 'enterprise' system console, with errors and transaction processing costs.]

We had questions about adding in metrics, stats, to use both for reporting and keeping funders/bosses happy and for diagnostics – to e.g. find out which areas of the collection are being queried, what people are finding interesting.

github repository as place to register open tables to make them discoverable.

There's a YQL blog.

[So, that's it – it's probably worth a play, and while your organisation might not want to use it in production without checking out how long the service is likely to be around, etc, it seems like an easy way of playing with API-able data. It'd be really interesting to see what happened if a few museums with some overlap in their collections coverage all made their data available as an open table.]

Happy developers + happy museums = happy punters (my JISC dev8D talk)

This is a rough transcript of my lightning talk 'Happy developers, happy museums' at JISC's dev8D 'developer happiness' days last week. The slides are downloadable or embedded below. The reason I'm posting this is because I'd still love to hear comments, ideas, suggestions, particularly from developers outside the museum sector – there's a contact form on my website, or leave a comment here.

"In this talk I want to show you where museums are in terms of data and hear from you on how we can be more useful.

If you're interested in updates I use my blog to [crap on a bit, ahem] talk about development at work, and also to call for comment on various ideas and prototypes. I'm interested in making the architecture and development process transparent, in being responsive to not only traditional museum visitors as end users, but also to developers. If you think of APIs as a UI for developers, we want ours to be both usable and useful.

I really like museums, I've worked in three museums (or families of museums) now over ten years. I think they can do really good things. Museums should be about delight, serendipity and answers that provoke more questions.

A recent book, 'How does one become a scientist? : survey on the birth of a Vocation' states that '60% of scientists over 30 and 40% of scientists under 30 claim, without prompting, that the Palais de la Découverte [a science museum in Paris] triggered their vocation'.

Museums can really have an impact on how people think about the world, how they think about the possibilities of their lives. I think museums also have a big responsibility – we should be curating collections for current and future audiences, but also trying to provide access to the collections that aren't on display. We should be committed to accessibility, transparency, curation, respecting and enabling expertise.

So today I'm here because we want to share our stuff – we are already – but we want to share better.

We do a lot of audience research and know a lot about some of our users, including our specialist users, but we don't know so much about how people might use our data, it's a relatively new thing for us. We're used to saying 'here are objects in a case, interpretation in label', we're not used to saying 'here's unmediated access, access through the back door'.

Some of the challenges for museums: technology isn't that much of a challenge for us on the whole, except that there are pockets of excellence, people doing amazing things on small budgets with limited resources, but there are also a lot of old-fashioned monolithic project designs with big overheads that take a long time to deliver. Lots of people mean well but don't know what's possible – I want to spread the news about lightweight, more manageable and responsive ways of developing things that make sense and deliver results.

We have a lot of data, but a lot of it's crap. Some of what we have is wrong. Some of it was written 100 years ago, so it doesn't match how we'd describe things now.

We face big institutional challenges. Some curators – (though it does depend on the museum) – fear loss of control, fear intellectual vandalism, that mistakes in user-generated content published on museum sites will cause people to lose trust in museums. We have fears of getting the IT wrong (because for a while we did). Funding and metrics are a big issue – we are paid by how many people come through our door or come to our websites. If we're doing a mashup, how do we measure the usage of that? Are we going to cost our organisations money if we can't measure visits and charge back to the government? [This is particularly an issue for free museums in the UK, an interesting by-product of funding structures.]

Copyright is a huge issue. We might not even own an object that appears in our collections, we might not own the rights to the image of our object, or to the reproductions of an image. We might not have asked for copyright clearance at the time when an object was donated, and the cost of tracing it might be too high, so we can't use that object online. Until we come up with a reliable model that reduces the risk to an institution of saying 'copyright unknown', we're stuck.

The following are some ways I can think of for dealing with these challenges…
Limited resources – we can't build an interface to meet every need for every user, but we can provide the content that they'd use. Some of the semantic web talks here have discussed a 'thin layer' of application over data, and that's kind of where we want to go as well.

Real examples to reduce institutional fear and to provide real examples of working agile projects. [I didn't mean strictly 'agile' methodology but generally projects that deliver early and often and can respond to the changing technical and social environment]

Finding ways for the sector to reward intelligent failure. Some museums will never ever admit to making a mistake. I've heard over the past few days that universities can be the same. Projects that are hyped up suddenly aren't mentioned, and presumably it's failed, but no-one [from the project] ever talks about why so we don't learn from those mistakes. 'Fail faster, succeed sooner'.
I'd like to hear suggestions from you on how we could deal with those challenges.

What are museums known for? Big buildings, full of stuff; experts; we make visitors come to us; we're known for being fun; or for being boring.

Museum websites traditionally appear to be about where we are, when we're open, what's on, is there a cafe on site. Which is useful, but we can do a lot more.

Traditionally we've done pretty exhibition microsites, which are nice – they provide an experience of the exhibition before or after your visit. They're quite marketing-led, they don't necessarily provide an equivalent experience and they don't really let you engage with the content beyond the fact that you're viewing it.

We're doing lots of collections online projects, some of these have ended up being silos – sometimes to the extent if we want to get data out of them, we have to screen-scrape our own data. These sites often aren't as pretty, they don't always have the same design and usability budgets (if any).

I think we should stick to what we're really good at – understanding the data (collections), understanding how to mediate it, how to interpret it, how to select things that are appropriate for publication, and maybe open it up to other people to do the shiny pretty things. [Sounds almost like I'm advocating doing myself out of a job!]

So we have lots of objects, images, lots of metadata; our collections databases also include people, events, dates, places, businesses and organisations, lots of qualified information around things like dates, they're not necessarily simple fields but that means they can convey a lot more meaning. I've included that because people don't always realise we have information beyond objects and object metadata. This slide [11 below] is an example of one of the challenges – this box of objects might not be catalogued as individual instruments, it might just be catalogued as a 'box of stuff', which doesn't help you find the interesting objects in the box. Lots of good stuff is hidden in this way.

We're slowly getting there. We're opening up access. We're using APIs internally to share data between gallery interactives and the web, we're releasing them as data points, we're using them to provide direct access to collections. At the moment it still tends to be quite mediated access, so you're getting a lot of interpretation and fewer objects because of the resources required to create really nice records and the information around them.

'Read access' is relatively easy, 'write access' is harder because that's when we hit those institutional issues around authority, authorship. Some curators are vaguely horrified that they might have to listen to what the public have to say and actually take some of it back into their collections databases. But they also have to understand that they can't know everything about their collections, and there are some specialist users who will know everything there is to know about a particular widget on a particular kind of train. We'd like to capture that knowledge. [London Transport Museum have had a good go at that.]

Some random URLs of cool stuff happening in museums [http://dashboard.imamuseum.org/, http://www.powerhousemuseum.com/collection/database/menu.php, http://www.brooklynmuseum.org/opencollection/collections/, http://objectwiki.sciencemuseum.org.uk/] – it's still very much in small pockets, it's still difficult for museum staff to convince people to take what seems like a leap of faith and try these non-traditional things out.

We're taking our content to where people hang out. We're exploring things like Flickr Commons, asking people to tag and comment. Some museums have been updating collections records with information added by the public as a result. People are geo-tagging photos for us, which means you can do 'then and now' mashups without a big metadata enhancement budget.

I'd like to see an end to silos. We are kinda getting there but there's not a serious commitment to the idea that we need to let things go, that we need to make sure that collections online are shareable, that they're interoperable, that they can mesh with other things.

Particularly for an education audience, we want to help researchers help themselves, to help developers help others. What else do we have that people might find useful?

What we can do depends on who you are. I could hope that things like enquiry-based learning, mashups, linked data, semantic web technologies, cross-collections searches, faceted browsing to make complex searches easy would be useful, that the concept of museums as a place where information lives – a happy home for metadata mapped around objects and authority records – are useful for people here but I wouldn't want to put words into your mouths.

There's a lot we can do with the technology, but if we're investing resources we need to make sure that they're useful. I can try things in my own time because it's fun, but if we're going to spend limited resources on interfaces for developers then we need to know that it's actually going to help some group of people out there.

The philosophy that I'm working with is 'we've got really cool things, but we can have even cooler things if we can share what we have with everyone else'. "The coolest thing to do with your data will be thought of by someone else". [This quote turns out to be on the event t-shirts, via CRIG!] So that said… any ideas, comments, suggestions?"

And that, thankfully, is where I stopped blathering on. I'll summarise the discussion and post back when I've checked that people are ok with me blogging their comments.

[If the slide show below has a brown face on a black background, it's the right one – slideshare's embed seems to have had a hiccup. If it's not that, try viewing it online directly.]


[My slide images include the Easter Egg museum in Kolomyya, Ukraine and 'Laughter in Odd Places' event at the Museum of London.]

This is a quick dump of some of the text from an interview I did at the event, cos I managed to cover some stuff I didn't quite articulate in my talk:

[On challenges for museums:] We need to change institutional priorities to acknowledge the size of the online audience and the different levels of engagement that are possible with the online experience. Having talked to people here, museums also need to do a bit of a sell job in letting people know that we've changed and we're not just great big imposing buildings full of stuff.

[What are the most exciting developments in the museum sector, online?] For digital collections, going outside the walls of the museum using geo-location to place objects in their original context is amazing. It means you can overlay the streets of the city with past events and lives. Outsourcing curation and negotiating new models of expertise is exciting. Overcoming the fear of the digital surrogate as a competitor for museum visits and understanding that everything we do builds audiences, whether digital or physical.

Social Media Statistics

One of those totally brilliant and obvious-in-hindsight ideas. I'd like to see stronger guidelines on citing sources as it grows and clear differentiation by region/nation, because it's easy for vague figures and rumour to become universal 'fact', but it's a great idea and will hopefully grow: Social Media Statistics is:

A big home for all facts and figures around social media – because I'm fed up of trawling around for them and I'm also sure that I'm not the only one who gets asked 'how many users does Facebook have?' every hour of every day. … I'm hoping that this wiki will not only include usage stats, but also behaviour and attitude stats. It's a bit of a skeleton at the moment, with v few of my stats having stated sources, but be patient – and help where you can!

Please add in any juicy stats as you come across them, and do cite your references and link to them where possible.

I'll put my money where my mouth is and add information I find. I find wikis a really useful tool for lightweight documentation – it's really easy to add some information while it's in your brain, and the software doesn't get in the way of your flow.

For a while now I've wanted a repository of museum and cultural heritage audience evaluation – this could be a good model. Speaking of which, I really must write up my notes from the MCG Autumn meeting.

[Edit to add: Social Media Statistics also links to Measurementcamp, which might be of interest to cultural heritage organisations wondering how they can 'measure their social media communications online and offline' (and how they can work with project sponsors and funders to define suitable metrics for an APId, social media world).]

Next-generation approaches at 'UK Museums on the Web Conference 2008'

Session 3, 'Next-generation approaches', of the UK Museums on the Web Conference 2008 was introduced by Jon Pratty.

Jon questioned, 'What is a virtual museum? It can be pretty much anything. Lots of valuable historical documents aren't in an 'online museum', they're just out there to be found by search. It raises the question – how much permanence should digital objects have?'

George Oates, 'Sharing museum collections through Flickr'
Introducing the Flickr Commons project and talking about some early results. Some practical information on what it means to join the program, and things that have come out of it.

Flickr 'swerved in from left field' and bumped into museum people and librarians and archivists.

It started with Library of Congress thinking about how to engage with Web 2.0. They were looking for a Web 2.0 partner. They have 14 million images, about a million digitised.

Flickr is designed specifically to search and browse photos. It has a big infrastructure and supports interfaces in 8 languages. It has lots of eyeballs – "it's made of people".

From the Commons point of view, it's simply a service, organisations can publish content into it.

They hit a hurdle: can a collecting institution publish content onto a site like Flickr? As a collecting institution, someone like the Library of Congress doesn't necessarily own the copyright or know who the copyright holder was. They devised a new statement – 'no known copyright restrictions' – which gave an institution a way to publish content once it had done as much work as it could to trace the copyright holders, even if it wasn't able to find them.

Might open up to other sorts of content.

What's it for? Increase access to public photography collections; gather context about them, [something else I missed].

Powerhouse – lots of the collection was geo-tagged. It means you can find photos from then and now, for example around the CBD of Sydney. [Cool! I love the way geo-tagging content lets you build up layers of history]

Brooklyn – it made sense to use their existing established Flickr account, so Flickr created functionality to support that. The Smithsonian joined on Monday.

Soon they'll have content from other partners including a charming collection from a tiny local museum.

Results:
Last 28 days Library of Congress – 15,000 [or 50,000?] views per day, 8 million views over last six months, 72,000 tags.
Powerhouse – 77,000 views (more views of that collection in one month than in the whole previous year), 3500 tags.
Brooklyn – figures affected by merged account issue.
Smithsonian – 10,000 views in first day, 100 new contacts

The numbers are probably affected by the ratio of photos e.g. smaller numbers when an institution has put fewer photos online.

"But, is it any good?"
Suddenly there are conversations between Flickr users and institutions, and between Flickr users, contributing information and identifications.

They contribute the identification of places and people, with information about the history behind photos.

Now and then – people are adding their recent photos of a location via comments on Flickr.

Library of Congress have made a list of types of interactions [slides], they include the transcription of text on signs, posters, etc in background, geo-tags, non-English tags.

Institutional context and Flickr – bind them together with hyperlink, but being on Flickr frees a program from institutional constraints.

Flickr has been designed as a vessel or platform where interactions and conversations can happen.

The information that the community provides is proving useful. The Library of Congress has updated 176 records in catalogue, recording that it's based on 'information provided by Flickr Commons Project 2008'.

The Smithsonian found it was opportunity for collaboration between institutions/departments and staff.

How to join: the process is publish – interact – feedback.

What to think about: give a broad representation of what's in your collection. Think about placement of images in photostream and sets. Plan to attract special interest groups. Think about what is already digital, what is popular? It can direct your digitisation efforts with feedback from a live community. Or you could go into your stores or collections database and possibly digitise randomly.

How much metadata to include? How many fields from database into description of photo; more or less?

When: can be a challenge for institutions.

How? You could use the normal Flickr uploadr if you don't have too many images; or you could use API to write applications that will work with Collections Management Systems.
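
[If you go the API route, the 'how much metadata?' question turns into a small mapping step before anything is uploaded. A rough sketch in Python, assuming a catalogue record arrives as a dictionary; every field name here is hypothetical and the upload call itself is deliberately left out.]

```python
def to_flickr_fields(record, description_fields=("description", "date", "maker", "credit")):
    """Map one catalogue record (a dict) to a title, description and tag list."""
    title = record.get("title") or record.get("object_number", "Untitled")
    lines = []
    for field in description_fields:
        value = record.get(field)
        if value:
            lines.append(f"{field.capitalize()}: {value}")
    # Link back to the institution's own record to keep the institutional context.
    if record.get("url"):
        lines.append(f"More information: {record['url']}")
    # Normalise subject terms into lower-case, de-duplicated tags.
    tags = sorted({t.strip().lower() for t in record.get("subjects", []) if t.strip()})
    return {"title": title, "description": "\n".join(lines), "tags": tags}


# Made-up record, for illustration only.
record = {
    "object_number": "P1234",
    "title": "Example photograph",
    "date": "c. 1910",
    "subjects": ["Sydney", "ferries", " Sydney "],
    "url": "http://example.org/collection/P1234",
}
fields = to_flickr_fields(record)
print(fields["description"])
```

Choosing more or fewer fields is then just a matter of changing `description_fields`, so it's cheap to try both and see what the community responds to.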

Who? Might be web technician and curator.

The catch? It costs $24.95 for a Pro account. But you get unlimited storage, and could conceivably put whole collection online.

The future:
It's a work in progress. Probably will end up developing tools like additional reporting
Grow gently (make sure institution can handle the changes and respond to interactions)
They will continue their focus on photographs, not photographs of objects "(sorry)". "Flickr is about … empathic photography"
"Go local" e.g. small archives in little towns – people can still participate even if they don't have a web team, or web site.
API methods, RSS
Searching, browsing, maps
Search across Commons coming soon. Maybe combine searches to see a map of photos taken in 1910.

Notes from 'UK Museums on the Web Conference 2008'

I'm back in London after UK Museums on the Web Conference 2008 and the mashed museum day.

In the interests of getting my notes up quickly I'm putting them up pretty much 'as is', so they're still rough around the edges. I'll add links to the speaker slides when they are all online. Some photos from the two days are online – a general search for ukmw08 on Flickr will find some. I have some in a set online now, others are still to come, including some photos of slides so I'll update this as I check the text from the slides. These are my notes from the first session.

The keynote speech was given by Tom Loosemore of Ofcom on the Future of Public Service Content.

[For context, Ofcom is the 'independent regulator and competition authority for the UK communications industries' and their recent second review of public service broadcasting, 'The Digital Opportunity', caused a stir in the digital cultural heritage world for its assessment of the extent to which public sector websites delivered on 'public service purposes and characteristics'. You can read the summary or download the full report.]

'How many of you are on the main board of your institution?'

Leadership doesn't have the vision in place to take advantage of the internet.

Sees the internet as platform for public service, [most importantly] enlightenment. He's here today to enlist our help.

We view the internet through lens of expectations from the past, definitely in public service broadcasting – 'let's get our programs on the internet'.

What is value for money?

Would that other sectors did the same soul searching

[On the Ofcom review:] 'You can't really review the web, it's bonkers'

Public service characteristics to create a report card. Of the public service characteristics in the online market (high quality, original, innovative, challenging, engaging, discoverable and accessible), 'challenging' is the hardest.

Museums and cultural sector have amazing potential. What are the barriers between the people here who get it and being able to take that opportunity and redefine public service broadcasting?

It's not skills. Maybe ten years ago, not today. And it's not technology. The crucial missing link is leadership and vision, the lack of recognition by people who govern direction of institutions of the huge potential.

[Which does translate into 'more resources', eventually, but perhaps the missing gap right now is curatorial/interpretative resources? Every online project we do generates more enquiries, stretching these people further, and they don't have time to proactively create content for ad hoc projects as it is, especially as their time tends to be allocated a long time in advance.]

What's behind that reluctance, what can you do to help people on your board understand the opportunities? We can ask 'what business are we in? what's the purpose of our institution?'.

Tate recognise they're not just in the business of getting people to go to the Tate venues, they're in the business of informing people about art. Compare that to the Royal Shakespeare Company which is using its online site purely to get bums on seats.

Next opportunity… how do you take opportunity to digitise your collections and reach a whole new audience? How can you make better use of cultural objects that were previously constrained by physicality?

What opportunities are native to the internet, can only happen there? How can it help your institution to deliver its purpose?

Recognise that you are in the (public service) media business.

How do you measure enlightenment? You could be changing the way people see the world, etc. but you need to measure it to make a case, to know whether you're succeeding. Metrics really really matter in public service arena.

BBC used to look at page views, but developers gamed the system. Then the metric was 'time online', but it stopped people thinking externally. Metric as proxy for quality.

Value = reach x quality. What kind of experience did they have?

Quality is the really hard part. As defined by BBC: quality is in the eye of the beholder. Did the user have an excellent experience?

BBC measure 'net promoter' – how likely are you to recommend this to a friend or colleague, on a scale of 1 – 10?

[But for our sector, what if you don't have any friends with the same interest in x? Would people extrapolate from their specific page on a Roman buckle to recommend the site generally?]

Throw away the 'soggy British middle' – the 7, 8s (out of ten).

Group them as Promoters (9-10/10), Passive (7-8/10), Detractors (0 – 6/10). The key measure is the difference between how many Promoters and how many Detractors. This was 'fabulously useful' at the BBC. 30% is good benchmark.
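
[The sum is simple enough to write down. A minimal sketch in Python, assuming ratings are whole numbers from 0 to 10; the function name and the survey responses are made up for illustration.]

```python
def net_promoter_score(ratings):
    """Return % promoters (9-10) minus % detractors (0-6) for 0-10 ratings."""
    if not ratings:
        raise ValueError("need at least one rating")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    # Passives (7-8) count towards the total but towards neither group.
    return 100.0 * (promoters - detractors) / len(ratings)


ratings = [10, 9, 9, 8, 7, 7, 6, 3, 10, 9]  # made-up survey responses
score = net_promoter_score(ratings)
print(score)  # 5 promoters, 2 detractors, 10 responses -> 30.0
```

Note that the 7s and 8s still count in the denominator, so a site that everyone finds merely fine scores zero.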

They mapped whole BBC portfolio against 'net promoters' % and reach, bubbles show cost.

It's not necessarily about reaching mass audiences. But when producing for niche audiences – they must love it, and it shouldn't cost that much.

He's telling us this because it's the language of funders, of KPIs, this is hard evidence with real people. You might use a different measure of quality but you can't talk about opportunities in abstract, must have numbers behind them.

Suggested the BBC's 15 Web Principles, including 'fall forward, fast'.

A measure of personal success for him would be that in x years, when he asks 'who here is on the board of your institution?', at least x should put their hands up.

[I really liked this keynote speech as a kick up the arse in case we started to get too complacent about having figured out what matters to us, as museum geeks. It doesn't count unless we can get through our organisations and get that content out to audiences in ways they can use (and re-use).]

In linking the sessions, Ross Parry mused about the legacy of 18th, 19th century ideas of how to build a museum, how would they be different if museums were created today?

Lee Iverson, How does the web connect content? "Semantic Pragmatics"
'Profoundly disagreed' with some of the things Tom was talking about, wants to have a dialogue.
He asked how many know the background to semantic web stuff? Quite a few hands were raised.

Talking about how the web works now and where it's going. Museums have significant opportunity to push things forward, but must understand possibilities and limitations.

Changing classic relationship – museum websites as face of institution to users. Huge opportunity for federating and aggregating content (between museums) – an order of magnitude better.

He's working with 13 museums with Northwest Native American artefacts. Communities are co-developers, virtually repatriating their (land).

Possibility to connect outside the museum. Powerhouse Museum as an excellent example of why (and how) you should connect.

Becoming connected:
Expose own data from behind presentation layers
Find other data
Integrate – creating a cohesive (situation)
Engage with users

Access to data is core business, curatorial stuff.

RDFa
Pragmatics of standards – get a sense of what it is you're doing [and start, don't try and create the system of everything first]; that will never work. Use existing standards if possible, grab chunks if you can. Never standardise more than you minimally need to get the utility you need at the moment. Then extend, layers, version 2. A standard is an agreement between a minimum of two people [and doesn't have to be more complicated than that].

"Just do it" – make agreements, get it to work, then engage in the standardisation process.

Relationship between this and semantic web? Semantic web as 'data web'. Competing definitions.

Slide on Tim Berners-Lee on the semantic web in 1999.

Why hasn't it appeared? It's vapourware, you can't make effective standards for it.

Syntax – capability of being interpreted. Semantic – ability to interpret, and to connect interpretations.

Finding data – how much easier would it be if we could just grab the data we want directly from where we want it?

Key is relating what you're doing to what they're doing.

XML vs RDF
Semantic web built on RDF, it's designed for representing metadata. It's substantially different to XML. Lots of reaction against RDF has been reaction against XML encoding, syntactic resistance.

RDF is designed to be manipulated as data, XML is about annotating text. In XML, syntax is the thing, with RDF the data is the thing.

With XML you have to grab the entire doc before you can figure out how to smoosh them together. RDF works by reference, you can just build on it.
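
[A rough sketch of what 'works by reference' can look like, assuming each statement is held as a plain (subject, predicate, object) triple. The example.org URIs and the two museum data sets are hypothetical; the predicates are Dublin Core terms. No RDF library is used, so this only shows the idea, not a real RDF store.]

```python
# Hypothetical object URI shared by both data sets.
BUCKLE = "http://example.org/object/42"

# Statements published by one (made-up) museum.
museum_a = {
    (BUCKLE, "http://purl.org/dc/terms/title", "Roman buckle"),
    (BUCKLE, "http://purl.org/dc/terms/spatial", "Sussex"),
}

# Statements about the same object from a second (made-up) source.
museum_b = {
    (BUCKLE, "http://purl.org/dc/terms/title", "Roman buckle"),
    (BUCKLE, "http://purl.org/dc/terms/medium", "Copper alloy"),
}

# Because both sides refer to the same URI, merging is just a set union:
# the duplicate title statement collapses and the new one is added.
merged = museum_a | museum_b


def describe(graph, subject):
    """Return sorted (predicate, object) pairs known about one subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)


for predicate, value in describe(merged, BUCKLE):
    print(predicate, "->", value)
```

Neither data set had to know the other's document structure; agreeing on the identifier for the object was enough.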

RDFa. A way of embedding RDF content directly in XHTML, relies on same strategies as microformats. Will be ignored by presentation oriented systems but readable by RDF parsers.
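
[To make that concrete, here is a small sketch using RDFa's `about` and `property` attributes on a made-up page fragment. The markup, the example.org URI and the `PropertyCollector` class are assumptions for illustration. The extractor is deliberately naive: it is not a conforming RDFa processor, it ignores nesting and prefix resolution, and it only pairs each `property` with the first text that follows it.]

```python
from html.parser import HTMLParser

# Hypothetical page fragment: a browser just renders the heading and sentence.
SNIPPET = """
<div xmlns:dc="http://purl.org/dc/elements/1.1/" about="http://example.org/object/42">
  <h2 property="dc:title">Roman buckle</h2>
  <p>Found in <span property="dc:coverage">Sussex</span>.</p>
</div>
"""


class PropertyCollector(HTMLParser):
    """Naive collector of (subject, property, text) statements."""

    def __init__(self):
        super().__init__()
        self.subject = None   # most recent 'about' value seen
        self.current = None   # property waiting for its text
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "about" in attrs:
            self.subject = attrs["about"]
        if "property" in attrs:
            self.current = attrs["property"]

    def handle_data(self, data):
        text = data.strip()
        if self.current and text:
            self.triples.append((self.subject, self.current, text))
            self.current = None


parser = PropertyCollector()
parser.feed(SNIPPET)
parser.close()
for triple in parser.triples:
    print(triple)
```

The same markup serves both readers: the page looks unchanged to visitors, while a parser that knows the attributes can lift the statements out.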

[RDF triples vs machine tags? RDF vs microformats? How RDF-like is OAI PMH?]

You can talk about things you don't have a representation for e.g. people.

Ignore the term 'ontology' – it's just a way of talking about a vocabulary.

Four steps for widespread adoption:
Promote practical applications
Develop applications now
[and the slide was gone and I missed the last two steps!]

There was also some stuff on limitations of lightweight approaches, and hermetically sealed museum data, user experiences. Also a bit on 'give away structured data' but with a good awareness of the need to keep some data private – object location and value, for example.

Ross – we've had the media context and technical context, now for the sector context.

Paul Marty, Engaging Audiences by connecting to collections online.
Vital connections…

What does it mean to say x% of your collection is online? For whom is it useful?

How to engage audiences around your collections? Not just presenting information.

Goes beyond providing access to data. Research shows audiences want engagement. Surveyed 1200 museum visitors about their requirements. [I would love to see the research] Virtuous circle between museum visits and website visits.

Build on interest, give experience that grabs people.

Romans in Sussex website – multiple museums offering collections for multiple audiences. Re-presenting same content in different ways on the fly.

Audiences
Don't just give general public a list of stuff. Give them a way to engage.

"Engaging a community around a collection is harder than providing access to data about a collection"

Photo of the week – says "What do you know about this photo? Please share your thoughts with us" But no link or instructions on how to do it. But at least they're trying…

Discussion – Tom, Lee and Paul.

"Why do you digitise collections before had need in mind?" [Because the driver is internal, not external, needs, would be the generous answer; because they could get funding to do it would be my ungenerous answer].

Tom on RDF – how seriously engaged with it to build audiences, tell stories.

BBC licence terms – couldn't re-use data for commercial purposes/at all.

Leadership need to understand opportunities because otherwise they won't support geek stuff.

Qu: terms of engagement – how is it defined?

Paul – US has made same mistakes re digitisation of collections and websites that don't have reusable data.

Participants must be involved in process from the beginning, need input at start from intended users on how it can engage them.

Fiona: why not use existing resources, go to existing sites with established audiences?

Lee: how did YouTube succeed? People were brought in by embedded content. [This issue of using 'wrappers' around your content to help it go viral by being embeddable elsewhere was raised in another session too.]

Tom: letting go is how you win, but it's a profound challenge to institutions and their desire to maintain authority.

Metrics and ROI for social software

A useful post about Social Media Metrics/Return on Investment with some thoughts on "how to provide useful metrics and measurements on the effects of social media for a nonprofit organization" and lots of useful links. It suggests "audience, engagement, loyalty, influence, and action" can put metrics in the "more holistic" context of outcomes, measures, strategy.