Notes on 'User Generated Content' session, Open Culture Conference 2010

My notes from the 'user generated content' parallel track on first day of the Open Culture 2010 conference. The session started with brief presentations by panellists, then group discussions at various tables on questions suggested by the organisers. These notes are quite rough, and of course any mistakes are mine. I haven't had a chance to look for the speakers' slides yet so inevitably some bits are missing, and I can only report the discussion at the table I was at in the break-out session. I've also blogged my notes from the plenary session of the Open Culture 2010 conference.

User-generated content session, Open Culture, Europeana – the benefits and challenges of UGC.
Kevin Sumption, User-generated content, a MUST DO for cultural institutions
His background – originally a curator of computer sciences. One of first projects he worked on at Powerhouse was D*Hub which presented design collections from V&A, Brooklyn Museum and Powerhouse Museum – it was for curators but also for general public with an interest in design. Been the source of innovation. Editorial crowd-sourcing approach and social tagging, about 8 years ago.

Two years ago he moved to National Maritime Museum, Royal Observatory, Greenwich. One of the first things they did was get involved with Flickr Commons – get historic photographs into public domain, get people involved in tagging. c1000 records in there. General public have been able to identify some images as Adam Villiers images – specialists help provide attribution for the photographer. Only for tens of records of the 000s but was a good introduction to power of UGC.

Building hybrid exhibition experiences – astronomy photographer of the year – competition on Flickr with real world exhibition for the winners of the competition. 'Blog' with 2000 amateur astronomers, 50 posts a day. Through power of Flickr has become a significant competition and brand in two years.

Joined citizen science consortia. Galaxy Zoo. Brainchild of Oxford – getting public engaged with real science online. Solar Stormwatch c 3000 people analysing and using the data. Many people who get involved gave up science in high school… but people are getting re-engaged with science *and* making meaningful contributions.

Old Weather – helping solve real-world problems with crowdsourcing. Launched two months ago.
Passion for UGC is based around where projects can join very carefully considered consortia, bringing historical datasets with real scientific problems. Can bring large interested public to the project. Many of the public are reconnecting with historical subject matter or sciences.

Judith Bensa-Moortgat, Nationaal Archief, Netherlands, Images for the Future project
Photo collection of more than 1 million photos. Images for the future project aims to save audio-visual heritage through digitisation and conservation of 1.2 million photos.

Once digitised, they optimise by adding metadata and context. Have own documentalists who can add metadata, but it would take years to go through it all. So decided to try using online community to help enrich photo collections. Using existing platforms like Wikipedia, Flickr, Open Street map, they aim to retrieve contextual info generated by the communities.  They donated political portraits to Wikimedia Commons and within three weeks more than half had been linked to relevant articles.

Their experiences with Flickr Commons – they joined in 2008. Main goal was to see if community would enrich their photos with comments and tags. In two weeks, they had 400,000 page views for 400 photos, including peaks when on Dutch TV news. In six months, they had 800 photos with over 1 million views. In Oct 2010, they are averaging 100,000 page views a month; 3 million overall.

But what about comments etc? Divided them into categories of comments [with percentage of overall contributions]:

  • factual info about location, period, people 5%; 
  • link to other sources eg Wikipedia 5%; 
  • personal stories/memories (e.g. someone in image was recognised); 
  • moral discussions; 
  • aesthetical discussions; 
  • translations.

The first two are most important for them.
13,000 tags in many languages (unique tags or total?).
10% of the contributed UGC was useful for contextualisation; tags ensure accessibility [discoverability?] on the web; increased (international) visibility. [Obviously the figures will vary for different projects, depending on what the original intent of the project was]

The issues she'd like to discuss are – copyright, moderation, platforms, community.

Mette Bom, 1001 Stories about Denmark
Story of the day is one of the 1001 stories. It's a website about the history and culture of Denmark. The stories have themes, are connected to a timeline.  Started with 50 themes, 180 expert writers writing the 1001 stories, now it's up to the public to comment and write their own stories. Broad definition of what heritage is – from oldest settlement to the 'porn street' – they wanted to expand the definition of heritage.

Target audiences – tourists going to those places; local dedicated experts who have knowledge to contribute. Wanted to take Danish heritage out of museums.

They've created the main website, mobile apps, widget for other sites, web service.  Launched in May 2010.  20,000 monthly users. 147 new places added, 1500 pictures added.

Main challenges – how to keep users coming back? 85% new, 15% repeat visitors (ok as aimed at tourists but would like more comments). How to keep press interested and get media coverage? Had a good buzz at the start cos of the celebrities. How to define participation? Is it enough to just be a visitor?

Johan Oomen, Netherlands Institute for Sound and Vision, Vrij Uni Amsterdam. Participatory Heritage: the case of the Waisda? video labelling game.
They're using game mechanisms to get people to help them catalogue content. [sounds familiar!]
'In the end, the crowd still rules'.
. Tagging is a good way to facilitate time-based annotation [i.e. tag what's on the screen at different times]

Goal of game is consensus between players. Best example in heritage is steve.museum; much of the thinking about using tagging as a game came from Games with a Purpose (gwap.com).  Basic rule – players score points when their tag exactly matches the tag entered by another within 10 seconds. Other scoring mechanisms.  Lots of channels with images continuously playing.

Linking it to twitter – shout out to friends to come join them playing.  Generating traffic – one of the main challenges. Altruistic message 'help the archive' 'improve access to collections' came out of research with users on messages that worked. Worked with existing communities.

Results, first six months – 44,362 pageviews. 340,000 tags to 604 items, 42,068 unique tags.
Matches – 42% of tags entered more than 2 times. Also looked at vocab (GTAA, Cornetto), 1/3 words were valid Dutch words, but only a few part of thesauruses.  Tags evaluated by documentalists. Documentary film 85% – tags were useful; for reality series (with less semantic density) tags less useful.

Now looking at how to present tags on the catalogue Powerhouse Museum style.  Experimenting with visualising terms, tag clouds when terms represented, also makes it easy to navigate within the video – would have been difficult to do with professional metadata.  Looking at 'tag gardening' – invite people to go back to their tags and click to confirm – e.g. show images with particular tags, get more points for doing it.

Future work – tag matching – synonyms and more specific terms – will get more points for more specific terms.

Panel overview by Costis Dallas, research fellow at Athena, assistant professor at Panteion University, Athens.
He wants to add a different dimension – user-generated content as it becomes an object for memory organisations. New body of resources emerging through these communication practices.
Also, we don't have a historiography anymore; memory resides in personal information devices.  Mashups, changes in information forms, complex composed information on social networks – these raise new problems for collecting – structural, legal, preservation in context, layered composition.  What do we need to do now in order to be able to make use of digital technologies in appropriate, meaningful ways in the future? New kinds of content, participatory curation are challenges for preservation.

Group discussion (breakout tables)
Discussion about how to attract users. [It wasn't defined whether it was how to attract specifically users who'll contribute content or just generally grow the audience and therefore grow the number of content creators within the usual proportions of levels of participation e.g. Nielsen, Forrester; I would also have liked to discussed how to encourage particular kinds of contributions, or to build architectures of participation that provided positive feedback to encourage deeper levels of participation.]

Discussion and conclusions included – go with the strengths of your collections e.g. if one particular audience or content-attracting theme emerges, go with it.  Norway has a national portal where people can add content. They held lots of workshops for possible content creators; made contact with specialist organisations [from which you can take the lesson that UGC doesn't happen in a vacuum, and that it helps to invest time and resources into enabling participants and soliciting content].  Recording living history.  Physical presence in gallery, at events, is important.  Go where audiences already are; use existing platforms.

Discussion about moderation included – once you have comments, how are they integrated back into collections and digital asset management systems?  What do you do about incorrect UGC displayed on a page?  Not an issue if you separate UGC from museum/authoritative content in the interface design.  In the discussion it turned out that Europeana doesn't have a definition of 'moderation'.  IMO, it should include community management, including acknowledging and thanking people for contributions (or rather, moderation is a subset of community management).  It also includes approving or reviewing and publishing content, dealing with corrections suggested by contributors, dealing with incorrect or offensive UGC, adding improved metadata back to collections repositories.

User-generated content and trust – British Library apparently has 'trusted communities' on their audio content – academic communities (by domain name?) and 'everyone else'.  Let other people report content to help weed out bad content.

Then we got onto a really interesting discussion of which country or culture's version of 'offensive' would be used in moderating content.  Having worked in the UK and the Netherlands, I know that what's considered a really rude swear word and what's common vocabulary is quite different in each country… but would there be any content left if you considered the lowest common standards for each country?  [Though thinking about it later, people manage to watch films and TV and popular music from other countries so I guess they can deal with different standards when it's in context.]  To take an extreme content example, a Nazi uniform as memorabilia is illegal in Germany (IIRC) but in the UK it's a fancy dress outfit for a member of the royal family.

Panel reporting back from various table discussions
Kevin's report – discussion varied but similar themes across the two tables. One – focus on the call to action, why should people participate, what's the motivation? How to encourage people to participate? Competitions suggested as one solution, media interest (especially sustained). Notion of core group who'll energise others. Small groups of highly motivated individuals and groups who can act as catalysts [how to recruit, reward, retain]. Use social media to help launch project.

1001 Danish Stories promotional video effectively showed how easy the process of contributing content was,  and that it doesn't have to to be perfect (the video includes celebrities working the camera [and also being a bit daggy, which I later realised was quite powerful – they weren't cool and aloof]).
Giving users something back – it's not a one-way process. Recognition is important. Immediacy too – if participating in a project, people want to see their contributions acknowledged quickly. Long approval processes lose people.
Removal of content – when different social, political backgrounds with different notions of censorship.

Mette's report – how to get users to contribute – answers mostly to take away the boundaries, give the users more credit than we otherwise tend to. We always think users will mess things up and experts will be embarrassed by user content but not the case. In 1001 they had experts correcting other experts. Trust users more, involve experts, ask users what they want. Show you appreciate users, have a dialouge, create community. Make it a part of life and environment of users. Find out who your users are.

Second group – how Europeana can use the content provided in all its forms. Could build web services to present content from different places, linking between different applications.
How to set up goals for user activity – didn't get a lot of answers but one possibility is to start and see how users contribute as you go along. [I also think you shouldn't be experimenting with UGC without some goal in mind – how else will you know if your experiment succeeded?  It also focusses your interaction and interface design and gives the user some parameters (much more useful than an intimidating blank page)].

Judith's report (including our table) – motivation and moderation in relation to Europeana – challenging as Europeana are not the owners of the material; also dealing with multilingual collections. Culturally-specific offensive comments. Definition and expectations of Europeana moderation. Resources need if Europeana does the moderation.
Incentives for moderation – improving data, idealism, helping with translations – people like to help translate.

Johan's report – rewards are important – place users in social charts or give them a feeling of contributing to larger thing; tap into existing community; translate physical world into digital analogue.
Institutional policy – need a clear strategy for e.g. how to integrate the knowledge into the catalogue. Provide training for staff on working with users and online tools. There's value in employing community managers to give people feedback when they leave content.
Using Amazon's Mechanical Turk for annotations…
Doing the projects isn't only of benefit in enriching metadata but also for giving insight into users – discover audiences with particular interests.

Costis commenting – if Europeana only has thumbnails and metadata, is it a missed opportunity to get UGC on more detailed content?

Is Europeana highbrow compared to other platforms like Flickr, FB, so would people be afraid to contribute? [probably – there must be design patterns for encouraging participation from audiences on museum sites, but we're still figuring out what they are]
Business model for crowdsourcing – producing multilingual resources is perfect case for Europeana.

Open to the floor for questions… Importance of local communities, getting out there, using libraries to train people. Local newspapers, connecting to existing communities.

Notes from Europeana's Open Culture Conference 2010

The Open Culture 2010 conference was held in Amsterdam on October 14 – 15. These are my notes from the first day (I couldn't stay for the second day). As always, they're a bit rough, and any mistakes are mine. I haven't had a chance to look for the speakers' slides yet so inevitably some bits are missing.  If you're in a hurry, the quote of the day was from Ian Davis: "the goal is not to build a web of data. The goal is to enrich lives through access to information".

The morning was MCd by Costis Dallas and there was a welcome and introduction from the chair of the Europeana Foundation before Jill Cousins (Europeana Foundation) provided an overview of Europeana. I'm sure the figures will be available online, but in summary, they've made good progress in getting from a prototype in 2008 to an operational service in 2010. [Though I have written down that they had 1 million visits in 2010, which is a lot less than a lot of the national museums in the UK though obviously they've had longer to establish a brand and a large percentage of their stats are probably in the 'visit us' areas rather than collections areas.]

Europeana is a super-aggregator, but doesn't show the role of the national or thematic aggregators or portals as providers/collections of content. They're looking to get away from a one-way model to the point where they can get data back out into different places (via APIs etc). They want to move away from being a single destination site to putting information where the user is, to continue their work on advocacy, open source code etc.

Jill discussed various trends, including the idea of an increased understanding that access to culture is the foundation for a creative economy. She mentioned a Kenneth Gilbraith [?] quote on spending more on culture in recession as that's where creative solutions come from [does anyone know the reference?]. Also, in a time of Increasing nationationalism, Europeana provided an example to combat it with example of trans-Euro cooperation and culture. Finally, customer needs are changing as visitors move from passive recipients to active participants in online culture.

Europeana [or the talk?] will follow four paths – aggregration, distribution, facilitation, engagement.

  • Aggregation – build the trusted source for European digital cultural material. Source curated content, linked data, data enrichment, multilinguality, persistent identifiers. 13 million objects but 18-20thC dominance; only 2% of material is audio-visual [?]. Looking towards publishing metadata as linked open data, to make Europeana and cultural heritage work on the web, e.g. of tagging content with controlled vocabularies – Vikings as tagged by Irish and Norwegian people – from 'pillagers' to 'loving fathers'. They can map between these vocabularies with linked data.
  • Distribution – make the material available to the user wherever they are, whenever they want it. Portals, APIs, widgets, partnerships, getting information into existing school systems.
  • Facilitate innovation in cultural heritage. Knowledge sharing (linked data), IPR business models, policy – advocacy and public domain, data provider agreements. If you write code based on their open sourced applications, they'd love you to commit any code back into Europeana. Also, look at Europeana labs.
  • Engagement – create dialogue and participation. [These slides went quickly, I couldn't keep up]. Examples of the Great War Archive into Europe [?]. Showing the European connection – Art Nouveau works across Europe.

The next talk was Liam Wyatt on 'Peace love and metadata', based in part on his experience at the British Museum, where he volunteered for a month to coordinate the relationship between Wikipedia as representative of the open web [might have mistyped that, it seems quite a mantle to claim] and the BM as representatiave of [missed it]. The goal was to build a proactive relationship of mutual benefit without requiring change in policies or practices of either. [A nice bit of realism because IMO both sides of the museum/Wikipedia relationship are resistant to change and attached firmly to parts of their current models that are in conflict with the other conglomeration.]

The project resulted in 100 new Wikipedia articles, mostly based on the BM/BBC A History of the World in 100 Objects project (AHOW). [Would love to know how many articles were improved as a result too]. They also ran a 'backstage pass' day where Wikipedians come on site, meet with curators, backstage tour, then they sit down and create/update entries. There were also one-on-one collaborators – hooking up Wikipedians and curators/museums with e.g. photos of objects requested.

It's all about improving content, focussing on personal relationshiips, leveraging the communities; it didn't focus on residents (his own work), none of them are content donation projects, every institution has different needs but can do some version of this.

[I'm curious about why it's about bringing Wikipedians into museums and not turning museum people into Wikipedians but I guess that's a whole different project and may be result from the personal relationships anyway.]

Unknown risks are accounted for and overestimated. Unknown rewards are not accounted for and underestimated. [Quoted for truth, and I think this struck a chord with the audience.]

Reasons he's heard for restricting digital access… Most common 'preserving the integrity of the collection' but sounds like need to approve content so can approve of usages. As a result he's seen convoluted copyright claims – it's easy tool to use to retain control.

Derivative works. Commercial use. Different types of free – freedom to use, freedom to study and apply knowledge gained; freedom to make and redistribute copies; [something else].

There are only three applicable licences for Wikipedia. Wikipedia is a non-commercial organisation, but don't accept any non-commercially licenced content as 'it would restrict the freedom of people downstream to re-use the content in innovative ways'. [but this rules out much museum content, whether rightly or not, and with varying sources from legal requirements to preference. Licence wars (see the open source movement) are boring, but the public would have access to more museum content on Wikipedia if that restriction was negotiable. Whether that would outweight the possible 'downstream' benefit is an interesting question.]

Liam asked the audience, do you have a volunteer project in your institution? do you have an e-volunteer program? Well, you do already, you just don't know it. It's a matter of whether you want to engage with them back. You don't have to, and it might be messy.

Wikipedia is not a social network. It is a social construction – it requires a community to exist but socialising is not the goal. Wikipedia is not user generated content. Wikipedia is community curated works. Curated, not only generated. Things can be edited or deleted as well as added [which is always a difficulty for museums thinking about relying on Wikipedia content in the long term, especially as the 'significance' of various objects can be a contested issue.]

Happy datasets are all alike; every unhappy dataset is unhappy in its own way. A good test of data is that it works well with others – technically or legally.

According to Liam, Europeana is the 21st century of the gallery painting – it's a thumbnail gallery but it could be so much more if the content was technically and legally able to be re-used, integrated.
Data already has enough restrictions already e.g. copyright, donor restrictions. but if it comes without restrictions, its a shame to add them. 'Leave the gate as you found it'.

'We're doing the same thing for the same reason for the same people in the same medium, let's do it together.'

The next sessions were 'tasters' of the three thematic tracks of the second part of the day – linked data, user-generated content, and risks and rewards. This was a great idea because I felt like I wasn't totally missing out on the other sessions.

Ian Davis from Talis talked about 'linked open culture' as a preview of the linked data track. How to take practices learned from linked data and apply them to open culture sector. We're always looking for ways to exchange info, communicate more effecively. We're no longer limited by the physicality of information. 'The semantic web fundamentally changes how information, machines and people are connected together'. The semantic web and its powerful network effects are enabling a radical transformation away from islands of data. One question is, does preservation require protection, isolation, or to copy it as widely as possible?

Conjecture 1 – data outlasts code. MARC stays forever, code changes. This implies that open data is more important than open source.
Conjecture 2 – structured data is more valuable than unstructured. Therefore we should seek to structure our data well.
Conjecture 3 – most of the value in our data will be unexpected and unintended. Therefore we should engineer for serendipity.

'Provide and enable' – UK National Archives phrase. Provide things you're good at – use unique expertise and knowledge [missed bits]… enable as many people as possible to use it – licence data for re-use, give important things identifiers, link widely.

'The goal is not to build a web of data. The goal is to enrich lives through access to information.'
[I think this is my new motto – it sums it up so perfectly. Yes, we carry on about the technology, but only so we can get it built – it's the means to an end, not the end itself. It's not about applying acronyms to content, it's about making content more meaningful, retaining its connection to its source and original context, making the terms of use clear and accessible, making it easy to re-use, encouraging people to make applications and websites with it, blah blah blah – but it's all so that more people can have more meaningful relationships with their contemporary and historical worlds.]

Kevin Sumption from the National Maritime Museum presented on the user-generated content track. A look ahead – the cultural sector and new models… User-generated content (UGC) is a broad description for content created by end users rather than traditional publishers. Museums have been active in photo-sharing, social tagging, wikipedia editing.

Crowdsourcing e.g. – reCAPTCHA [digitising books, one registration form at a time]. His team was inspired by the approach, created a project called 'Old Weather' – people review logs of WWI British ships to transcribe the content, especially meterological data. This fills in a gap in the meterological dataset for 1914 – 1918, allows weather in the period to be modelled, contributes to understanding of global weather patterns.

Also working with Oxford Uni, Rutherford Institute, Zooniverse – solar stormwatch – solar weather forecast. The museum is working with research institutions to provide data to solve real-world problems. [Museums can bring audiences to these projects, re-ignite interest in science, you can sit at home or on the train and make real contributions to on-going research – how cool is that?]

Community collecting. e.g. mass observation project 1937 – relaunched now and you can train to become an observer. You get a brief e.g. families on holidays.

BBC WW2 People's War – archive of WWII memories. [check it out]

RunCoCO – tools for people to set up community-lead, generated projects.

Community-lead research – a bit more contentious – e.g. Guardian and MPs expenses. Putting data in hands of public, trusting them to generate content. [Though if you're just getting people to help filter up interesting content for review by trusted sources, it's not that risky].

The final thematic track preview was by Charles Oppenheim from Loughborough University, on the risks and rewards of placing metadata and content on the web. Legal context – authorisation of copyright holder is required for [various acts including putting it on the web] unless… it's out of copyright, have explicit permission from rights holder (not implied licence just cos it's online), permission has been granted under licensing scheme, work has been created by a member of staff or under contract with IP assigned.

Issues with cultural objects – media rich content – multiple layers of rights, multiple rights holders, multiple permissions often required. Who owns what rights? Different media industries have different traditions about giving permission. Orphan works.

Possible non-legal ramifiations of IPR infringements – loss of trust with rights holders/creators; loss of trust with public; damage to reputation/bad press; breach of contract (funding bodies or licensors); additional fees/costs; takedown of content or entire service.

Help is at hand – Strategic Content Alliance toolkit [online].

Copyright less to do with law than with risk management – assess risks and work out how will minimise them.

Risks beyond IPR – defamation; liability for provision of inaccurate information; illegal materials e.g. pornography, pro-terrorism, violent materials, racist materials, Holocaust denial; data protection/privacy breaches; accidental disclosure of confidential information.

High risk – anything you make money from; copying anything that is in copyright and is commercially availabe.
Low risk – orphan works of low commercial value – letters, diaries, amateur photographs, films, recordings known by less known people.
Zero risk stuff.
Risks on the other side of the coin [aka excuses for not putting stuff up]