Frequently Asked Questions about crowdsourcing in cultural heritage

Over time I've noticed the repetition of various misconceptions and apprehensions about crowdsourcing for cultural heritage and digital history, so since this is a large part of my PhD topic I thought I'd collect various resources together as I work to answer some FAQs. I'll update this post over time in response to changes in the field, my research and comments from readers. While this is partly based on some writing for my PhD, I've tried not to be too academic and where possible I've gone for publicly accessible sources like blog posts rather than send you to a journal paywall.

If you'd rather watch a video than read, check out the Crowdsourcing Consortium for Libraries and Archives (CCLA)'s 'Crowdsourcing 101: Fundamentals and Case Studies' online seminar.

[Last updated: February 2016, to address 'crowdsourcing steals jobs'. Previous updates added a link to CCLA events, crowdsourcing projects to explore and a post on machine learning+crowdsourcing.]

What is crowdsourcing?

Definitions are tricky. Even Jeff Howe, the author of 'Crowdsourcing', has two definitions:

The White Paper Version: Crowdsourcing is the act of taking a job traditionally performed by a designated agent (usually an employee) and outsourcing it to an undefined, generally large group of people in the form of an open call.

The Soundbyte Version: The application of Open Source principles to fields outside of software.

For many reasons, the term 'crowdsourcing' isn't appropriate for many cultural heritage projects but the term is such neat shorthand that it'll stick until something better comes along. Trevor Owens (@tjowens) has neatly problematised this in The Crowd and The Library:

'Many of the projects that end up falling under the heading of crowdsourcing in libraries, archives and museums have not involved large and massive crowds and they have very little to do with outsourcing labor. … They are about inviting participation from interested and engaged members of the public [and] continue a long standing tradition of volunteerism and involvement of citizens in the creation and continued development of public goods'

Defining crowdsourcing in cultural heritage

To summarise my own thinking and the related literature, I'd define crowdsourcing in cultural heritage as an emerging form of engagement with cultural heritage that contributes towards a shared, significant goal or research area by asking the public to undertake tasks that cannot be done automatically, in an environment where the tasks, goals (or both) provide inherent rewards for participation.

Screenshot from 'Letters of 1916' project.

Who is 'the crowd'?

Good question! One tension underlying the 'openness' of the call to participate in cultural heritage is the fact that there's often a difference between the theoretical reach of a project (i.e. everybody) and the practical reach, the subset of 'everybody' with access to the materials needed (like a computer and an internet connection), the skills, experience and time… While 'the crowd' may carry connotations of 'the mob', in 'Digital Curiosities: Resource Creation Via Amateur Digitisation', Melissa Terras (@melissaterras) points out that many 'amateur' content creators are 'extremely self motivated, enthusiastic, and dedicated' and test the boundaries 'between definitions of amateur and professional, work and hobby, independent and institutional', and quotes Leadbeater and Miller's 'The Pro-Am Revolution' on people who pursue an activity 'as an amateur, mainly for the love of it, but sets a professional standard'.

There's more and more talk of 'community-sourcing' in cultural heritage, and it's a useful distinction but it also masks the fact that nearly all crowdsourcing projects in cultural heritage involve a community rather than a crowd, whether they're the traditional 'enthusiasts' or 'volunteers', citizen historians, engaged audiences, whatever.  That said, Amy Sample Ward has a diagram that's quite useful for planning how to work with different groups. It puts the 'crowd' (people you don't know), 'network' (the community of your community) and 'community' (people with a relationship to your organisation) in different rings based on their closeness to you.

'The crowd' is differentiated not just by their relationship to your organisation or by their skills and abilities, but also by their motivation for participating – some people participate in crowdsourcing projects for altruistic reasons, others because doing so furthers their own goals.

I'm worried about crowdsourcing because…

…isn't letting the public in like that just asking for trouble?

@lottebelice said she'd heard people worry that 'people are highly likely to troll and put in bad data/content/etc on purpose' – but this rarely happens. People worried about this with user-generated content, too, and while kids in galleries delight in leaving rude messages about each other, it's rare online.

It's much more likely that people will mistakenly add bad data, but a good crowdsourcing project should build any necessary data validation into the project. Besides, there are generally much more interesting places to troll than a cultural heritage site.
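As a rough illustration of what 'building validation in' can look like, here's a minimal sketch of one common approach: only accept a value once several independent contributors have entered the same thing. The function name, the normalisation rules and the threshold of two are all assumptions made for the example, not a description of how any particular project works.

```python
from collections import Counter


def agreed_value(submissions, min_agreement=2):
    """Return the value that at least `min_agreement` contributors agree on, else None."""
    # Normalise lightly so differences in case or spacing don't block agreement.
    normalised = [" ".join(s.split()).lower() for s in submissions if s.strip()]
    if not normalised:
        return None
    value, count = Counter(normalised).most_common(1)[0]
    # Anything below the threshold stays unresolved and can be shown to more people.
    return value if count >= min_agreement else None


# Two of three contributors agree, so the value is accepted.
print(agreed_value(["Dear Mary", "dear  mary", "Dear Marv"]))
# No agreement yet, so the item would go back into the queue.
print(agreed_value(["1916", "1918"]))
```

Items that come back as None aren't thrown away; they're simply the ones worth showing to another contributor or passing to a reviewer.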

And as Matt Popke pointed out in a comment, 'When you have thousands of people contributing to an entry you have that many more pairs of eyes watching it. It's like having several hundred editors and fact-checkers. Not all of them are experts, but not all of them have to be. The crowd is effectively self-policing because when someone trolls an entry, somebody else is sure to notice it, and they're just as likely to fix it or report the issue'. If you're really worried about this, an earlier post on 'Designing for participatory projects: emergent best practice' has some other tips.

…doesn't crowdsourcing take advantage of people?

XKCD on the ethics of commercial crowdsourcing

Sadly, yes, some of the activities that are labelled 'crowdsourcing' do. Design competitions that expect lots of people to produce full designs and pay a pittance (if anything) to the winner are rightly hated. (See antispec.com for more and a good list of links).

But in cultural heritage, no. Museums, galleries, libraries, archives and academic projects are in the fortunate position of having interesting work that involves an element of social good, and they also have hugely varied work, from microtasks to co-curated research projects. Crowdsourcing is part of a long tradition of volunteering and altruistic participation, and to quote Owens again, 'Crowdsourcing is a concept that was invented and defined in the business world and it is important that we recast it and think through what changes when we bring it into cultural heritage.'

[Update, May 2013: it turns out museums aren't immune from the dangers of design competitions and spec work: I've written On the trickiness of crowdsourcing competitions to draw some lessons from the Sydney Design competition kerfuffle.]

Anyway, crowdsourcing won't usually work if it's not done right. From A Crowd Without Community – Be Wary of the Mob:

"when you treat a crowd as disposable and anonymous, you prevent them from achieving their maximum ability. Disposable crowds create disposable output. Simply put: crowds need a sense of identity and community to achieve their potential."

…crowdsourcing can't be used for academic work

Reasons given include 'humanists don't like to share their knowledge' with just anyone. And it's possible that they don't, but as projects like Transcribe Bentham and Trove show, academics and other researchers will share the work that helps produce that knowledge. (This is also something I'm examining in my PhD. I'll post some early findings after the Digital Humanities 2012 conference in July).

Looking beyond transcription and other forms of digitisation, it's worth checking out Prism, 'a digital tool for generating crowd-sourced interpretations of texts'.

…it steals jobs

Once upon a time, people starting a career in academia or cultural heritage could get jobs as digitisation assistants, or they could work on a scholarly edition. Sadly, that's not the case now, but that's probably more to do with year upon year of funding cuts. Blame the bankers, not the crowdsourcers.

The good news? Crowdsourcing projects can create jobs – participatory projects need someone to act as community liaison, to write the updates that demonstrate the impact of crowdsourced contributions, to explain the research value of the project, to help people integrate it into teaching, to organise challenges and editathons and more.

What isn't crowdsourcing?

…'the wisdom of the crowds'?

Which is not just another way of saying 'crowd psychology', either (another common furphy). As Wikipedia puts it, 'the wisdom of the crowds' is based on 'diverse collections of independently-deciding individuals'. Handily, Trevor Owens has just written a post addressing the topic: Human Computation and Wisdom of Crowds in Cultural Heritage.

…user-generated content

So what's the difference between crowdsourcing and user-generated content? The lines are blurry, but crowdsourcing is inherently productive – the point is to get a job done, whether that's identifying people or things, creating content or digitising material.

Conversely, the value of user-generated content lies in the act of creating it rather than in the content itself – for example, museums might value the engagement in a visitor thinking about a subject or object and forming a response to it in order to comment on it. Once posted it might be displayed as a comment or counted as a statistic somewhere but usually that's as far as it goes.

And as @sherah1918 pointed out, there's a difference between asking for assistance with tasks and asking for feedback or comments: 'A comment book or a blog w/comments isn't crowdsourcing to me … nor is asking ppl to share a story on a web form. That is a diff appr to collecting & saving personal histories, oral histories'.

…other things that aren't crowdsourcing:

[Heading inspired by Sheila Brennan @sherah1918]

  • Crowdfunding (it's often just asking for micro-donations, though it seems that successful crowdfunding projects have a significant public engagement component, which brings them closer to the concerns of cultural heritage organisations. It's also not that new. See Seventeenth-century crowd funding for one example.)
  • Data-mining social media and other content (though I've heard this called 'passive' or 'implicit' crowdsourcing)
  • Human computation (though it might be combined with crowdsourcing)
  • Collective intelligence (though it might also be combined with crowdsourcing)
  • General calls for content, help or participation (see 'user-generated content') or vaguely asking people what they think about an idea. Asking for feedback is not crowdsourcing. Asking for help with your homework isn't crowdsourcing, as it only benefits you.
  • Buzzwords applied to marketing online. And as @emmclean said, "I think many (esp mkting) see "crowdsourcing" as they do "viral" – just happens if you throw money at it. NO!!! Must be great idea" – it must make sense as a crowdsourced task.

Ok, so what's different about crowdsourcing in cultural heritage?

For a start, the process is as valuable as the result. Owens has a great post on this, Crowdsourcing Cultural Heritage: The Objectives Are Upside Down, where he says:

'The process of crowdsourcing projects fulfills the mission of digital collections better than the resulting searches… Far better than being an instrument for generating data that we can use to get our collections more used it is actually the single greatest advancement in getting people using and interacting with our collections. … At its best, crowdsourcing is not about getting someone to do work for you, it is about offering your users the opportunity to participate in public memory … it is about providing meaningful ways for the public to enhance collections while more deeply engaging and exploring them'.

And as I've said elsewhere, 'playing [crowdsourcing] games with museum objects can create deeper engagement with collections while providing fun experiences for a range of audiences'. (For definitions of 'engagement' see The Culture and Sport Evidence (CASE) programme. (2011). Evidence of what works: evaluated projects to drive up engagement (PDF).)

What about cultural heritage and citizen science?

[This was written in 2012. I've kept it for historical reasons but think differently now.]

First, another definition. As Fiona Romeo writes, 'Citizen science projects use the time, abilities and energies of a distributed community of amateurs to analyse scientific data. In doing so, such projects further both science itself and the public understanding of science'. As Romeo points out in a different post, 'All citizen science projects start with well-defined tasks that answer a real research question', while citizen history projects rarely if ever seem to be based around specific research questions but are aimed more generally at providing data for exploration. Process vs product?

I'm still thinking through the differences between citizen science and citizen history, particularly where they meet in historical projects like Old Weather. Both citizen science and citizen history achieve some sort of engagement with the mindset and work of the equivalent professional occupations, but are the traditional differences between scientific and humanistic enquiry apparent in crowdsourcing projects? Are tools developed for citizen science suitable for citizen history? Does it make a difference that it's easier to take a new interest in history further without a big investment in learning and access to equipment?

I have a feeling that 'citizen science' projects are often more focused on the production of data as accurately and efficiently as possible, and 'citizen history' projects end up being as much about engaging people with the content as they are about content production. But I'm very open to challenges on this…

What kind of cultural heritage stuff can be crowdsourced?

I wrote this list of 'Activity types and data generated' over a year ago for my Masters dissertation on crowdsourcing games for museums and a subsequent paper for Museums and the Web 2011, Playing with Difficult Objects – Game Designs to Improve Museum Collections (which also lists validation types and requirements).  This version should be read in the light of discussion about the difference between crowdsourcing and user-generated content and in the context of things people can do with museums and with games, but it'll do for now:

Activity | Data generated
Tagging (e.g. steve.museum, Brooklyn Museum Tag! You're It; variations include two-player 'tag agreement' games like Waisda?, extensions such as guessing games e.g. GWAP ESP Game, Verbosity, Tiltfactor Guess What?; structured tagging/categorisation e.g. GWAP Verbosity, Tiltfactor Cattegory) | Tags; folksonomies; multilingual term equivalents; structured tags (e.g. 'looks like', 'is used for', 'is a type of').
Debunking (e.g. flagging content for review and/or researching and providing corrections). | Flagged dubious content; corrected data.
Recording a personal story | Oral histories; contextualising detail; eyewitness accounts.
Linking (e.g. linking objects with other objects, objects to subject authorities, objects to related media or websites; e.g. MMG Donald). | Relationship data; contextualising detail; information on history, workings and use of objects; illustrative examples.
Stating preferences (e.g. choosing between two objects e.g. GWAP Matchin; voting on or 'liking' content). | Preference data; subsets of 'highlight' objects; 'interestingness' values for content or objects for different audiences. May also provide information on reason for choice.
Categorising (e.g. applying structured labels to a group of objects, collecting sets of objects or guessing the label for or relationship between presented set of objects). | Relationship data; preference data; insight into audience mental models; group labels.
Creative responses (e.g. write an interesting fake history for a known object or purpose of a mystery object.) | Relevance; interestingness; ability to act as social object; insight into common misconceptions.

You can also divide crowdsourcing projects into 'macro' and 'micro' tasks – giving people a goal and letting them solve it as they prefer, vs small, well-defined pieces of work, as in the 'Umbrella of Crowdsourcing' at The Daily Crowdsource. There's also a fair bit of academic literature on other ways of categorising and describing crowdsourcing.

Using crowdsourcing to manage crowdsourcing

There's also a growing body of literature on ecosystems of crowdsourcing activities, where different tasks and platforms target different stages of the process.  A great example is Brooklyn Museum’s ‘Freeze Tag!’, a game that cleans up data added in their tagging game. An ecosystem of linked activities (or games) can maximise the benefits of a diverse audience by providing a range of activities designed for different types of participant skills, knowledge, experience and motivations; and can encompass different levels of participation from liking, to tagging, finding facts and links.

A participatory ecosystem can also resolve some of the difficulties around validating specialist tags or long-form, more subjective content by circulating content between activities for validation and ranking for correctness, 'interestingness' (etc) by other players (see for example the 'Contributed data lifecycle' diagram on my MW2011 paper or the 'Digital Content Life Cycle' for crowdsourcing in Oomen and Aroyo's paper below). As Nina Simon said in The Participatory Museum, 'By making it easy to create content but impossible to sort or prioritize it, many cultural institutions end up with what they fear most: a jumbled mass of low-quality content'.  Crowdsourcing the improvement of cultural heritage data would also make possible non-crowdsourcing engagement projects that need better content to be viable.

See also Raddick, MJ, and Georgia Bracey. 2009. “Citizen Science: Status and Research Directions for the Coming Decade” on bridging between old and new citizen science projects to aid volunteer retention, and Nov, Oded, Ofer Arazy, and David Anderson. 2011. “Dusting for Science: Motivation and Participation of Digital Citizen Science Volunteers” on creating 'dynamic contribution environments that allow volunteers to start contributing at lower-level granularity tasks, and gradually progress to more demanding tasks and responsibilities'.

What does the future of crowdsourcing hold?

Platforms aimed at bootstrapping projects – that is, getting new projects up and running as quickly and as painlessly as possible – seem to be the next big thing. Designing tasks and interfaces suitable for mobile and tablets will allow even more of us to help out while killing time. There's also a lot of work on the integration of machine learning and human computation; my post 'Helping us fly? Machine learning and crowdsourcing' has more on this.

Find out how crowdsourcing in cultural heritage works by exploring projects

Spend a few minutes with some of the projects listed in Looking for (crowdsourcing) love in all the right places to really understand how and why people participate in cultural heritage crowdsourcing.

Where can I find out more? (AKA, a reading list in disguise)

There's a lot of academic literature on all kinds of aspects of crowdsourcing, but I've gone for sources that are accessible both intellectually and in terms of licensing. If a key reference isn't there, it might be because I can't find a pre-print or whatever outside a paywall – let me know if you know of one!

Liked this post? Buy the book! 'Crowdsourcing Our Cultural Heritage' (ISBN 9781472410221) is available through Ashgate or your favourite bookseller…

Thanks, and over to you!

Thanks to everyone who responded to my call for their favourite 'misconceptions and apprehensions about crowdsourcing (esp in history and cultural heritage)', and to those who inspired this post in the first place by asking questions in various places about the negative side of crowdsourcing.  I'll update the post as I hear of more, so let me know your favourites.  I'll also keep adding links and resources as I hear of them.

You might also be interested in: Notes from 'Crowdsourcing in the Arts and Humanities' and various crowdsourcing classes and workshops I've run over the past few years.

Museums and the audience comments paradox

I was at the Imperial War Museum for an advisory board meeting for the Social Interpretation project recently, and had a chance to reflect on my experiences with previous audience participation projects.  As Claire Ross summarised it, the Social Interpretation project is asking: does applying social media models to collections successfully increase engagement and reach?  And what forms of moderation work in that environment – can the audience be trusted to behave appropriately?

One topic for discussion yesterday was whether the museum should do some 'gardening' on the comments.  Participation rates are relatively high but some of the comments are nonsense ('asdf'), repetitive (thousands of variants of 'Cool' or 'sad') or off-topic ('I like the museum') – a pattern probably common to many museum 'have your say' kiosks.  Gardening could involve 'pruning' out comments that were not directly relevant to the question asked in the interactive, or finding ways to surface the interesting comments.  While there are models available in other sectors (e.g. newspapers), I'm excited by the possibility that the Social Interpretation project might have a chance to address this issue for museums.

A big design challenge for high-traffic 'have your say' interactives is providing a quality experience for the audience who is reading comments – they shouldn't have to wade through screens of repeated, vacuous or rude comments to find the gems – while appropriately respecting the contribution and personal engagement of the person who left the comment.
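To make the 'surface the gems without deleting anything' idea concrete, here's a minimal sketch that reorders comments so that longer, less-repeated ones appear first. The function name, the scoring rule and the four-word threshold are assumptions for the example; word count and repetition are crude proxies and won't catch off-topic comments, so treat this as a starting point rather than a moderation policy.

```python
from collections import Counter


def surface_comments(comments, min_words=4):
    """Order comments so longer, less repeated ones come first; nothing is deleted."""
    cleaned = [" ".join(c.split()) for c in comments]
    # Count how often each comment appears, ignoring case ('Cool' vs 'cool').
    counts = Counter(c.lower() for c in cleaned)

    def score(comment):
        words = len(comment.split())
        repeats = counts[comment.lower()]
        # Hypothetical scoring: comments with enough words rank first,
        # then rarer comments, then longer ones.
        return (words >= min_words, -repeats, words)

    return sorted(cleaned, key=score, reverse=True)


example = ["Cool", "asdf", "cool", "My grandfather served on a ship like this one", "sad"]
for comment in surface_comments(example):
    print(comment)
```

Because every comment is kept and only the order changes, the person who typed 'Cool' still sees their contribution; it just isn't the first thing the next reader has to wade through.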

In the spirit of 'have your say', what do you think the solution might be?  What have you tried (successfully or not) in your own projects, or seen working well elsewhere?

Update: the Social Interpretation team have posted I iz in ur xhibition trolling ur comments:

"One of the most discussed issues was about what we have termed ‘gardening comments’ but to put it bluntly it’s more a case of should we be ‘curating the visitor voice’ in order to improve the visitor experience? It’s a difficult question to deal with… 

We are at the stage where we really do want to respect the commenter, but also want to give other readers a high value experience. It’s a question of how we do that, and will it significantly change the project?"

If you found this post, you might also be interested in Notes from 'The Shape of Things: New and emerging technology-enabled models of participation through VGC'.

Update, March 2014: I've just been reading a journal article on 'Normative Influences on Thoughtful Online Participation'. The authors set out to test this hypothesis:

'Individuals exposed to highly thoughtful behavior from others will be more thoughtful in their own online comment contributions than individuals exposed to behavior exhibiting a low degree of thoughtfulness.' 

Thoughtful comments were defined by the number of words, how many seconds it took to write them, and how much of the content was relevant to the issue discussed in the original post. And the results? 'We found significant effects of social norm on all three measures related to participants’ commenting behavior. Relative to the low thoughtfulness condition, participants in the high thoughtfulness condition contributed longer comments, spent more time writing them, and presented more issue-relevant thoughts.' To me, this suggests that it's worth finding ways to highlight the more thoughtful comments (and to keep pulling out those 'asdf' weeds) in an interactive as this may encourage other thoughtful comments in turn.

Reference: Sukumaran, Abhay, Stephanie Vezich, Melanie McHugh, and Clifford Nass. “Normative Influences on Thoughtful Online Participation.” In Proceedings of the 2011 Annual Conference on Human Factors in Computing Systems, 3401–10. Vancouver, BC, Canada: ACM, 2011. http://dl.acm.org/citation.cfm?id=1979450.

It's Backup Saturday!

Ironically (?), the original image is no longer available

If Backup Saturday is too casual, call it Digital Preservation Saturday. Whatever you call it, it's time to do some digital housekeeping.

This post is an attempt to reduce the number of sad status updates or requests for help I see when people have lost years of personal photos, contacts or calendars when their laptop or phone died or was stolen, or when people can't recover that vital document for their research or tax return… There's never a perfect time to do it, so just back up your files now. Phones and laptops are particularly easy to lose and are more likely to have precious photos or important documents, so start with them.

If you don't have an external hard drive, order one online and in the meantime, burn to a CD or DVD or copy files to a USB stick. There's no harm in having lots of copies (barring confusion over different versions of docs), so if you want to be really careful, swap external drives with a friend so you've each got an off-site copy of your most important files. Use online services like Dropbox (my referral link, non-referral link) etc to keep files on your computer backed up online, but don't rely on them alone. (The referral links give us each extra storage, which is nice.)
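If you're comfortable running a script, copying a folder to an external drive can be automated. Here's a minimal sketch, assuming Python 3 is installed; the function names and the dated-folder naming scheme are my own choices for the example. It copies a folder into a new timestamped folder and then compares checksums, so you know the copy actually matches the original.

```python
import hashlib
import shutil
from datetime import datetime
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in chunks to save memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_folder(source, backup_root):
    """Copy `source` into a new dated folder under `backup_root` and verify the copy.

    Returns the new folder and a list of files whose copies are missing or differ.
    """
    source = Path(source).resolve()
    stamp = datetime.now().strftime("%Y-%m-%d_%H%M%S")
    target = Path(backup_root) / f"{source.name}_{stamp}"
    shutil.copytree(source, target)  # raises an error if the target already exists

    mismatches = []
    for original in source.rglob("*"):
        if original.is_file():
            copy = target / original.relative_to(source)
            if not copy.is_file() or file_digest(original) != file_digest(copy):
                mismatches.append(str(original))
    return target, mismatches


# Example call (both paths are placeholders, so change them to suit your machine):
# target, mismatches = backup_folder("/Users/me/Pictures", "/Volumes/ExternalDrive/backups")
```

An empty list of mismatches means every file was copied intact. Keep the backup drive outside the folder you're backing up, otherwise the copy would try to include itself.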

Backup email

Things change all the time so always check for more recent advice (this goes for everything on the page), but this article covers some good options for backing up Gmail (or try GMVault) and here's information on backing up Thunderbird, and try this if you're stuck on Outlook. I download an old Yahoo account to Thunderbird via POP mail, which might be the easiest way to deal with YMail and Hotmail.

While you're at it, back up your profile or preferences for your web browser – it's amazing how much information is stored in your browser history, bookmarks, etc. You can access saved passwords in Firefox and other browsers – obviously saving screenshots of the screen is a security risk but it can also help you remember older passwords if you're locked out of software.

Backup social media

'Turbulence' seems to be the IT trend for this decade (and maybe every decade), so it's a good idea to regularly back up whatever social media sites you rely on.  I haven't tried services like Backupify (more info) – if you've got experience with them, let me know in the comments.  Check back over your registration emails to remind yourself which services you've signed up for and use that as a checklist.

Services that back up tweets and other social media come and go (like Twapperkeeper and Twitoaster), so it's a good idea to not only choose services that let you easily export your archive, but also to put a monthly note in your calendar to go in and actually run the export. Saved copies of web pages might not work later, so a really low-tech solution is to copy all the text in a page and dump it into a text file or e.g. a Word document. I use SearchHash to archive hashtags, but you have to get in quickly as the Twitter API often only provides access to the past few days' tweets. You can also archive tweets via Google spreadsheets.
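The 'dump the text into a file' approach can also be scripted for pages you've already saved to disk. This is a minimal sketch using only Python's standard library; the class and function names are made up for the example, and it assumes the saved page is reasonably well-formed HTML. It keeps the visible text and skips scripts and styles.

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of script and style blocks."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.parts.append(text)


def html_to_text(html):
    """Return the visible text of an HTML string, one text fragment per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def save_page_text(html_path, text_path):
    """Read a saved HTML file and write its visible text to a plain text file."""
    html = Path(html_path).read_text(encoding="utf-8", errors="replace")
    Path(text_path).write_text(html_to_text(html), encoding="utf-8")


# Example call (file names are placeholders):
# save_page_text("my-tweets.html", "my-tweets.txt")
```

You lose layout, images and links this way, but a plain text file is about as future-proof as a format gets.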

You can download your data from Facebook via the 'Download a copy of your Facebook data' link on your settings page – it's not perfect, but again, it's better than nothing. While Flickr is a good option for backing up images, you might also want to save the tags and comments that live on Flickr. There are a number of tools for backing up Flickr; try these or these to start with.

Backup websites

Most blogs will let you export your posts, but the exported file isn't usually 'human-readable' until you've imported it into another blog, and there's always a chance that you'll lose some information.

An option that works well on all kinds of websites is HTTrack – I've used it for archiving sites and the results are good – it creates a locally-browseable static version of your site, preserving content and layouts. This isn't the same as backing up your code or databases, but if you're at that point I assume you know how to back these up yourself. Bonus points if you've tested restoring from backups to check that the process actually works!

You can also add links to the Internet Archive (and while you're at it, why not make a donation?).

Backup devices

You can back up Apple products like iPods, iPhones, iPads with iTunes, but it doesn't hurt to download photos etc into other folders too – both MacOS and Windows have system apps that will download photos when you plug in the device – 'Image Capture' on my Mac and an Explorer window on my PC.

Nokia phones can be backed up with Nokia PC Suite on Windows or iSync on MacOS (can be tricky). I've used SMS to Text on Android – it saved a file to my phone's disk, then I copied it over to my computer.

Backup other specialist software

Whatever you do, you probably use specialist software.  If you use reference management software, back it up!  Here are instructions for backing up EndNote, Mendeley and Zotero to get you started…

More digital housekeeping…

If you've made it this far, why not check that your anti-virus software is up-to-date, and run a deep scan? If you haven't got anti-virus software, get some now – MoneySavingExpert has a useful guide to Free Antivirus Software. And speaking of money, if your bank doesn't keep all your bank statements online, or you're about to change cards, it's a good time to download your bank statements.

And if you've already done all that, why not offer to help a friend get their backup and anti-virus sorted?

Notes from a preview of the updated Historypin

The tl;dr version: inspiring project, great enhancements; yay!

Longer version: last night I went to the offices of We Are What We Do for a preview of the new version of Historypin. Nick Poole has already written up his notes, so I'm just supplementing them with my own notes from the event (and a bit from conversations with people there and the reading I'd already done for my PhD).

Screenshot with photo near WAWWD office (current site)

Historypin is about bridging the intergenerational divide, about mass participation and access to history, about creating social capital in neighbourhoods, conserving and opening up global archival resources (at this stage that's photographs, not other types of records).  There's a focus on events and activities in local communities. [It'd be great to get kids to do quick oral history interviews as they worked with older people, though I think they're doing something like it already.]

New features will include a lovely augmented reality-style view in streetview; the ability to upload and explore video as well as images; a focus on telling stories – 'tours' let you bring a series of photos together into a narrative (the example was 'the arches of New York', most of which don't exist anymore).  You can also create 'collections', which will be useful for institutions.  They'll also be available in the mobile apps (and yes, I did ask about the possibility of working with the TourML spec for mobile tours).

The mobile apps let you explore your location, explore the map and contribute directly from your phone.  You can use the augmented reality view to overlap old photos onto your camera view so that you can take a modern version of an old photo. This means they can crowdsource better modern images than those available in streetview as well as getting indoors shots.  This could be a great treasure hunt activity for local communities or tourists.  You can also explore collections (as slideshows?) in the app.

They're looking to work with more museums and archives and have been working on a community history project with Reading Museum.  Their focus on inclusion is inspiring, and I'll be interested to see how they work to get those images out into the community.  While there are quite a few 'then and now' projects focused on geo-locating old images around, I think that just shows that it's an accessible way of helping people make connections between their lives and those in the past.

A quick correction to Nick's comments – the Historypin API doesn't exist yet, so if you have ideas for what it should do, it's probably a good time to get in touch.  I'll be thinking hard about how it all relates to my PhD, especially if they're making some of the functionality available.

Notes on 'User Generated Content' session, Open Culture Conference 2010

My notes from the 'user generated content' parallel track on the first day of the Open Culture 2010 conference. The session started with brief presentations by panellists, then group discussions at various tables on questions suggested by the organisers. These notes are quite rough, and of course any mistakes are mine. I haven't had a chance to look for the speakers' slides yet so inevitably some bits are missing, and I can only report the discussion at the table I was at in the break-out session. I've also blogged my notes from the plenary session of the Open Culture 2010 conference.

User-generated content session, Open Culture, Europeana – the benefits and challenges of UGC.
Kevin Sumption, User-generated content, a MUST DO for cultural institutions
His background – originally a curator of computer sciences. One of first projects he worked on at Powerhouse was D*Hub which presented design collections from V&A, Brooklyn Museum and Powerhouse Museum – it was for curators but also for general public with an interest in design. Been the source of innovation. Editorial crowd-sourcing approach and social tagging, about 8 years ago.

Two years ago he moved to National Maritime Museum, Royal Observatory, Greenwich. One of the first things they did was get involved with Flickr Commons – get historic photographs into public domain, get people involved in tagging. c1000 records in there. General public have been able to identify some images as Alan Villiers images – specialists help provide attribution for the photographer. Only for tens of records of the 000s but was a good introduction to power of UGC.

Building hybrid exhibition experiences – astronomy photographer of the year – competition on Flickr with real world exhibition for the winners of the competition. 'Blog' with 2000 amateur astronomers, 50 posts a day. Through power of Flickr has become a significant competition and brand in two years.

Joined citizen science consortia. Galaxy Zoo. Brainchild of Oxford – getting public engaged with real science online. Solar Stormwatch c 3000 people analysing and using the data. Many people who get involved gave up science in high school… but people are getting re-engaged with science *and* making meaningful contributions.

Old Weather – helping solve real-world problems with crowdsourcing. Launched two months ago.
Passion for UGC is based around where projects can join very carefully considered consortia, bringing historical datasets with real scientific problems. Can bring large interested public to the project. Many of the public are reconnecting with historical subject matter or sciences.

Judith Bensa-Moortgat, Nationaal Archief, Netherlands, Images for the Future project
Photo collection of more than 1 million photos. Images for the future project aims to save audio-visual heritage through digitisation and conservation of 1.2 million photos.

Once digitised, they optimise by adding metadata and context. Have own documentalists who can add metadata, but it would take years to go through it all. So decided to try using online community to help enrich photo collections. Using existing platforms like Wikipedia, Flickr, Open Street map, they aim to retrieve contextual info generated by the communities.  They donated political portraits to Wikimedia Commons and within three weeks more than half had been linked to relevant articles.

Their experiences with Flickr Commons – they joined in 2008. Main goal was to see if community would enrich their photos with comments and tags. In two weeks, they had 400,000 page views for 400 photos, including peaks when on Dutch TV news. In six months, they had 800 photos with over 1 million views. In Oct 2010, they are averaging 100,000 page views a month; 3 million overall.

But what about comments etc? Divided them into categories of comments [with percentage of overall contributions]:

  • factual info about location, period, people 5%; 
  • link to other sources eg Wikipedia 5%; 
  • personal stories/memories (e.g. someone in image was recognised); 
  • moral discussions; 
  • aesthetical discussions; 
  • translations.

The first two are most important for them.
13,000 tags in many languages (unique tags or total?).
10% of the contributed UGC was useful for contextualisation; tags ensure accessibility [discoverability?] on the web; increased (international) visibility. [Obviously the figures will vary for different projects, depending on what the original intent of the project was]

The issues she'd like to discuss are – copyright, moderation, platforms, community.

Mette Bom, 1001 Stories about Denmark
Story of the day is one of the 1001 stories. It's a website about the history and culture of Denmark. The stories have themes, are connected to a timeline.  Started with 50 themes, 180 expert writers writing the 1001 stories, now it's up to the public to comment and write their own stories. Broad definition of what heritage is – from oldest settlement to the 'porn street' – they wanted to expand the definition of heritage.

Target audiences – tourists going to those places; local dedicated experts who have knowledge to contribute. Wanted to take Danish heritage out of museums.

They've created the main website, mobile apps, widget for other sites, web service.  Launched in May 2010.  20,000 monthly users. 147 new places added, 1500 pictures added.

Main challenges – how to keep users coming back? 85% new, 15% repeat visitors (ok as aimed at tourists but would like more comments). How to keep press interested and get media coverage? Had a good buzz at the start cos of the celebrities. How to define participation? Is it enough to just be a visitor?

Johan Oomen, Netherlands Institute for Sound and Vision, Vrij Uni Amsterdam. Participatory Heritage: the case of the Waisda? video labelling game.
They're using game mechanisms to get people to help them catalogue content. [sounds familiar!]
'In the end, the crowd still rules'.
Tagging is a good way to facilitate time-based annotation [i.e. tag what's on the screen at different times]

Goal of game is consensus between players. Best example in heritage is steve.museum; much of the thinking about using tagging as a game came from Games with a Purpose (gwap.com).  Basic rule – players score points when their tag exactly matches the tag entered by another within 10 seconds. Other scoring mechanisms.  Lots of channels with images continuously playing.
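[To make the basic matching rule concrete, here's a minimal sketch of how it could be scored. This is my own illustration, not Waisda?'s actual code: the `TagEntry` structure, the `score_matches` function and the 50 points per match are hypothetical, and I've assumed that both players score when their tags match, that 'exactly' means plain string equality, and that the 10 seconds are measured on the video's timeline. It doesn't attempt the other scoring mechanisms.]

```python
from dataclasses import dataclass

MATCH_WINDOW_SECONDS = 10  # from the talk: tags must match within 10 seconds
POINTS_PER_MATCH = 50      # hypothetical value, not from the talk


@dataclass(frozen=True)
class TagEntry:
    player: str
    tag: str
    video_time: float  # assumption: seconds into the video when the tag was entered


def score_matches(entries):
    """Return {player: points} for exact tag matches between different players."""
    scores = {}
    ordered = sorted(entries, key=lambda e: e.video_time)
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if second.video_time - first.video_time > MATCH_WINDOW_SECONDS:
                break  # entries are sorted, so later ones are even further away
            if first.player != second.player and first.tag == second.tag:
                # Assumption: both players in a matching pair are rewarded.
                for player in (first.player, second.player):
                    scores[player] = scores.get(player, 0) + POINTS_PER_MATCH
    return scores


entries = [
    TagEntry("ann", "boat", 12.0),
    TagEntry("bob", "boat", 18.5),     # matches ann's tag, 6.5 seconds later
    TagEntry("ann", "harbour", 20.0),
    TagEntry("bob", "Harbour", 21.0),  # different capitalisation, so not an exact match
    TagEntry("cas", "boat", 40.0),     # same tag, but outside the 10 second window
]
print(score_matches(entries))  # {'ann': 50, 'bob': 50}
```

[A real game would presumably normalise spelling and capitalisation before comparing tags; I've kept strict equality here because that's how the rule was described.]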

Linking it to twitter – shout out to friends to come join them playing.  Generating traffic – one of the main challenges. Altruistic message 'help the archive' 'improve access to collections' came out of research with users on messages that worked. Worked with existing communities.

Results, first six months – 44,362 pageviews. 340,000 tags to 604 items, 42,068 unique tags.
Matches – 42% of tags entered more than 2 times. Also looked at vocab (GTAA, Cornetto), 1/3 words were valid Dutch words, but only a few part of thesauruses.  Tags evaluated by documentalists. Documentary film 85% – tags were useful; for reality series (with less semantic density) tags less useful.

Now looking at how to present tags on the catalogue Powerhouse Museum style.  Experimenting with visualising terms, tag clouds when terms represented, also makes it easy to navigate within the video – would have been difficult to do with professional metadata.  Looking at 'tag gardening' – invite people to go back to their tags and click to confirm – e.g. show images with particular tags, get more points for doing it.

Future work – tag matching – synonyms and more specific terms – will get more points for more specific terms.

Panel overview by Costis Dallas, research fellow at Athena, assistant professor at Panteion University, Athens.
He wants to add a different dimension – user-generated content as it becomes an object for memory organisations. New body of resources emerging through these communication practices.
Also, we don't have a historiography anymore; memory resides in personal information devices.  Mashups, changes in information forms, complex composed information on social networks – these raise new problems for collecting – structural, legal, preservation in context, layered composition.  What do we need to do now in order to be able to make use of digital technologies in appropriate, meaningful ways in the future? New kinds of content, participatory curation are challenges for preservation.

Group discussion (breakout tables)
Discussion about how to attract users. [It wasn't defined whether it was how to attract specifically users who'll contribute content or just generally grow the audience and therefore grow the number of content creators within the usual proportions of levels of participation e.g. Nielsen, Forrester; I would also have liked to discuss how to encourage particular kinds of contributions, or to build architectures of participation that provided positive feedback to encourage deeper levels of participation.]

Discussion and conclusions included – go with the strengths of your collections e.g. if one particular audience or content-attracting theme emerges, go with it.  Norway has a national portal where people can add content. They held lots of workshops for possible content creators; made contact with specialist organisations [from which you can take the lesson that UGC doesn't happen in a vacuum, and that it helps to invest time and resources into enabling participants and soliciting content].  Recording living history.  Physical presence in gallery, at events, is important.  Go where audiences already are; use existing platforms.

Discussion about moderation included – once you have comments, how are they integrated back into collections and digital asset management systems?  What do you do about incorrect UGC displayed on a page?  Not an issue if you separate UGC from museum/authoritative content in the interface design.  In the discussion it turned out that Europeana doesn't have a definition of 'moderation'.  IMO, it should include community management, including acknowledging and thanking people for contributions (or rather, moderation is a subset of community management).  It also includes approving or reviewing and publishing content, dealing with corrections suggested by contributors, dealing with incorrect or offensive UGC, adding improved metadata back to collections repositories.

User-generated content and trust – British Library apparently has 'trusted communities' on their audio content – academic communities (by domain name?) and 'everyone else'.  Let other people report content to help weed out bad content.

Then we got onto a really interesting discussion of which country or culture's version of 'offensive' would be used in moderating content.  Having worked in the UK and the Netherlands, I know that what's considered a really rude swear word and what's common vocabulary is quite different in each country… but would there be any content left if you considered the lowest common standards for each country?  [Though thinking about it later, people manage to watch films and TV and popular music from other countries so I guess they can deal with different standards when it's in context.]  To take an extreme content example, a Nazi uniform as memorabilia is illegal in Germany (IIRC) but in the UK it's a fancy dress outfit for a member of the royal family.

Panel reporting back from various table discussions
Kevin's report – discussion varied but similar themes across the two tables. One – focus on the call to action, why should people participate, what's the motivation? How to encourage people to participate? Competitions suggested as one solution, media interest (especially sustained). Notion of core group who'll energise others. Small groups of highly motivated individuals and groups who can act as catalysts [how to recruit, reward, retain]. Use social media to help launch project.

1001 Danish Stories promotional video effectively showed how easy the process of contributing content was, and that it doesn't have to be perfect (the video includes celebrities working the camera [and also being a bit daggy, which I later realised was quite powerful – they weren't cool and aloof]).
Giving users something back – it's not a one-way process. Recognition is important. Immediacy too – if participating in a project, people want to see their contributions acknowledged quickly. Long approval processes lose people.
Removal of content – when different social, political backgrounds with different notions of censorship.

Mette's report – how to get users to contribute – answers mostly to take away the boundaries, give the users more credit than we otherwise tend to. We always think users will mess things up and experts will be embarrassed by user content but not the case. In 1001 they had experts correcting other experts. Trust users more, involve experts, ask users what they want. Show you appreciate users, have a dialogue, create community. Make it a part of life and environment of users. Find out who your users are.

Second group – how Europeana can use the content provided in all its forms. Could build web services to present content from different places, linking between different applications.
How to set up goals for user activity – didn't get a lot of answers but one possibility is to start and see how users contribute as you go along. [I also think you shouldn't be experimenting with UGC without some goal in mind – how else will you know if your experiment succeeded?  It also focusses your interaction and interface design and gives the user some parameters (much more useful than an intimidating blank page)].

Judith's report (including our table) – motivation and moderation in relation to Europeana – challenging as Europeana are not the owners of the material; also dealing with multilingual collections. Culturally-specific offensive comments. Definition and expectations of Europeana moderation. Resources needed if Europeana does the moderation.
Incentives for moderation – improving data, idealism, helping with translations – people like to help translate.

Johan's report – rewards are important – place users in social charts or give them a feeling of contributing to larger thing; tap into existing community; translate physical world into digital analogue.
Institutional policy – need a clear strategy for e.g. how to integrate the knowledge into the catalogue. Provide training for staff on working with users and online tools. There's value in employing community managers to give people feedback when they leave content.
Using Amazon's Mechanical Turk for annotations…
Doing the projects isn't only of benefit in enriching metadata but also for giving insight into users – discover audiences with particular interests.

Costis commenting – if Europeana only has thumbnails and metadata, is it a missed opportunity to get UGC on more detailed content?

Is Europeana highbrow compared to other platforms like Flickr, FB, so would people be afraid to contribute? [probably – there must be design patterns for encouraging participation from audiences on museum sites, but we're still figuring out what they are]
Business model for crowdsourcing – producing multilingual resources is perfect case for Europeana.

Open to the floor for questions… Importance of local communities, getting out there, using libraries to train people. Local newspapers, connecting to existing communities.

Soliciting conversation and listening actively while isolating discussion

I've been paying more attention to The Age's "what's on" listings and reviews while I'm actually in Melbourne, and noticed that their film critic, Jim Schembri, is doing a fine job soliciting responses on his film reviews.  At the end of a piece on 'Bruno: Comic genius or witless git?', he asks:

What do you think? Is Bruno funny? Half funny? Not funny? What do you think of Sacha Baron Cohen? Do you agree with anything in this article? Does the author make any valid points? Is there skill involved in this brand of comedy? Or is he a middle-aged fud who just doesn't get reality humour?

What do you think of the Shock and Guffaw School of Comedy? Should ethics factor in to it? Or are the laughs worth it, whatever the cost?

And what did you think of the saturation Bruno media blitz? Did you enjoy it? Or was it a case of "enough already"?

What is you favourite Sacha Baron Cohen moment? Is there a scene from his films or TV shows that make you laugh every time you think of it?

And if you had to choose between Bruno, Borat or Ali G, who would you most take to: (1) a wedding? (2) a funeral? (3) a kid's birthday party?

Your valued thoughts are hereby sought.

These direct questions are a good attempt at provoking discussion. I'm never sure how well specific questions soliciting audience response work, and in this case I'm not sure what prompted them – does it lead to a more constructive discussion? Reduce flame wars or trolling? Your valued thoughts are hereby sought.

But this is the best bit, and the point I'd like to make to museum bloggers – he also responds to comments:

The design is subtly clever, in that the blog author's responses appear inline, but are distinguished from audience comments with a heavier typeface. They're also attributed differently – "Schembri note" versus the 'Posted by blah on blah at blah'. This provides a level of authority while allowing direct responses to specific comments. I'm not sure how he'd respond to a bunch of similar comments – does it work if it appears as a separate comment? Would it display differently?

It's a great example of starting a discussion and actually sticking around to listen to the results – it turns a blog post into a conversation.

The other interesting point is that there's a very similar piece of content by the same author, Borat's bro is fully sick in the film section of the 'main' site, and the sub-heading makes it sound like it's also a participatory piece – "Bruno: a comic genius or a witless git? You be the judge" – but it's not.  And there are no links to the blog piece, so at a guess the majority of readers would never know they could comment on the film.  Effectively, the discussion is isolated from the main site, the general reader.  I can think of a few reasons why this might be the case, but a more interesting question might be – what effect does this have?

I'm still thinking this through (particularly in relation to cultural heritage and social media) – your thoughts would be welcome in the meantime.

'Shownar: reflecting online buzz around BBC programmes' [read: museum objects]

Call me mildly obsessive (sad, even), but I got really excited when I read this and mentally replaced 'BBC programme' with 'museum object'. From the BBC Internet Blog:

Today sees the launch of Shownar; a new prototype from BBC Vision which aims
to track online buzz around BBC TV and radio programmes and reflect it back in
useful and interesting ways, aiding programme discovery and providing onward
journeys to discussion about those programmes on the wider web.

Shownar aims to track the wealth of activity that takes place around BBC programmes online and work out which are currently gaining the most attention.

So, how does it work? In the first instance, we decided to focus on tracking in-bound links to programme-related pages on bbc.co.uk, so we could be confident that the discussions were actually about a BBC programme … We took a look at a range of possible suppliers, and for this initial prototype chose data provided by Yahoo! Search BOSS, Nielson Online's BlogPulse (which indexes over 100 million blogs), and Twingly (which searches microblogging services like Twitter, Jaiku and Identi.ca for links, even when they are shortened using URL shortening services such as TinyURL and bit.ly). We are also ingesting data from LiveStats, the BBC's own real-time indicator of traffic. Once ingested, this data is processed according to a specially created algorithm to calculate the 'buzz measure' for every BBC programme – more detail on the algorithm can be found on Shownar's Technical information page.

The post discusses some of the interfaces and benefits – I think the possibilities are pretty endless, and will be exploring how it might enhance the discoverability of and harness conversations about the Science Museum's online collections over the year.

Hat tip: @giv_p

'The strikethrough is the canonical symbol of the Web'

Below is a quote from Wired's Chris Anderson on museums, curatorial authority and the long tail, from a Washington Post report, 'Smithsonian Click-n-Drags Itself Forward' on Smithsonian 2.0 ('A Gathering to Re-Imagine the Smithsonian in the Digital Age').

The quote really covers two issues – making failures and mistakes in public and leaving them there, and training external volunteers and experts to curate parts of collections, because no one curator can be authoritative on everything in their remit: "in exchange for a slight diminution of the credentialed voice for a small number of things, you would get far more for a lot of things".

I suspect this is a false dichotomy – there's a place for both internal and external expertise. The Science Museum object wiki doesn't mean the rest of the collection catalogue and interpretation has no value or relevance. The challenge lies in presenting organisation and user-contributed content in the same interface – can those boundaries be removed? Is it wise to try? And what about taking external content back into the catalogue?

This isn't a new conversation for museum technologists, but it's a conversation I'd love to have with curators. I've never been sure how the technologists who get really excited by the possibilities of sharing content online in various ways can go about working with curators to find the best way of managing it so that the public, the collections and the curators benefit.

Anyway, onto Chris Anderson:

The discovery of the "long tail" principle has implications for museums because it means there is vast room at the bottom for everything. Which means, Anderson said, that curators need to get over themselves. Their influence will never be the same.

"The Web is messy, and in that messiness comes something new and interesting and really rich," he said. "The strikethrough is the canonical symbol of the Web. It says, 'We blew it, but we are leaving that mistake out there. We're not perfect, but we get better over time.' "

If you think that notion gives indigestion to an organization like the Smithsonian — full of people who have devoted much of their lifetimes to bringing near-perfect luster to some tiny pearl of truth — you would be correct.

The problem is, "the best curators of any given artifact do not work here, and you do not know them," Anderson told the Smithsonian thought leaders. "Not only that, but you can't find them. They can find you, but you can't find them. The only way to find them is to put stuff out there and let them reveal themselves as being an expert."

Take something like, oh, everything the Smithsonian's got on 1950s Cold War aircraft. Put it out there, Anderson suggested, and say, "If you know something about this, tell us." Focus on those who sound like they have phenomenal expertise, and invest your time and effort into training these volunteers how to curate. "I'll bet that they would be thrilled, and that they would pay their own money to be given the privilege of seeing this stuff up close. It would be their responsibility to do a good job" in authenticating it and explaining it. "It would be the best free labor that you can imagine."

It didn't go down easily among the thought leaders, who have staked their lives' work on authoritativeness, on avoiding strikethroughs. What about the quality and strength of the knowledge we offer? asked one Smithsonian attendee.

You don't get it, Anderson suggested. "There aren't enough of you. Your skills cannot be invested in enough areas to give that quality."

It's like Wikipedia and the Encyclopedia Britannica, Anderson said. Some Wikipedia entries certainly are not as perfectly polished as the Britannica. But "most of the things I'm interested in are not in the Britannica. In exchange for a slight diminution of the credentialed voice for a small number of things, you would get far more for a lot of things. Something is better than nothing." And right now at the Smithsonian, what you get, he said, is "great" or "nothing."

"Is it our job to be smart and be the best? Or is it our job to share knowledge?" Anderson asked.

Social Media Statistics

One of those totally brilliant and obvious-in-hindsight ideas. I'd like to see stronger guidelines on citing sources as it grows and clear differentiation by region/nation, because it's easy for vague figures and rumour to become universal 'fact', but it's a great idea and will hopefully grow: Social Media Statistics is:

A big home for all facts and figures around social media – because I'm fed up of trawling around for them and I'm also sure that I'm not the only one who gets asked 'how many users does Facebook have?' every hour of every day. … I'm hoping that this wiki will not only include usage stats, but also behaviour and attitude stats. It's a bit of a skeleton at the moment, with v few of my stats having stated sources, but be patient – and help where you can!

Please add in any juicy stats as you come across them, and do cite your references and link to them where possible.

I'll put my money where my mouth is and add information I find. I find wikis a really useful tool for lightweight documentation – it's really easy to add some information while it's in your brain, and the software doesn't get in the way of your flow.

For a while now I've wanted a repository of museum and cultural heritage audience evaluation – this could be a good model. Speaking of which, I really must write up my notes from the MCG Autumn meeting.

[Edit to add: Social Media Statistics also links to Measurementcamp, which might be of interest to cultural heritage organisations wondering how they can 'measure their social media communications online and offline' (and how they can work with project sponsors and funders to define suitable metrics for an APId, social media world).]

A sort of private joy? User-generated content and museums

I came across this lovely perspective on the content visitors create with museums:

We didn't start out asking people to leave their work, but it always happened. Now, we build it into the consideration of the activities that will be offered in the space. It isn't really like the formal artist-displaying-work model that is in evidence throughout the museum…the work is typically anonymous and individual pieces aren't highlighted.

When you walk into the space during the last month or so of an exhibition you experience the visitor-created artwork as a single, room-sized installation first, and only later do you focus on individual pieces. I think it is closer in some ways to the urge behind street art…the sort of private joy to be had from making something great and then leaving it behind for others to discover. I sometimes see visitors coming back to find something that they left behind a month or two before, not to reclaim it, just to see where it is now.

From the post 'Show your work' at the Indianapolis Museum of Art Blog.

It's talking about work created during physical visits to a museum rather than virtual visits but I think I noticed it because there's been a spurt of discussion about user-generated content and visitor participation on museum websites on the MCG mailing list following a workshop at the London Museums Hub on 'Understanding collections use and online access across the London Hub'.

A study presented at the event found an apparent lack of interest from museum website visitors in user-generated content, but in discussion at the event it appeared that the findings might have been different if the questions had been asked differently (with examples of some possible outcomes from UGC, perhaps 'would you like museum collections to be more discoverable because other visitors had tagged them with everyday words you'd use' rather than 'would you like opportunities to comment or upload content'), or if the focus groups hadn't been recruited from people who were physically visiting a museum and who were therefore fairly traditional museum-goers.

This led to some interesting discussion about the differing reactions to the idea of opinions from other visitors versus real-life stories from other visitors; and of the idea that sometimes the value in user-created content lives with the person who contributes rather than those who read their contributions later. This last idea was also raised at the User-generated content session at Museums and the Web in Montreal, my notes are here. I think the role of authority and trust and the influence of the context (type of museum or collection, user goal) need to be teased out into a more sophisticated model for analysing user-generated content in the cultural heritage sector.

There's a lot of research into user-generated content, participation and social software going on in the UK at the moment; it'd be great if there was somewhere that results, and ideally the raw data too, could be shared. Perhaps the MCG site?