Frequently Asked Questions about crowdsourcing in cultural heritage

Over time I've noticed the repetition of various misconceptions and apprehensions about crowdsourcing for cultural heritage and digital history, so since this is a large part of my PhD topic I thought I'd collect various resources together as I work to answer some FAQs. I'll update this post over time in response to changes in the field, my research and comments from readers. While this is partly based on some writing for my PhD, I've tried not to be too academic and where possible I've gone for publicly accessible sources like blog posts rather than send you to a journal paywall.

If you'd rather watch a video than read, check out the Crowdsourcing Consortium for Libraries and Archives (CCLA)'s 'Crowdsourcing 101: Fundamentals and Case Studies' online seminar.

[Last updated: February 2016, to address 'crowdsourcing steals jobs'. Previous updates added a link to CCLA events, crowdsourcing projects to explore and a post on machine learning+crowdsourcing.]

What is crowdsourcing?

Definitions are tricky. Even Jeff Howe, the author of 'Crowdsourcing', has two definitions:

The White Paper Version: Crowdsourcing is the act of taking a job traditionally performed by a designated agent (usually an employee) and outsourcing it to an undefined, generally large group of people in the form of an open call.

The Soundbyte Version: The application of Open Source principles to fields outside of software.

For many reasons, the term 'crowdsourcing' isn't appropriate for many cultural heritage projects but the term is such neat shorthand that it'll stick until something better comes along. Trevor Owens (@tjowens) has neatly problematised this in The Crowd and The Library:

'Many of the projects that end up falling under the heading of crowdsourcing in libraries, archives and museums have not involved large and massive crowds and they have very little to do with outsourcing labor. … They are about inviting participation from interested and engaged members of the public [and] continue a long standing tradition of volunteerism and involvement of citizens in the creation and continued development of public goods'

Defining crowdsourcing in cultural heritage

To summarise my own thinking and the related literature, I'd define crowdsourcing in cultural heritage as an emerging form of engagement with cultural heritage that contributes towards a shared, significant goal or research area by asking the public to undertake tasks that cannot be done automatically, in an environment where the tasks, goals (or both) provide inherent rewards for participation.

Screenshot from 'Letters of 1916' project.

Who is 'the crowd'?

Good question!  One tension underlying the 'openness' of the call to participate in cultural heritage is the fact that there's often a difference between the theoretical reach of a project (i.e. everybody) and the practical reach, the subset of 'everybody' with access to the materials needed (like a computer and an internet connection), the skills, experience and time…  While 'the crowd' may carry connotations of 'the mob', in 'Digital Curiosities: Resource Creation Via Amateur Digitisation', Melissa Terras (@melissaterras) points out that many 'amateur' content creators are 'extremely self motivated, enthusiastic, and dedicated' and test the boundaries 'between definitions of amateur and professional, work and hobby, independent and institutional'. She quotes Leadbeater and Miller's 'The Pro-Am Revolution' on people who pursue an activity 'as an amateur, mainly for the love of it, but sets a professional standard'.

There's more and more talk of 'community-sourcing' in cultural heritage, and it's a useful distinction but it also masks the fact that nearly all crowdsourcing projects in cultural heritage involve a community rather than a crowd, whether they're the traditional 'enthusiasts' or 'volunteers', citizen historians, engaged audiences, whatever.  That said, Amy Sample Ward has a diagram that's quite useful for planning how to work with different groups. It puts the 'crowd' (people you don't know), 'network' (the community of your community) and 'community' (people with a relationship to your organisation) in different rings based on their closeness to you.

'The crowd' is differentiated not just by people's relationship to your organisation or by their skills and abilities, but also by their motivation for participating – some people participate in crowdsourcing projects for altruistic reasons, others because doing so furthers their own goals.

I'm worried about crowdsourcing because…

…isn't letting the public in like that just asking for trouble?

@lottebelice said she'd heard people worry that 'people are highly likely to troll and put in bad data/content/etc on purpose' – but this rarely happens. People worried about this with user-generated content, too, and while kids in galleries delight in leaving rude messages about each other, it's rare online.

It's much more likely that people will mistakenly add bad data, but a good crowdsourcing project should build any necessary data validation into the project. Besides, there are generally much more interesting places to troll than a cultural heritage site.
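As a rough illustration of what 'building validation in' can mean, here is a minimal sketch of one common approach: only accept a value once several independent contributors agree on it. The function name and the thresholds (`min_votes`, `min_share`) are assumptions made up for this example rather than anything taken from a particular project, and real projects often combine this with expert review.

```python
from collections import Counter


def validate_by_agreement(submissions, min_votes=3, min_share=0.6):
    """Return the agreed value, or None if the item needs more input or review.

    The thresholds are illustrative assumptions, not recommended settings.
    """
    # Treat entries that differ only in case or spacing as the same answer.
    normalised = [" ".join(s.split()).lower() for s in submissions]

    # Too few contributions so far: keep the item in the queue.
    if len(normalised) < min_votes:
        return None

    # Accept the most common answer only if enough contributors gave it.
    value, count = Counter(normalised).most_common(1)[0]
    if count / len(normalised) >= min_share:
        return value
    return None


# Three of four contributors agree, so the honest typo is outvoted.
agreed = validate_by_agreement(["Dublin", "dublin ", "Dublin", "Dubin"])
```

Items that come back as `None` could be shown to more participants or flagged for a closer look, so a mistaken entry never reaches the catalogue on its own.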

And as Matt Popke pointed out in a comment, 'When you have thousands of people contributing to an entry you have that many more pairs of eyes watching it. It's like having several hundred editors and fact-checkers. Not all of them are experts, but not all of them have to be. The crowd is effectively self-policing because when someone trolls an entry, somebody else is sure to notice it, and they're just as likely to fix it or report the issue'.  If you're really worried about this, an earlier post on 'Designing for participatory projects: emergent best practice' has some other tips.

…doesn't crowdsourcing take advantage of people?

XKCD on the ethics of commercial crowdsourcing

Sadly, yes, some of the activities that are labelled 'crowdsourcing' do. Design competitions that expect lots of people to produce full designs and pay a pittance (if anything) to the winner are rightly hated. (See antispec.com for more and a good list of links).

But in cultural heritage, no. Museums, galleries, libraries, archives and academic projects are in the fortunate position of having interesting work that involves an element of social good, and they also have hugely varied work, from microtasks to co-curated research projects. Crowdsourcing is part of a long tradition of volunteering and altruistic participation, and to quote Owens again, 'Crowdsourcing is a concept that was invented and defined in the business world and it is important that we recast it and think through what changes when we bring it into cultural heritage.'

[Update, May 2013: it turns out museums aren't immune from the dangers of design competitions and spec work: I've written On the trickiness of crowdsourcing competitions to draw some lessons from the Sydney Design competition kerfuffle.]

Anyway, crowdsourcing won't usually work if it's not done right. From A Crowd Without Community – Be Wary of the Mob:

"when you treat a crowd as disposable and anonymous, you prevent them from achieving their maximum ability. Disposable crowds create disposable output. Simply put: crowds need a sense of identity and community to achieve their potential."

…crowdsourcing can't be used for academic work

Reasons given include 'humanists don't like to share their knowledge' with just anyone. And it's possible that they don't, but as projects like Transcribe Bentham and Trove show, academics and other researchers will share the work that helps produce that knowledge. (This is also something I'm examining in my PhD. I'll post some early findings after the Digital Humanities 2012 conference in July).

Looking beyond transcription and other forms of digitisation, it's worth checking out Prism, 'a digital tool for generating crowd-sourced interpretations of texts'.

…it steals jobs

Once upon a time, people starting a career in academia or cultural heritage could get jobs as digitisation assistants, or they could work on a scholarly edition. Sadly, that's not the case now, but that's probably more to do with year upon year of funding cuts. Blame the bankers, not the crowdsourcers.

The good news? Crowdsourcing projects can create jobs – participatory projects need someone to act as community liaison, to write the updates that demonstrate the impact of crowdsourced contributions, to explain the research value of the project, to help people integrate it into teaching, to organise challenges and editathons and more.

What isn't crowdsourcing?

…'the wisdom of the crowds'?

Which is not just another way of saying 'crowd psychology', either (another common furphy). As Wikipedia puts it, 'the wisdom of the crowds' is based on 'diverse collections of independently-deciding individuals'. Handily, Trevor Owens has just written a post addressing the topic: Human Computation and Wisdom of Crowds in Cultural Heritage.

…user-generated content

So what's the difference between crowdsourcing and user-generated content? The lines are blurry, but crowdsourcing is inherently productive – the point is to get a job done, whether that's identifying people or things, creating content or digitising material.

Conversely, the value of user-generated content lies in the act of creating it rather than in the content itself – for example, museums might value the engagement in a visitor thinking about a subject or object and forming a response to it in order to comment on it. Once posted it might be displayed as a comment or counted as a statistic somewhere but usually that's as far as it goes.

And as @sherah1918 pointed out, there's a difference between asking for assistance with tasks and asking for feedback or comments: 'A comment book or a blog w/comments isn't crowdsourcing to me … nor is asking ppl to share a story on a web form. That is a diff appr to collecting & saving personal histories, oral histories'.

…other things that aren't crowdsourcing:

[Heading inspired by Sheila Brennan @sherah1918]

  • Crowdfunding (it's often just asking for micro-donations, though it seems that successful crowdfunding projects have a significant public engagement component, which brings them closer to the concerns of cultural heritage organisations. It's also not that new. See Seventeenth-century crowd funding for one example.)
  • Data-mining social media and other content (though I've heard this called 'passive' or 'implicit' crowdsourcing)
  • Human computation (though it might be combined with crowdsourcing)
  • Collective intelligence (though it might also be combined with crowdsourcing)
  • General calls for content, help or participation (see 'user-generated content') or vaguely asking people what they think about an idea. Asking for feedback is not crowdsourcing. Asking for help with your homework isn't crowdsourcing, as it only benefits you.
  • Buzzwords applied to marketing online. And as @emmclean said, "I think many (esp mkting) see "crowdsourcing" as they do "viral" – just happens if you throw money at it. NO!!! Must be great idea" – it must make sense as a crowdsourced task.

Ok, so what's different about crowdsourcing in cultural heritage?

For a start, the process is as valuable as the result. Owens has a great post on this, Crowdsourcing Cultural Heritage: The Objectives Are Upside Down, where he says:

'The process of crowdsourcing projects fulfills the mission of digital collections better than the resulting searches… Far better than being an instrument for generating data that we can use to get our collections more used it is actually the single greatest advancement in getting people using and interacting with our collections. … At its best, crowdsourcing is not about getting someone to do work for you, it is about offering your users the opportunity to participate in public memory … it is about providing meaningful ways for the public to enhance collections while more deeply engaging and exploring them'.

And as I've said elsewhere, 'playing [crowdsourcing] games with museum objects can create deeper engagement with collections while providing fun experiences for a range of audiences'. (For definitions of 'engagement' see The Culture and Sport Evidence (CASE) programme. (2011). Evidence of what works: evaluated projects to drive up engagement (PDF).)

What about cultural heritage and citizen science?

[This was written in 2012. I've kept it for historical reasons but think differently now.]

First, another definition. As Fiona Romeo writes, 'Citizen science projects use the time, abilities and energies of a distributed community of amateurs to analyse scientific data. In doing so, such projects further both science itself and the public understanding of science'. As Romeo points out in a different post, 'All citizen science projects start with well-defined tasks that answer a real research question', while citizen history projects rarely if ever seem to be based around specific research questions but are aimed more generally at providing data for exploration. Process vs product?

I'm still thinking through the differences between citizen science and citizen history, particularly where they meet in historical projects like Old Weather. Both citizen science and citizen history achieve some sort of engagement with the mindset and work of the equivalent professional occupations, but are the traditional differences between scientific and humanistic enquiry apparent in crowdsourcing projects? Are tools developed for citizen science suitable for citizen history? Does it make a difference that it's easier to take a new interest in history further without a big investment in learning and access to equipment?

I have a feeling that 'citizen science' projects are often more focused on the production of data as accurately and efficiently as possible, and 'citizen history' projects end up being as much about engaging people with the content as they are about content production. But I'm very open to challenges on this…

What kind of cultural heritage stuff can be crowdsourced?

I wrote this list of 'Activity types and data generated' over a year ago for my Masters dissertation on crowdsourcing games for museums and a subsequent paper for Museums and the Web 2011, Playing with Difficult Objects – Game Designs to Improve Museum Collections (which also lists validation types and requirements).  This version should be read in the light of discussion about the difference between crowdsourcing and user-generated content and in the context of things people can do with museums and with games, but it'll do for now:

Activity | Data generated
Tagging (e.g. steve.museum, Brooklyn Museum Tag! You're It; variations include two-player 'tag agreement' games like Waisda?, extensions such as guessing games e.g. GWAP ESP Game, Verbosity, Tiltfactor Guess What?; structured tagging/categorisation e.g. GWAP Verbosity, Tiltfactor Cattegory) | Tags; folksonomies; multilingual term equivalents; structured tags (e.g. 'looks like', 'is used for', 'is a type of').
Debunking (e.g. flagging content for review and/or researching and providing corrections). | Flagged dubious content; corrected data.
Recording a personal story | Oral histories; contextualising detail; eyewitness accounts.
Linking (e.g. linking objects with other objects, objects to subject authorities, objects to related media or websites; e.g. MMG Donald). | Relationship data; contextualising detail; information on history, workings and use of objects; illustrative examples.
Stating preferences (e.g. choosing between two objects e.g. GWAP Matchin; voting on or 'liking' content). | Preference data; subsets of 'highlight' objects; 'interestingness' values for content or objects for different audiences. May also provide information on reason for choice.
Categorising (e.g. applying structured labels to a group of objects, collecting sets of objects or guessing the label for or relationship between presented set of objects). | Relationship data; preference data; insight into audience mental models; group labels.
Creative responses (e.g. write an interesting fake history for a known object or purpose of a mystery object.) | Relevance; interestingness; ability to act as social object; insight into common misconceptions.

You can also divide crowdsourcing projects into 'macro' and 'micro' tasks – giving people a goal and letting them solve it as they prefer, vs small, well-defined pieces of work – as in the 'Umbrella of Crowdsourcing' at The Daily Crowdsource. There's also a fair bit of academic literature on other ways of categorising and describing crowdsourcing.

Using crowdsourcing to manage crowdsourcing

There's also a growing body of literature on ecosystems of crowdsourcing activities, where different tasks and platforms target different stages of the process.  A great example is Brooklyn Museum’s ‘Freeze Tag!’, a game that cleans up data added in their tagging game. An ecosystem of linked activities (or games) can maximise the benefits of a diverse audience by providing a range of activities designed for different types of participant skills, knowledge, experience and motivations; and can encompass different levels of participation from liking, to tagging, finding facts and links.

A participatory ecosystem can also resolve some of the difficulties around validating specialist tags or long-form, more subjective content by circulating content between activities for validation and ranking for correctness, 'interestingness' (etc) by other players (see for example the 'Contributed data lifecycle' diagram on my MW2011 paper or the 'Digital Content Life Cycle' for crowdsourcing in Oomen and Aroyo's paper below). As Nina Simon said in The Participatory Museum, 'By making it easy to create content but impossible to sort or prioritize it, many cultural institutions end up with what they fear most: a jumbled mass of low-quality content'.  Crowdsourcing the improvement of cultural heritage data would also make possible non-crowdsourcing engagement projects that need better content to be viable.

See also Raddick, MJ, and Georgia Bracey. 2009. “Citizen Science: Status and Research Directions for the Coming Decade” on bridging between old and new citizen science projects to aid volunteer retention, and Nov, Oded, Ofer Arazy, and David Anderson. 2011. “Dusting for Science: Motivation and Participation of Digital Citizen Science Volunteers” on creating 'dynamic contribution environments that allow volunteers to start contributing at lower-level granularity tasks, and gradually progress to more demanding tasks and responsibilities'.

What does the future of crowdsourcing hold?

Platforms aimed at bootstrapping projects – that is, getting new projects up and running as quickly and as painlessly as possible – seem to be the next big thing. Designing tasks and interfaces suitable for mobile and tablets will allow even more of us to help out while killing time. There's also a lot of work on the integration of machine learning and human computation; my post 'Helping us fly? Machine learning and crowdsourcing' has more on this.

Find out how crowdsourcing in cultural heritage works by exploring projects

Spend a few minutes with some of the projects listed in Looking for (crowdsourcing) love in all the right places to really understand how and why people participate in cultural heritage crowdsourcing.

Where can I find out more? (AKA, a reading list in disguise)

There's a lot of academic literature on all kinds of aspects of crowdsourcing, but I've gone for sources that are accessible both intellectually and in terms of licensing. If a key reference isn't there, it might be because I can't find a pre-print or whatever outside a paywall – let me know if you know of one!

Liked this post? Buy the book! 'Crowdsourcing Our Cultural Heritage' (ISBN 9781472410221) is available through Ashgate or your favourite bookseller…

Thanks, and over to you!

Thanks to everyone who responded to my call for their favourite 'misconceptions and apprehensions about crowdsourcing (esp in history and cultural heritage)', and to those who inspired this post in the first place by asking questions in various places about the negative side of crowdsourcing.  I'll update the post as I hear of more, so let me know your favourites.  I'll also keep adding links and resources as I hear of them.

You might also be interested in: Notes from 'Crowdsourcing in the Arts and Humanities' and various crowdsourcing classes and workshops I've run over the past few years.

'I see, I feel, hence I notice, I observe, and I think'

More and more open and/or linkable cultural heritage data is becoming available, which means the next big challenge for memory institutions is dealing with 'death by aggregation': creating meaningful, engaging experiences of individual topics or objects within masses of digital data.  With that in mind, I've been wondering about the application of Roland Barthes' concepts of studium and punctum to large online collections.  (I'm in the middle of research interviews for my PhD, and it's amazing what one will think about in order to put off transcribing hours of recordings, but bear with me…)

Studium, in Wikipedia's definition, is the 'cultural, linguistic, and political interpretation of a photograph'.  While Barthes was writing about photography, I suspect studium describes the average, expected audience response to well-described images or objects in most collections sites – a reaction that exists within the bounds of education, liking and politeness.  However, punctum – in Barthes' words, the 'element which rises from the scene, shoots out of it like an arrow, and pierces me' – describes the moment an accidentally poignant or meaningful detail in an image captures the viewer.  Punctum is often personal to the viewer, but when it occurs it brings with it 'a power of expansion': 'I see, I feel, hence I notice, I observe, and I think'.  You cannot design punctum, but can we design collections interfaces to create the serendipitous experiences that enable punctum?  Is it even possible with images of objects, or is it more likely to occur with photographic collections?

While thinking about this, I came across an excellent post on Understanding Compelling Collections by John Coburn (@j0hncoburn) in which he describes some pilots on 'compelling historic photography' by Tyne & Wear Archives & Museums. The experiment asked two questions: 'Which of our collections best lends themselves to impulse sharing online?' and 'Which of our collections are people most willing to talk about online?'.  It's well worth reading both for their methods and their results, which are firmly grounded in the audiences' experience of their images: a 'key finding from our trial with Flickr Commons was that the mass sharing of images often only became possible when a user defined or redefined the context of the photograph', 'there’s a very real appetite on Facebook for old photography that strongly connects to a person’s past'.

Coming back to Barthes, their quest for images that 'immediately resonated with our audience on an emotional level and without context' is almost an investigation of enabling punctum; their answer: 'anything that How To Be a Retronaut would share', is probably good enough for most of us for now.  To summarise, they're 'era-specific, event-specific, moment-specific' images that 'disrupt people’s model of time', that 'tap into magic and the sublime', and that 'stir your imagination, not demand prior knowledge or interest'.  They're small, tightly-curated, niche-interest sets of images with evocative titles.

That's not how we generally think about or present online collections.  But what if we did?

[Update, May 16, 2012.

This post, from Flickr members co-curating an exhibition with the National Maritime Museum, offers another view – is the public searching for punctum when they view photographic collections, and does the museum/archive way of thinking about collections iron out the quirks that might lead to punctum?

'It is frightening to imagine what treasures will never see the light of day from the collection at the Brass Foundry. I got the sense that the Curators and the National Maritime Museum in general see these images as closely guarded historical documents and as such offer insight location, historical events and people in the image. There seems to be a lack of artistic appreciation for the variety of unusual and standalone images in the collection, raising an important question concerning the value attributed to each photograph when interpreted by an audience with different aesthetic interests. … In my opinion it is the ‘unknown’ quality of photography that initially inspires engagement and subsequently this process encourages an exploration of our own identity and how we as individuals create meaning.'  Source: 'The Brass Foundary Visit 19/04/2012']

My Europeana Tech keynote: Open for engagement: GLAM audiences and digital participation

This is a slightly abridged version of my notes for my keynote, 'Open for engagement: GLAM audiences and digital participation' at EuropeanaTech (#etech11) in Vienna in October 2011.

Introduction 
I'm really excited about being here to talk about some of my favourite things with you. I think helping people appreciate cultural heritage is one of the best jobs in the world so I feel lucky to be here with people working toward the same goal.

This is a chance to remind ourselves why we should get audiences participating digitally – how does it benefit both GLAM (galleries, libraries, archives, museums) and their audiences? I'm going to take you through some examples of digital participation and explain why I think they're useful case studies. I'll finish by summarising what we can learn from those case studies, looking for tips you can take back to your organisations. Hopefully we'll have time for a few questions or some discussion.

Why enable participation?
Isn't it easier to just keep doing what we're already doing? Maybe not – here are some problems your GLAM organisation might be facing…

You need to think digitally to enable participation at scale – to reach not tens or hundreds, but thousands or hundreds of thousands of people. As cultural heritage organisations, we have lots of experience with access and participation at reference desks and in galleries. We are good at creating experiences to engage, delight, and educate in person, but these are limited by the number of staff required, the materiality of the objects or documents, the size of a venue, its location and opening hours. We're still learning how to translate those brilliant participative experiences into the digital domain…

Collections are big, resources are small. In most cases we're still digitising catalogue records, let alone taking images and writing beautiful contextualised interpretative material for our collections. We'll be at it for centuries if we try to do it alone…

What's more, it's not enough for content to be online – it has to be findable. Our digitised content is still not very discoverable in search engines – which means it's effectively invisible to most potential audiences. We need better content to help search engines find the stuff we've put so much work into putting online. For example, I wanted to use Europeana images to illustrate my slides, but I had trouble finding images to match my ideas – but if other people had tagged them with words like 'happiness', 'excitement', 'crowds', I might have been able to find what I needed.

User-contributed content can help bridge the 'semantic gap' between the language used in catalogues and the language that most people would use to look for content.
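To make that concrete, here is a minimal sketch of a search that looks at user-contributed tags as well as the catalogue description. The record structure, the field names (`catalogue`, `tags`) and the two sample records are invented for illustration; they are not Europeana's data model or API.

```python
import re


def search(records, query):
    """Return ids of records whose catalogue text or user tags contain the query word."""
    q = query.lower()
    hits = []
    for rec in records:
        # Words from the formal catalogue description, ignoring punctuation.
        catalogue_words = re.findall(r"\w+", rec["catalogue"].lower())
        # Everyday-language tags contributed by the public.
        tags = [t.lower() for t in rec.get("tags", [])]
        if q in catalogue_words or q in tags:
            hits.append(rec["id"])
    return hits


# Hypothetical records: the catalogue text never mentions 'crowds',
# but a contributed tag does.
records = [
    {"id": "obj-1", "catalogue": "Albumen print, street procession, 1897",
     "tags": ["crowds", "excitement"]},
    {"id": "obj-2", "catalogue": "Oil on canvas, portrait of a merchant",
     "tags": []},
]

found = search(records, "crowds")
```

A search for 'crowds' only finds the first record because someone tagged it; the catalogue wording alone would have returned nothing.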

Even when our content is found by our audiences, it's not always very accessible without information about the significance, and cultural and historical context of the item. Further, in Europeana's case, there's a gap between the many languages of the user community and the catalogue metadata; as well as gaps between historical and contemporary language. Sadly, at the moment, many records lack enough context for a non-expert to have a meaningful experience with them.

Why support participation?
So, those are some of the problems we're looking to solve… what are the benefits of digital participation?
Firstly, the benefits to organisations

Engagement and participation is often part of your core mission.

inspire, passion, educate, enhance, promote, preserve, record, access, learn, discover, use, memory, culture, conservation, innovation

I had a look at some mission statements from various museums, libraries, and archives, and these are the words that frequently occurred. The benefits of audience participation are both tangible and intangible, and exactly how they relate to your mission (and can be measured in relation to it) depends on the organisation. And don't forget that access may not be enough if your content isn't also discoverable and engaging.

Participation can increase traffic. It's pretty simple – if content is more discoverable, more people will discover it. If audiences can actively participate, they'll engage with your collections for longer, and return more often. They may even turn into physical visitors or buy something online…

Turn audiences into advocates – there are many people who forget that GLAMs even exist once they've left school – but these are often the people we can reach with digital projects. When people directly benefit from your resources, they know why your organisation is important. You're no longer dusty old stuff in boxes, you're their history, part of the story of how their lives came to be and how their future is formed.  When people have a great experience with you, they become fans. When you encourage people to participate in meaningful work, they gain a sense of ownership and pride. These intangible outcomes can be as important as the content created through audience participation.  It's a chance to let people see the full complexity of what you do and how much work goes into providing access and interpretation, and to understand that what they see on the shelves or in the galleries is the tip of the iceberg.

There are more experts outside your GLAM than within. Participatory projects let you access external knowledge.  This knowledge can include the experience of using, repairing or building an object; memories of the events or places you've recorded; or it may be specialist knowledge they've built through their own research. Let them share their knowledge with you, and through you, with your audiences.

Finally, the rest of the world is moving from broadcast to dialogue and interaction. If you spend time around kids, you may have seen them interact with old-fashioned screens – for them, an interface you can only look at is broken.

Benefits to audience

It's all very well saying participation creates deeper engagement, but rather than tell you again, I'd rather show you with a quick thought experiment.

First I want you to imagine taking a photo of an object in a museum. Ok – so, how many times do you really go back and look at that photo? How much do you remember about that object? Do you find yourself thinking about it later? Do you ever have a conversation with friends about it?

Now I want you to imagine sketching the object, perhaps at this handy sketching station in the Musée des Beaux-Arts de Dijon.

As you draw, you'll find yourself engaging with the particular materiality of the object – the details of its construction, the way time has affected it. You may start wondering about the intention of the creators, what it was like to use it or encounter it in everyday life. In having an active relationship with that object, you've engaged more deeply, perhaps even changed a little as a result. New questions have been raised that you may find yourself pondering, and you may even decide to find out more, start your own research, or share your feelings with others.

Perhaps surprisingly, even the act of tagging an object has a similar effect, because you have to pay it some attention to say something about it…

A big benefit for audiences is that participation is rewarding. There are many reasons why, but these are some I think are relevant to participation. Games researcher Jane McGonigal (Gaming the future of museums) says people crave:

1. satisfying work to do
2. the experience of being good at something
3. time spent with people we like
4. the chance to be a part of something bigger

Participation in digital cultural heritage projects can meet all those needs.

Types of participation

The Center for Advancement of Informal Science Education came up with these forms of public participation in science research. Nina Simon of the Museum 2.0 blog mapped them to museums and added 'co-option'; I've included 'platform'.

  • Contributory – Most GLAM user-generated content projects. Designed by the organisation, the public contributes data.
  • Collaborative – the public may be active partners in some decisions, but the project is led by the organisation
  • Co-creative – all partners define goals and make decisions together
  • Platform – organisation as venue or host for other activity.

It's also important to remember that there are some types of participation where the value lies mostly in the effect of the act of creation for the individual – for example, most commenting doesn't add much to my experience of the thing commented on. However, sometimes there's also value more widely – for example, when someone comments and includes a new fact or interesting personal story. Taking this further, participatory projects can be designed so that each contribution helps meet a defined goal. Crowdsourcing involves designing carefully scaffolded tasks so that the general public can contribute to a shared goal. Crowdsourcing in cultural heritage is probably most often contributory rather than collaborative or co-creative.

Case studies
I've chosen two established examples and two experimental ones to demonstrate how established digital participation is, and also where it's going…

Flickr Commons – I'm sure you've all probably heard of this, but it's a great reminder of how effective simply sharing content in places where people hang out can be. The first tip: go fishing where the fish are biting. Find the digital spaces where people are already engaging with similar content.

Example page: [Sylvia Sweets Tea Room, corner of School and Main streets, Brockton, Mass.].  You can see from the number of views, comments, tags, favourites and notes that organisations are still finding much higher levels of discoverability, traffic and user contributions on the Commons than they'd ever get on their own, individual sites. It's also a nice example of the public identifying a location, and there are wonderful personal recollections and family histories in the comments below.

Trove – crowdsourcing OCR correction.  Tasks like OCR correction that require judgement or complicated visual processing are perfect for crowdsourcing.

Crowdsourcing can solve real problems – helping scientists identify galaxies and proteins that could save lives, or providing data about climate change through history. In this example, crowdsourcing is helping correct optical character recognition (OCR) errors. In the example here, the correction is subtle, but as someone from the location described, I can tell you that the transcription now makes a lot more sense… And making that correction felt good.

According to the National Library of Australia, by February 2011 they had '20,000+ people helping out and 30 million lines of text had been corrected during the last 2 years'. This is a well-designed interface. Their clear 'call to action' – 'fix this text' – is simple and located right where it needs to be.  Another tip: you don't need to register, but you can if you want to track your progress. Registration isn't a barrier, and it's presented as a benefit to the audience, not the organisation. They've also got a forum as a platform for conversation between participants.

So, crowdsourcing is great. But as crowdsourcing gets more popular, you will be competing for 'participation bandwidth' with other participatory and crowdsourcing projects – people will be deciding whether to work with your site or something else that meets their needs… What to do?

Well, it turns out that crowdsourcing games can act as 'participation engines'…

[I then talked about 'a small tagging game I researched, designed and made in my evenings and weekends, so that you can see the potential for crowdsourcing games even for GLAMs that don't have a lot of resources' – if you're curious, it's probably easiest to check out the slides at http://www.slideshare.net/miaridge/everyone-wins-crowdsourcing-games-and-museums alongside the video at http://vimeo.com/26858316].

Because crowdsourcing games can be more accessible to the general public, they can also increase the number of overall contributors, as well as encouraging each contributor to stay for longer, do more work, engage more deeply. Crowdsourcing games can be much more productive than a non-game interface by encouraging people to spend more time and play with more content. If games are not suitable for your audience, you can adopt some of the characteristics of games – clear initial tasks to start with and a sense of the rules of the game, good feedback on the results of player actions towards a goal, mastering new skills and providing interesting problems to solve…

Continuing the [Europeana Tech] theme of openness, this project was only possible because the Science Museum (UK) and the Powerhouse Museum had APIs into their object records – I was able to create a game that united their astronomy objects without ever having to negotiate a partnership or licensing agreement.

Oramics – co-creation (and GLAM as platform).  My final example is something I worked on just before I left the Science Museum but I make the caveat that I can't claim any credit for all the work done since, and I haven't seen any internal evaluation on the project.

The Oramics project was a conscious experiment in co-curation and public history, part of a wider programme of research. This is the Oramics machine. It's a difficult object to interpret – it's a hand-built synthesiser, and not much to look at – it's all about how it sounded, but it's too fragile to restore to working order. So the museum needed help interpreting the object, in understanding how to explain its significance and market it to new audiences. They tried a few different things in this project… They worked with young people from the National Youth Theatre who met museum staff to learn about the people who invented and built the machine, and they visited the object store to see the machine. They worked with developers to make an app to recreate the sounds of the synthesiser so that people could make new music with it. They also worked with a group of co-curators recruited online to help make it interesting to general visitors as well as music fans – the original call to action was something like 'we have an amazing object we need to bring to life, and six empty cases – help us fill them!'.

While the main outputs of all this activity are pretty traditional – a performance event, an exhibition – it's also been the catalyst for the creation of an ad hoc online community and conversations on Facebook and blogs.

As Clay Shirky told the Smithsonian 2.0 workshop in 2009, it's possible that "the artefact itself has created the surface to which the people adhere. … Every artefact is a latent community". It's nice to think we're finally getting to that point.

Best practice tips
So what do you need to think about to design a participatory project?

  • Have an answer to 'Why would someone spend precious time on your project?'
  • Be inspired by things people love
  • Design for the audience you want
  • Make participating pleasurable
  • Don't add unnecessary friction, barriers
  • Show how much you value contributions
  • Validate procrastination – offer the opportunity to make a difference, and show, don't tell, how it's making a difference
  • Make it easy to start participating, design scaffolded tasks to keep people going
  • Let audiences help manage problems
  • Test with users; iterate; polish
  • Empower audience to keep the place tidy – let them know what's acceptable and what's discouraged and how they can help.

Best practice within your GLAM
How can your organisation make the most of the opportunities digital participation provides?

  • Have a clear objective
  • Know how to measure success
  • Allow for community management resources
  • Realistically assess fears, decide acceptable risk
  • Put the audience's needs first. You need a balance between the task you want to achieve, the skills and knowledge of your audience and the content you have to work with.
  • Fish where the fish are – find the spaces where people are already engaging with similar content and see how you can slot in, don't expect people to find their way to you.
  • Decide where it's ok to lose control – let go… you may find audiences you didn't expect, or people may make use of your content in ways you never imagined. Watch and learn – another reason to iterate and go into public beta earlier rather than later.
  • Open data – let people make new things with your content. Bad people will do it anyway, but by not having open data, you're preventing exactly the people you want to work with from doing anything with your data. Unclear or closed licenses are the biggest barrier that friendly hackers and developers raise with me when I ask about cultural heritage data…

In a 2008 post about museum-as-platform, Nina Simon says it's about moving from controlling everything to providing expertise; learning to change from content provider to platform. [More recently, Rob Stein posted about participatory culture and the subtle differences between authoritarian and authoritative approaches.]

Conclusion
Perhaps most important of all – enjoy experiencing your collections through new eyes!

Notes from EuropeanaTech 2011

Some very scrappy notes from the EuropeanaTech conference held in Vienna this week as I prepare a short talk for the Open data in cultural heritage (LODLAM-London) event tonight… For a different perspective there's an overview post at EuropeanaTech – är det här framtidens kulturarv? and I'll link to any others I find.  I've also put up some photos of ten questions attendees asked about Europeana, with written answers from the break-out exercise.  I'll tidy up and post my keynote notes in a few days, and I'll probably summarise things a bit more then.

Max Kaiser: Europeana is like a cruise ship with limited room to move, hackathons inject Europeana with a bit more agility… Build real stuff for real people with real business requirements – different to building prototypes and proofs of concept – requires different project culture.

Bill Thompson: pulling the analogue past into the digital future… We don't live in a digital world and never will – the physical world is not going to vanish. We'll remain embodied minds; will have co-existing analogue and digital worlds. Digital technologies shaping the possibilities we decide to embrace. … Can't have a paradigm shift in humanities because no basic set of beliefs to argue with… But maybe the shift to digital is so fundamental that it could be called a paradigm shift. … Even if you don't engage online, you'll still live in a world shaped by the digital. Those who are online will come to define the norms. … Revolutionary vanguard in our midst – hope lies with the programmers, the coders – the only weapon that matters is running code. Have to build on technologies that are open, only way to build diverse online culture that allows all voices to be heard. … Means open data in a usable form – properly formulated so can be interpreted by anyone or any program that wants it; integrate them into the broader cultural space. Otherwise just disconnected islands.

Two good reasons to endorse open linked data. We're the first generation that's capable of doing this – have the tools, network, storage, processes. Within our power to digitise everything and make it findable. We may also be the only generation that wants to do it – later generations will not value things that aren't visible on the screen in the same way – they'll forget the importance of the non-digital. So we'd better get on with it, and do it properly. LOD is a foundation that allows us to build in the future.

Panel discussion…

Qu: how does open theme fit with orgs with budget cuts and need to make more money?
BT: when need to make money from assets, openness is a real challenge. There are ways of making assets available to people that are unlikely to have commercial impact but could raise awareness e.g. low-res for public access, high-res for commercial use [a model adopted by many UK museums].

Jill Cousins: there's a reputational need to put decent resolution images online to counter poor quality versions online.

Max: be clever – don't make an exclusive contract with digitisation partners – make sure you can also give free access to it.
Jill Cousins: User always been central to Europeana though got slightly lost along the way as busy getting data.  …  Big stumbling block – licenses. Not just commercial reasons, also about reputational risk, loss of future earnings, fear of giving away something that's valuable in future. Without CC licence, can't publish as linked open data. Without it, commercial providers like INA can't take the API. Can't use blogs that have advertising on them. Couldn't put it on Wikipedia. Or ArtFinder.  …  New [UK?] Renaissance report – metadata related to the digitised objects by cultural heritage orgs should be widely and freely available for re-use.
Workshops with content holders: Risks – loss of quality, loss of control, attribution, brand value, potential income ('phantom income'), unwanted spillover effects – misuse/juxtaposition of data. Rewards: increasing relevance, increasing channels to end users, data enrichment, brand value, specific funding opportunities, discoverability, new customers, public mission, building expertise, desired spillover effects. … You are reliant on user doing the right thing with attribution….
Main risks: unwanted spillover effects, loss of attribution, loss of potential income. Main rewards: new customers, increasing relevance, public mission. But the risks diminished as the rewards gained more prominence – overall the rewards outweighed the risks. But address those 3 areas of risk.
What next? Operationalise some of the applications developed.  Yellow Kitchen Maid paper on the business of open data. Working together on difficulties faced by institutions and licensing open data.
[notes from day 2 to follow!]
Ten questions about Europeana…
10 questions (and one general question)
The general question was, what can the community building with domain experts, developers and researchers/R&D/innovation work package in Europeana 2.0 do?  (Something like that anyway, it was all a bit confusing by that point)
You had to pick a question and go into a group to try and answer it – I've uploaded photos of the answer sheets.
1 Open source – if Europeana is using open source software and is itself open software, should it also become a community-driven development project?
2 Open source – are doubts about whether OSS provides quality services justified? What should be done to ensure quality?
3 Aggregation and metadata quality – what will be the role of aggregators, and what is the role of Europeana in the LOD future?
4 What can Europeana do which search engines can't that justifies the extra effort of creating and managing structured metadata?
5 Is EDM [Europeana Data Model] still too complicated? If yes, what to simplify.
6 What is the actual value of semantic contextualisation, and could that not be produced by search engines?
7 enhance experience of exploring, discovering [see photo – it was too long to type in time!]
8 How important is multilingual access for discovery in Europeana? Which elements are the most important?
9 Can Europeana drive end-user engagement on the distributed sites and services of contributing archives?
10 How can we benefit from existing (local, international) communities in enriching the user experience on Europeana?

Slides and talk from 'Cosmic Collections' paper

This is a lazy post, a straight copy and paste of my presentation notes (my excuse is that I'm eight days behind on everything at work and uni after being grounded in the US by volcanic ash). Anyway, I hope you enjoy it or that it's useful in some way.

Cosmic Collections: creating a big bang?

Slide 1 (solar rays – Cosmic Collections):

The Cosmic Collections project was based on a simple idea – what if we gave people the ability to make their own collection website? The Science Museum was planning an exhibition on astronomy and culture, to be called ‘Cosmos & Culture’. We had limited time and resources to produce a site to support the exhibition and we risked creating ‘just another exhibition microsite’. So what if we provided access to the machine-readable exhibition content that was already being gathered internally, and threw it open to the public to make websites with it?  And what if we motivated them to enter by offering competition prizes?  Competition participants could win a prize and kudos, and museum audiences might get a much more interesting, innovative site.
The idea was a good match for museum mission, exhibition content, technical context, hopefully audience – but was that enough?
Slide 2 (satellite dish):
Questions…
If we built an API, would anyone use it?
Can you really crowdsource the creation of collections interfaces?
The project gave me a chance to investigate some specific questions.  At the time, there were lots of calls from some quarters for museums to produce APIs for each project, but would anyone actually use a museum API?  The competition might help us understand whether or how we should invest in APIs and machine-readable data.
We can never build interfaces to meet the needs of every type of audience.  One of the promises of machine-readable data is that anyone can make something with your data, allowing people with particular needs to create something that supports their own requirements or combines their data with ours – but would anyone actually do it?
Slide 3 (map mashup):
Mashups combine data from one or more sources and/or data and visualisation tools such as maps or timelines.
I'm going to get the geek stuff out of the way and quickly define mashups and APIs…
Mashups are computer applications that take existing information from known sources and present it to the viewer in a new way. Here’s a mashup of content edits from Wikipedia with a map showing the location of the edit.
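For readers who like to see the idea in code, here is a minimal sketch of a mashup, assuming two small collections with slightly different field names that are combined onto a simple timeline. The records, field names and the helper functions `normalise_b` and `build_timeline` are all invented for illustration; they are not taken from any real museum's data.

```python
from collections import defaultdict

# Hypothetical sample records; titles, dates and field names are made up.
museum_a = [
    {"title": "Astrolabe", "year": 1650, "source": "Museum A"},
    {"title": "Orrery", "year": 1780, "source": "Museum A"},
]
museum_b = [
    {"name": "Refracting telescope", "made": "1885", "source": "Museum B"},
    {"name": "Star chart", "made": "1690", "source": "Museum B"},
]


def normalise_b(record):
    """Map Museum B's field names onto the shape used by Museum A."""
    return {
        "title": record["name"],
        "year": int(record["made"]),
        "source": record["source"],
    }


def build_timeline(*collections):
    """Group records from every collection by century, oldest first."""
    timeline = defaultdict(list)
    for collection in collections:
        for record in collection:
            century = record["year"] // 100 + 1
            timeline[century].append(record)
    for items in timeline.values():
        items.sort(key=lambda r: r["year"])
    return dict(sorted(timeline.items()))


timeline = build_timeline(museum_a, [normalise_b(r) for r in museum_b])
for century, items in timeline.items():
    labels = [f"{r['title']} ({r['source']})" for r in items]
    print(f"Century {century}: {labels}")
```

Most of the work is in the normalising step: the two sources only become one timeline once their records share a common shape.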
Slide 4 (APIs)
APIs (Application Programming Interfaces) are a way for one machine to talk to another: ‘Hi Bob, I’d like a list of objects from you, and hey, Alice, could you draw me a timeline to put the objects on?’
APIs tell a computer, 'if you go here, you will get that information, presented like this, and you can do that with it'.
A way of providing re-usable content to the public, other museums and other departments within our museum – we created a shared backend for web and gallery interactives.
I think of APIs as user interfaces for developers and wanted to design a good experience for developers with the same care you would for end users*.  I hoped that feedback from the competition could be used to improve the beta API.
* we didn’t succeed in the first go but it’s something to aim for post-beta
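As a rough illustration of 'if you go here, you will get that information, presented like this', the sketch below builds a request address and reads titles out of a JSON reply. The endpoint `https://api.example.org/objects`, the parameter names and the sample response are all hypothetical; this is not the Science Museum API, just the general shape of the conversation, and it makes no network call.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint and parameters, for illustration only.
BASE_URL = "https://api.example.org/objects"


def build_request_url(keyword, limit=10):
    """'If you go here...': the address a program would ask for."""
    query = urlencode({"q": keyword, "limit": limit, "format": "json"})
    return f"{BASE_URL}?{query}"


# '...you will get that information, presented like this':
# an invented response body standing in for what a server might return.
SAMPLE_RESPONSE = """
{
  "total": 2,
  "objects": [
    {"id": "obj-1", "title": "Astrolabe", "date": "1650"},
    {"id": "obj-2", "title": "Orrery", "date": "1780"}
  ]
}
"""


def parse_titles(response_text):
    """'...and you can do that with it': pull out just the titles."""
    data = json.loads(response_text)
    return [obj["title"] for obj in data["objects"]]


url = build_request_url("astronomy", limit=2)
titles = parse_titles(SAMPLE_RESPONSE)
print(url)
print(titles)
```

The predictable address and predictable structure are what let a developer build on the data without ever talking to the museum first.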
Slide 5: (what if nobody came?)
AKA 'the fears and how to deal with them'
Acknowledge those fears
Plan for the worst case scenario
Take a deep breath and do it anyway
And on the next slides, the results.  If I was replicating the real experience, you’d have several nerve-biting months while you waited for the museum to lumber into gear, planned the launch event, publicised the project in the participant communities… Then waited for results to come in. But let’s skip that bit…
Slide 6: (Ryan Ludwig's http://www.serostar.com/cosmic/)
The results – our judges declared a winner and a runner-up, these are screenshots – this is the second prize winning entry.
People came to the party. Yay! I'd like to thank all the participants, whether they submitted a final entry or not. It wouldn't have worked without them.
Slide 7: (Natalie and Simon's http://cosmos.natimon.com/)
This is a screenshot from the winning site – it made the best use of the API and was designed to lure the visitor in and keep drawing them through the site.
(We didn’t get subject specialists scratching their own itch – maybe they don’t need to share their work, maybe we didn’t reach them. Would like to reach researchers, let them know we have resources to be used, also that they can help us/our audiences by sharing their work)
Slide 8: (astrolabe – what did we learn?)
People need (more) help to participate in a geektastic project like this
The dynamics of a competition are tricky
Mashups are shaped by the data provided – you get out what you put in
Can we help people bring their own content to a future mashup?
Slide 9: (evaluation)
I did a small survey to evaluate the project… Turns out the project was excellent outreach into the developer community. People were really excited about being invited to play with our data.  My favourite quote: "The very idea of the competition was awesome"
Slide 10: (paper sheet)
Also positive coverage in technical press. So in conclusion?
Slide 11: (Tim Berners-Lee):
“The thing people are amazed about with the web is that, when you put something online, you don’t know who is going to use it—but it does get used.”
There are a lot of opportunities and excitement around putting machine-readable data online…
Slide 12: Tim Berners-Lee 2:
But:  It doesn’t happen automatically; It’s not a magic bullet
But people won't find and use your APIs without some encouragement. You need to support your API users. People outside the museum bring new ideas but there's still a big role for people who really understand the data and audiences to help make it a quality experience…
Slide 13 (space):
What next?
Using the feedback to focus and improve collection-wide API
Adding other forms of machine-readable data
Connecting with data from your collections?
I've been thinking about how to improve APIs – offer subject authorities with links to collections, embed markup in the collections pages to help search engines understand our data…
I want more! The more of us with machine-readable data available for re-use, the better the cross-collections searches, the region or specialism-wide mashups… I'd love to be able to put together a mashup showing all the cultural heritage content about my suburb; all the Boucher self-portraits; all the inventions that helped make the Space Shuttle work…
Slide 14: (thank you)
If you're interested in possibilities of machine-readable data and access to your collections, join in the conversation on the museum API wiki or follow along on twitter or on blogs.  Join in at http://museum-api.pbworks.com/
More at https://openobjects.org.uk/ or @mia_out

Image credits include:
http://antwrp.gsfc.nasa.gov/apod/ap100415.html
http://antwrp.gsfc.nasa.gov/apod/ap100414.html
http://antwrp.gsfc.nasa.gov/apod/ap100409.html
http://antwrp.gsfc.nasa.gov/apod/ap100209.html
http://antwrp.gsfc.nasa.gov/apod/ap100315.html
http://www.sciencemuseum.org.uk/Centenary/Home/Icons/Pilot_ACE_Computer.aspx

Mash the state

'Cosmic Collections' – my MW2010 paper online

My Museums and the Web 2010 paper is up at Cosmic Collections: Creating a Big Bang and I'm working on the slides now and I'm curious – what would you like to see more of in a presentation?  It's only short (6 minutes) so I'm currently thinking setup (including lots of definitions for non-geeks), outcomes (did the project succeed?), and a bit on what I think the next steps are (basically a call to get your data online in re-usable formats).

I'm thinking of leading with this Tim Berners-Lee quote from an article in Prospect, Mash the state:

"The thing people are amazed about with the web is that, when you put something online, you don't know who is going to use it—but it does get used."

Why do museums prefer Flickr Commons to Wikimedia Commons?

A conversation has sprung up on twitter about why museums prefer Flickr Commons to Wikimedia Commons after Liam Wyatt, Vice President of Wikimedia Australia posted "Flickr Commons is FULL for 2010. GLAMs, Fancy sharing with #Wikimedia commons instead?" and I responded with "has anyone done audience research into why museums prefer Flickr to Wikimedia commons?".  I've asked before because I think it's one of those issues where the points of resistance can be immensely informative.

I was struck by the speed and thoughtfulness of responses from kajsahartig, pekingspring, NickPoole1, richardmccoy and janetedavis, which suggested that the question hit a nerve.

Some of the responses included:

Kajsa: Photos from collections have ended up at wikipedia without permission, that never happened with Flickr, could be one reason [and] Or museums are more benevolent when it happens at Flickr, it's seen more as individuals' actions rather than an organisations'?

Nick: Flickr lets you choose CC non-commercial licenses, whereas Wikimedia Commons needs to permit potential commercial use?

Janet: Apart fr better & clear CC licence info, like Flickr Galleries that can be made by all! [and] What I implied but didn't say before: Flickr provides online space for dialogue about and with images.

Richard: Flickr is so much easier to view and search than WM. Commons, and of course easier to upload.

Twitter can be a bit of an echo chamber at times, so I wanted to ask you all the question in a more accessible place.   So, is it true that museums prefer Flickr Commons to Wikimedia Commons, and if so, why?

[Update: Liam's new blog post addresses some of the concerns raised – this responsiveness to the issues is cheering.  (You can get more background at Wikipedia:Advice for the cultural sector and Wikipedia:Conflict of interest.)

Also, for those interested in wikimedia/wikipedia* and museums, there's going to be a workshop 'for exploring and developing policies that will enable museums to better contribute to and use Wikipedia or Wikimedia Commons, and for the Wikimedia community to benefit from the expertise in museums', Wikimedia@MW2010, at Museums and the Web 2010. There's already a thread, 'Wikimedia Foundation projects and the museum community' with some comments.  I'd love to see the 'Incompatible recommendations' section of the GLAM-Wiki page discussed and expanded.

* I'm always tempted to write 'wiki*edia' where * could be 'm' or 'p', but then it sounds like South Park's plane-rium in my head.]

[I should really stop updating, but I found Seb Chan's post on the Powerhouse Museum blog, Why Flickr Commons? (and why Wikimedia Commons is very different) useful, and carlstr summed up a lot of the issues neatly: "One of the reasons is that Flickr is a package (view, comment search aso). WC is a archive of photos for others to use. … I think Wikipedia/Wikimedia have potential for the museum sector, but is much more complex which can be deterrent.".]