Crowdsourcing for Academic, Library and Museum Environments

"Citizen Science for the Digital Humanities"

Workshop Organisers: Victoria Van Hyning, Zooniverse, University of Oxford and Sarah De Haas, Google


This course will enable participants to experience crowdsourcing in microcosm, all the way from project conception to launch to data analysis. It will be a hands-on and fast-paced course, but there will be plenty of time to reflect on the process of setting up and sustaining a crowdsourcing project.

Participants will come to the course prepared with a project idea and some sample data, e.g. 50 images of objects or books in their museum or library collection or an academic research dataset. It is absolutely essential that participants come prepared with their dataset, and that this dataset is coherent. This means that the same string of questions can be applied to each image. For example, a dataset of medieval manuscripts might be processed in the following steps:

  1. Are there any illuminations on this manuscript image? Yes/no;
  2. If yes, draw a box around the illuminations;
  3. A dropdown list titled 'What appears in the illumination?' Choices could include: animals, humans, plants, text, fantastical creatures, music, etc.
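
The three steps above can be sketched as a small decision tree. This is only an illustration of what a "coherent" dataset means (the same questions apply to every image); the `Task` class, the task keys `T1` to `T3` and the `path_for` helper are hypothetical names invented for this sketch and are not the format used by the Zooniverse project builder.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """One step in a workflow. All names here are illustrative assumptions."""
    kind: str       # "single" (yes/no), "drawing" or "dropdown"
    prompt: str
    choices: list = field(default_factory=list)
    # Maps an answer to the key of the next task; "*" matches any answer.
    next_task: dict = field(default_factory=dict)


WORKFLOW = {
    "T1": Task("single",
               "Are there any illuminations on this manuscript image?",
               choices=["yes", "no"],
               next_task={"yes": "T2", "no": None}),
    "T2": Task("drawing",
               "Draw a box around the illuminations.",
               next_task={"*": "T3"}),
    "T3": Task("dropdown",
               "What appears in the illumination?",
               choices=["animals", "humans", "plants", "text",
                        "fantastical creatures", "music"],
               next_task={"*": None}),
}


def path_for(answers, start="T1"):
    """Return the ordered task keys a volunteer visits for the given answers."""
    visited = []
    key = start
    while key is not None:
        branches = WORKFLOW[key].next_task
        visited.append(key)
        answer = answers.get(key)
        # Use the branch for this specific answer if there is one,
        # otherwise fall back to the wildcard branch (or stop).
        if isinstance(answer, str) and answer in branches:
            key = branches[answer]
        else:
            key = branches.get("*")
    return visited
```

For an image with no illuminations, `path_for({"T1": "no"})` stops after the first question; answering "yes" leads on to the drawing and dropdown steps. If a set of questions like this cannot be asked of every image in the sample data, the dataset probably needs splitting or narrowing before the course.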

On Monday participants will share and develop their ideas for their projects, and hear from crowdsourcing experts, including 'Old Weather' Project Investigator, Philip Brohan, of the Met Office. We will have the opportunity to 'storyboard' or 'wireframe' projects, so participants may want to print out a few images from their data before arriving, in order to draw on them and develop them.

By Tuesday they will have uploaded their data to Zooniverse's new Panoptes DIY crowdsourcing site, and launched a beta project that they will use for the rest of the week. Participants will have the opportunity to pitch their project to fellow coursemates, and try to generate interest in it, in order to gain experience in attracting a crowd and communicating the significance of their research or collection. By the end of the week the group will use data generated by their project (or backup sample data) in various data refinement and visualisation tools in order to learn the basics of how to manage and analyse their data.

This course will be of particular interest to academics, librarians and museum colleagues who see the potential for crowdsourcing to expedite data extraction from collections that are not machine-readable. The Panoptes system will be particularly useful for metadata extraction projects and datasets that require a basic decision tree (yes/no answers and dropdown menus) but will not be able to support transcription at this time. Examples of the kinds of data extraction and workflows that will be supported include: http://www.penguinwatch.org; http://www.milkywayproject.org/.

The workshop will be run by Dr Victoria Van Hyning, Digital Humanities Project Lead at Zooniverse.org (University of Oxford), and Sarah de Haas, a technical specialist from Google with a background in humanities who will help bridge the gap between humanities and technical skills.


Monday 20 July: Introduction to crowdsourcing and project design

  11:00 - 12:30

  Introductions (Victoria Van Hyning and Sarah De Haas)
  Participants will have a chance to speak about what they hope to get out of the week and the data or project they are working on. Victoria and Sarah will outline the important features of a crowdsourcing project and give an overview of the week ahead.

  Old Weather (Philip Brohan)
  Philip Brohan of the Met Office and Zooniverse project 'Old Weather' will speak about his experience of setting up and sustaining 'Old Weather', and how he has used crowdsourced data in his climatological research.

  14:00 - 17:30 (inc. break)

  Zooniverse project builder (Victoria Van Hyning, Sarah De Haas and Philip Brohan)
  Participants will be introduced to the Zooniverse project builder platform and start setting up their projects. Philip and additional Zooniverse staff will be on hand to help everyone upload their data and get going. The group will discuss how best to shape project workflows and identify the most important data for extraction.

Tuesday 21 July: Launching a project

  11:00 - 12:30

  Project Workflows (Sarah De Haas and Victoria Van Hyning)
  Participants will continue to set up their projects with help from Sarah and Victoria. They can try different workflows and write prose for the on-site tutorials that will help volunteers understand what they are being asked to do and how to classify.

  14:00 - 17:30 (inc. break)

  Project Launch (Sarah De Haas and Victoria Van Hyning)
  Participants will decide project titles, 'hooks' and 3-4 sentences for their 'about' page, which will explain the project and its importance to potential volunteers. Participants will continue to set up their projects, and will launch them by the end of the day.

Wednesday 22 July: Drumming up participation, generating data, cleaning data

  11:00 - 12:30

  Nurturing the Crowd (Victoria Van Hyning)
  Victoria will present on the theme of nurturing the crowd through the use of social media, outreach events, academic/expert engagement and forums. This session will also cover funding and resourcing. During the morning session, participants will classify on one another's projects and use the morning coffee break to convince and encourage people in other workshops to classify on their project. The goal is for participants to gain some experience of pitching their projects.

  14:00 - 17:30 (inc. break)

  Preparing for Analysis (Sarah De Haas)
  Sarah will present on how to make the information generated by crowdsourced projects useful for analysis and will outline the possibilities for visualization and data use, which the group will get to try on Thursday and Friday.

Thursday 23 July: Analyzing data

  11:00 - 12:30

  Maintenance and Visualization (Sarah De Haas and Shreenath Regunathan)
  Sarah and Shreenath will discuss strategies for cleaning and manipulating data and will walk everyone through table maintenance and data visualization.

  14:00 - 17:30 (inc. break)

  Fusion Tables (Sarah De Haas)
  Participants will gain experience of using Fusion Tables to store data and run queries.

Friday 24 July: Sustaining crowds and projects through communication of results

  11:00 - 12:30

  Publishing data (Sarah De Haas and Victoria Van Hyning)
  Sharing, presenting, and publishing your data. Sarah and Victoria will lead a discussion on how to publish your crowdsourced data and make it usable and available to both academic and general interest audiences.

  14:00 - 17:30 (inc. break)

  Participants will have a chance to present their work to the group and discuss what they've learned and how they want to take their work forward. What have they learned, and what would they do differently? How can they take crowdsourcing into their institutions or research?

There are 4 individual speakers in this workshop.

  • Philip Brohan
    Met Office Hadley Centre

    Philip Brohan did a PhD in theoretical solid state physics many years ago, and then worked for a while as a nuclear engineer, but since 2002 he has been a climate scientist at the Met Office Hadley Centre in the UK. He spends most of his time trying to find out how the weather of 100 years ago compares to that of today, and to this end he runs the citizen science data rescue project oldweather.org.

  • Sarah De Haas

    Sarah de Haas is a medievalist-turned-techie who now spends quite a lot of her time supporting developers working on integrations with Google products. She's an expert in explaining technical details to anyone and everyone, regardless of their background and expertise.

  • Shreenath Regunathan

    Shreenath Regunathan is part of Google's gTech organisation, the team that works on scaling support for Google's products. He works day in, day out on business intelligence for Google's tech teams, and is an expert in understanding and analysing data that is often messy, difficult to interpret, and subject to much passionate debate.

  • Victoria Van Hyning
    Zooniverse, University of Oxford

    Victoria Van Hyning completed her doctoral work at the University of Sheffield, in the department of English Language and Literature, where she held a British Library co-doctoral award. Her work focused on early modern English nuns in exile between 1550 and 1800, and their literary activities. Shortly after completing her doctoral studies she began work at Zooniverse, in Oxford, where she is a Digital Humanities postdoctoral fellow and humanities project lead. She leads several humanities projects, including Science Gossip (http://www.sciencegossip.org/), 'Shakespeare's World' with the Folger Shakespeare Library, and 'Anno.Tate' with Tate Britain.


Workshop Venue: All of your workshop sessions will be in the Evenlode room at IT Services. We'll make sure you know how to get there.

AM and PM Refreshment Breaks: All of your breaks will be in the Course Registration area at IT Services. Please go directly to IT Services after your lecture each morning.

Lunch Arrangements: Lunch each day will be in the Ruth Deech Building, St Anne's College.

What you need to bring: Please bring 50 or so images on a USB stick or other storage device, or via Dropbox etc. These will form the core dataset that you will be using during the week. The files should be .jpg, .png, .gif or .svg, and their names may not contain the following characters: /, \, : (slashes or colons). Ideally the images will fall within fair use or be free of copyright restrictions.
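
If you would like to check your files before you arrive, the file requirements above can be tested with a short script. This is a minimal sketch, not an official tool: the function names and the folder name in the usage note are made up for illustration, and treating upper-case extensions such as .JPG as acceptable is an assumption rather than something the requirements state.

```python
from pathlib import Path

# File types and characters taken from the requirements above.
ALLOWED_EXTENSIONS = {".jpg", ".png", ".gif", ".svg"}
FORBIDDEN_CHARACTERS = set("/\\:")


def filename_problems(name):
    """Return a list of human-readable problems with one file name."""
    problems = []
    dot = name.rfind(".")
    # Assumption: extensions are compared case-insensitively.
    extension = name[dot:].lower() if dot > 0 else ""
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(
            f"extension {extension or '(none)'} is not one of the listed types")
    bad = sorted(FORBIDDEN_CHARACTERS.intersection(name))
    if bad:
        problems.append("contains forbidden characters: " + " ".join(bad))
    return problems


def check_folder(folder):
    """Map each problematic file name in `folder` to its list of problems."""
    report = {}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            problems = filename_problems(path.name)
            if problems:
                report[path.name] = problems
    return report
```

Calling `check_folder("my_images")` (a hypothetical folder name) returns an empty dictionary when every file passes. Note that a file ending in .jpeg or .tiff would be flagged, since neither appears in the list above.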

Computer: Computers will be provided, but if you can bring a laptop you may find it useful. Please see our information about using a laptop at DHOxSS: http://dhoxss.humanities.ox.ac.uk/2015/registration.html#LaptopGuidance

Group Colour: Red

Site last updated: 2015-07-15 -- Image Credits -- Contact: events@it.ox.ac.uk