This course will enable participants to experience crowdsourcing in microcosm, from project conception through launch to data analysis. It will be a hands-on and fast-paced course, but there will be plenty of time to reflect on the process of setting up and sustaining a crowdsourcing project.
Participants should come to the course with a project idea and some sample data, e.g. 50 images of objects or books from their museum or library collection, or an academic research dataset. It is essential that participants bring their dataset and that the dataset is coherent: the same sequence of questions must be applicable to every image. For example, a dataset of medieval manuscript images might be processed in the following steps:
- Are there any illuminations on this manuscript image? Yes/no;
- If yes, draw a box around the illuminations;
- Choose from a dropdown list titled 'What appears in the illumination?'; choices could include animals, humans, plants, text, fantastical creatures, music, etc.
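The question sequence above is a small decision tree: the answer to the first question decides whether the later tasks are shown at all. The following Python sketch models that branching under stated assumptions. The task keys (`T0`, `T1`, `T2`), the `Task` class and the `tasks_seen` helper are hypothetical illustrations, not the format that Zooniverse's Panoptes project builder uses; the point is only that a coherent dataset lets one fixed tree apply to every image.

```python
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical model of the manuscript workflow described above.
# NOT the Panoptes format; task keys and "kind" labels are illustrative only.
@dataclass
class Task:
    key: str
    kind: str  # "question", "drawing" or "dropdown" (illustrative labels)
    prompt: str
    options: list[str] = field(default_factory=list)
    # Maps an answer to the key of the next task (None ends the workflow).
    next_if: dict[str, Optional[str]] = field(default_factory=dict)
    # Next task when the route does not depend on the answer (None ends it).
    next_task: Optional[str] = None


WORKFLOW = {
    "T0": Task("T0", "question",
               "Are there any illuminations on this manuscript image?",
               options=["yes", "no"], next_if={"yes": "T1", "no": None}),
    "T1": Task("T1", "drawing", "Draw a box around the illuminations.",
               next_task="T2"),
    "T2": Task("T2", "dropdown", "What appears in the illumination?",
               options=["animals", "humans", "plants", "text",
                        "fantastical creatures", "music"]),
}


def tasks_seen(answers: dict[str, str], start: str = "T0") -> list[str]:
    """Return the keys of the tasks a volunteer visits, given their answers."""
    path: list[str] = []
    key: Optional[str] = start
    while key is not None:
        task = WORKFLOW[key]
        path.append(key)
        if task.next_if:
            # Branching task: the answer chooses the route.
            key = task.next_if[answers[key]]
        else:
            key = task.next_task
    return path


if __name__ == "__main__":
    print(tasks_seen({"T0": "no"}))   # ['T0']
    print(tasks_seen({"T0": "yes"}))  # ['T0', 'T1', 'T2']
```

A "no" on the first question ends the classification immediately, which is what keeps such workflows quick for volunteers on images without illuminations.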
On Monday participants will share and develop their project ideas and hear from crowdsourcing experts, including Philip Brohan of the Met Office, Project Investigator for 'Old Weather'. We will have the opportunity to 'storyboard' or 'wireframe' projects, so participants may want to print out a few images from their data before arriving, in order to draw on them and develop their ideas.
By Tuesday they will have uploaded their data to Zooniverse's new Panoptes DIY crowdsourcing site and launched a beta project, which they will use for the rest of the week. Participants will have the opportunity to pitch their project to their fellow course members and try to generate interest in it, gaining experience in attracting a crowd and communicating the significance of their research or collection. By the end of the week the group will feed the data generated by their projects (or backup sample data) into various data refinement and visualisation tools, learning the basics of how to manage and analyse their data.
This course will be of particular interest to academics, librarians and museum colleagues who see the potential of crowdsourcing to expedite data extraction from non-machine-readable collections. The Panoptes system will be particularly useful for metadata extraction projects and for datasets that require a basic decision tree (yes/no answers and dropdown menus), but it will not be able to support transcription at this time. Examples of the kinds of data extraction and workflows that will be supported include Penguin Watch (http://www.penguinwatch.org) and the Milky Way Project (http://www.milkywayproject.org/).
The workshop will be run by Dr Victoria Van Hyning, Digital Humanities Project Lead at Zooniverse.org (University of Oxford), and Sarah de Haas, a technical specialist from Google with a background in the humanities, who will help bridge the gap between humanities and technical skills.
There are four speakers in this workshop.
Philip Brohan
Met Office Hadley Centre
Philip Brohan completed a PhD in theoretical solid-state physics many years ago and then worked for a while as a nuclear engineer; since 2002 he has been a climate scientist at the Met Office Hadley Centre in the UK. He spends most of his time investigating how the weather of 100 years ago compares with that of today, and to this end he runs the citizen science data rescue project oldweather.org.
Sarah de Haas
Sarah de Haas is a medievalist-turned-techie who now spends much of her time supporting developers working on integrations with Google products. She is adept at explaining technical details to anyone and everyone, whatever their background and expertise.
Shreenath Regunathan
Shreenath Regunathan is part of Google's gTech organisation, the team that works on scaling support for Google's products. He works day in, day out on business intelligence for Google's tech teams, and is an expert in understanding and analysing data that is often messy, difficult to interpret, and the subject of much passionate debate.
Victoria Van Hyning
Zooniverse, University of Oxford
- Workshop: Crowdsourcing for Academic, Library and Museum Environments
- Lecture: 3a: Crowdsourced Text Transcription
Victoria Van Hyning completed her doctoral work at the University of Sheffield, in the Department of English Language and Literature, where she held a British Library co-doctoral award. Her work focused on early modern English nuns in exile between 1550 and 1800 and their literary activities. Shortly after completing her doctoral studies she began work at Zooniverse, in Oxford, where she is a Digital Humanities postdoctoral fellow and humanities project lead. She leads several humanities projects, including Science Gossip (http://www.sciencegossip.org/), 'Shakespeare's World' with the Folger Shakespeare Library, and 'Anno.Tate' with Tate Britain.
Workshop Venue: All of your workshop sessions will be in the Evenlode room at IT Services. We'll make sure you know how to get there.
AM and PM Refreshment Breaks: All of your breaks will be in the Course Registration area at IT Services. Please go directly to IT Services after your lecture each morning.
Lunch Arrangements: Lunch each day will be in the Ruth Deech Building, St Anne's College.
What you need to bring: Please bring 50 or so images on a USB stick or other storage device, or via Dropbox or a similar service. These will form the core dataset that you will use during the week. The images should be .jpg, .png, .gif or .svg files, and their file names may not contain slashes or colons (/, \, :). Ideally, the images will be usable under fair use or free of copyright restrictions.
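Because these file requirements are easy to get wrong across 50 or so images, a quick check before the course may save time. The following is a minimal Python sketch, not an official DHOxSS or Zooniverse tool. It flags files in a folder whose extension is not one of the four listed or whose name contains a slash, backslash or colon. The function names `check_name` and `check_folder` are hypothetical, and the sketch assumes that extensions are matched case-insensitively (so `.JPG` passes) and that hidden files such as `.DS_Store` can be ignored. It follows the list literally, so a `.jpeg` file is flagged.

```python
import sys
from pathlib import Path

# Allowed extensions and forbidden characters, taken from the guidance above.
ALLOWED_EXTENSIONS = {".jpg", ".png", ".gif", ".svg"}
FORBIDDEN_CHARS = set("/\\:")


def check_name(name: str) -> list[str]:
    """Return the problems with one file name (an empty list means it is OK)."""
    problems = []
    bad = sorted(FORBIDDEN_CHARS.intersection(name))
    if bad:
        problems.append("contains forbidden character(s): " + " ".join(bad))
    # Assumption: extensions are compared case-insensitively.
    dot = name.rfind(".")
    suffix = name[dot:].lower() if dot > 0 else ""
    if suffix not in ALLOWED_EXTENSIONS:
        allowed = ", ".join(sorted(ALLOWED_EXTENSIONS))
        problems.append(f"extension {suffix or '(none)'} is not one of {allowed}")
    return problems


def check_folder(folder: str) -> dict[str, list[str]]:
    """Map each problematic file name in `folder` to its list of problems."""
    report = {}
    for path in sorted(Path(folder).iterdir()):
        # Assumption: hidden files (e.g. .DS_Store) are not part of the dataset.
        if path.is_file() and not path.name.startswith("."):
            problems = check_name(path.name)
            if problems:
                report[path.name] = problems
    return report


if __name__ == "__main__":
    folder = sys.argv[1] if len(sys.argv) > 1 else "."
    report = check_folder(folder)
    for name, problems in report.items():
        print(f"{name}: {'; '.join(problems)}")
    print(f"{len(report)} file(s) need attention")
```

Saved under any name, e.g. `check_images.py`, it can be run as `python check_images.py path/to/images`. Most operating systems already forbid some of these characters in file names, so in practice the extension check is the one most likely to catch problems.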
Computer: Computers will be provided, but you may find it useful to bring a laptop if you can. Please see our information about using a laptop at DHOxSS: http://dhoxss.humanities.ox.ac.uk/2015/registration.html#LaptopGuidance
Group Colour: Red