Humanities Data: Curation, Analysis, Access, and Reuse

"Managing modern data for academic research"

Workshop Organisers: Megan Senseney, Graduate School of Library and Information Science, University of Illinois, Urbana-Champaign and Kevin Page, Oxford e-Research Centre, University of Oxford

Note: This workshop expects you to bring your own laptop. Please see our Laptop Guidance on the registration page for more information.


Humanists have data. Moreover, advances in the methodologies and approaches of digital humanities research have exposed the importance of maintaining research data and digital information in a manner that preserves its meaning and usefulness. Data curation is the active and ongoing management of data through its lifecycle of interest. Purposeful curation provides the foundation for a range of related activities from analyzing and visualizing research data to promoting access and reuse across a broader scholarly community. This workshop will provide a strong introductory grounding in data concepts and practices with an emphasis on humanities data curation. Sessions will cover a range of topics, including data organization, data modeling, big data and data analysis, and workflows and research objects. Case studies will include examples from the HathiTrust, EEBO-TCP, and BUDDAH.

The program is aimed at humanities researchers — whether traditional faculty or alternative academic (alt-ac) professionals — but may also be of interest to librarians, archivists, cultural heritage specialists, other information professionals, and advanced graduate students. Sessions will be led by experts from the iSchool at Illinois' Center for Informatics Research in Science and Scholarship and the HathiTrust Research Center as well as Oxford University's Bodleian Library, Oxford e-Research Centre, and Oxford Internet Institute.


Monday 20 July
The workshop will begin by providing conceptual frameworks for considering the role of data curation in humanities research with an emphasis on information organization and representation. Students will also gain hands-on experience with tools like OpenRefine for profiling, processing, and normalizing different kinds of data.

  • 11:00 - 12:30
    Introduction to Humanities Data Curation: Conceptual Frameworks (Allen Renear and Andrea K. Thomer)

  • 14:00 - 17:30 (inc. break)
    Metadata Normalization with OpenRefine (Megan Senseney and Andrea K. Thomer)
    Information Organization (Allen Renear)

Tuesday 21 July
The second day will build upon the theme of data representation with a focus on contextualizing digital objects and accounting for data provenance. Participants will gain hands-on exposure to contextual data modeling and learn about how strong data models support curation, access, and reuse. The final session will focus on knowledge representation in a semantic web context using the Web Ontology Language (OWL).

  • 11:00 - 12:30
    Semantically Framing Digital Objects with Context and Provenance (Neil Jefferies)

  • 14:00 - 17:30 (inc. break)
    Contextual data modeling with RDF using CAMELOT (Neil Jefferies and Tanya Gray Jones)
    Ontologies: OWL and OWL2 (Neil Jefferies and Tanya Gray Jones)

Wednesday 22 July
Digital humanities work often involves interdisciplinary collaborations that draw on skills and knowledge translated not only from technical fields but also from other fields outside the humanities. On this day, several social scientists from the Oxford Internet Institute will lead sessions that discuss big data in the context of the humanities and how methods developed in the social sciences can help digital humanities practitioners think about their data. The day's emphasis is on accessing and analysing large data sets over time.

  • 11:00 - 12:30
    Big Data and the Humanities (Ralph Schroeder and Laird Barrett)

  • 14:00 - 17:30 (inc. break)
    UK Web Archives and the 'Big UK Data Arts and Humanities (BUDDAH) Project' (Josh Cowls and Jason Webber)
    Network Analysis (Scott Hale)

Thursday 23 July
The fourth day of the workshop will cover data workflows, ranging from the personal to the institutional, followed by a discussion of humanities research objects. The emphasis here is on capturing and documenting the complexity of research data and the research process as a curatorial endeavor that provides added value and encourages reuse.

  • 11:00 - 12:30
    Personal Research Workflows (David De Roure)

  • 14:00 - 17:30 (inc. break)
    Institutional Workflows: the Oxford Research Archive (Sally Rumsey)
    Research Objects in the Humanities (Kevin Page and Terhi Nurmikko-Fuller)

Friday 24 July
The final day of the workshop engages participants in case studies from the HathiTrust Research Center and Early English Books Online. Participants will explore open research questions relating to building heterogeneous scholarly research collections from a variety of sources and analyzing the content of their collections within a non-consumptive research paradigm. The final session will close with a reflective discussion about how participants are already curating their own data and what methods and techniques participants can apply to research data moving forward.

  • 11:00 - 12:30
    Case Study: HathiTrust Research Center (J. Stephen Downie and Megan Senseney)

  • 14:00 - 17:30 (inc. break)
    Case Study: Early English Books Online Text Creation Partnership (EEBO-TCP) (Pip Willcox)
    Workset Creation for Scholarly Analysis (J. Stephen Downie, Kevin Page, and Megan Senseney)
    Closing Discussion: Sharing What Works (J. Stephen Downie, Megan Senseney, and Andrea K. Thomer)

There are 16 individual speakers in this workshop.

  • Laird Barrett
    Taylor & Francis / Oxford Internet Institute, University of Oxford

    Laird has a background in English Literature and studied academic research and communication online with Ralph Schroeder as an MSc student at the Oxford Internet Institute. He now works for Taylor & Francis journals, helping to develop electronic products. He specifically works on helping to develop the open access publishing program, as well as on archive products and facilitating text-and-data mining.

  • Josh Cowls
    Oxford Internet Institute, University of Oxford

    Josh Cowls is a Research Assistant at the Oxford Internet Institute, University of Oxford, where he completed his MSc in Social Science of the Internet in 2013. Josh has worked on a range of projects exploring the impact of large, diverse datasets on research and policy-making, and his work has appeared in Policy & Internet and First Monday. Since March 2014 Josh has worked on the AHRC project, 'Big UK Domain Data for the Arts and Humanities'.

  • David De Roure
    Oxford e-Research Centre, University of Oxford

    David De Roure is Professor of e-Research at the University of Oxford, where he directs the multidisciplinary e-Research Centre. Focused on advancing digital scholarship, David has conducted research across disciplines in the areas of social machines, computational musicology, Web Science, social computing, and hypertext. He is a frequent speaker and writer on digital scholarship and the future of scholarly communications, and advises the UK Economic and Social Research Council in the area of Social Media Data and realtime analytics.

  • J. Stephen Downie
    Graduate School of Library and Information Science, University of Illinois, Urbana-Champaign

    J. Stephen Downie is a professor and the associate dean for research at the Graduate School of Library and Information Science, University of Illinois. Dr. Downie conducts research in music information retrieval. He was instrumental in founding both the International Society for Music Information Retrieval and the Music Information Retrieval Evaluation eXchange.

  • Tanya Gray Jones
    Bodleian Libraries, University of Oxford

    Tanya Gray Jones is a Digital Engineer working for the Bodleian Libraries and is currently working to define a semantic data model for the Bodleian Digital Library. She is a contributor to the Cultures of Knowledge project, working on various technical aspects including the definition of a semantic data model and the development of a semantically-enriched input form.

  • Scott Hale
    Oxford Internet Institute, University of Oxford

    Scott A. Hale is a Data Scientist at the Oxford Internet Institute of the University of Oxford, UK. He develops and applies techniques from computer science to research questions in the social sciences. He is particularly interested in the area of human-computer interaction, the spread of information between speakers of different languages online, and collective action/mobilization.

  • Neil Jefferies
    Bodleian Libraries, University of Oxford

    Neil Jefferies is Head of R&D for Bodleian Digital Library Systems and Services at Oxford, guiding the development of digital preservation services at the Bodleian covering both traditional library materials and research data in all its forms. He is a scientist by training but has been working with internet technologies for nearly 20 years, mostly commercially (his first website was Snickers/Euro'96!). He is Technical Director of "Cultures of Knowledge", an international collaborative project launched in 2009 "to reconstruct the correspondence and social networks central to the revolutionary intellectual developments of the early modern period".

  • Terhi Nurmikko-Fuller
    Oxford e-Research Centre, University of Oxford

    Terhi Nurmikko-Fuller is a postdoctoral Research Associate at the University of Oxford e-Research Centre. Her research involves the use of Linked Data and semantic technologies to support and diversify scholarship across a range of topics in the Digital Humanities.

  • Kevin Page
    Oxford e-Research Centre, University of Oxford

    Dr. Kevin Page is a researcher at the University of Oxford e-Research Centre. His work on web architecture and the semantic annotation and distribution of data has, through participation in several UK, EU, and international projects, been applied across a wide variety of domains including sensor networks, music information retrieval, clinical healthcare, and remote collaboration for space exploration. He is principal investigator of the Early English Print in HathiTrust (ElEPHãT) and Semantic Linking of BBC Radio (SLoBR) projects, and leads Linked Data research within the AHRC Transforming Musicology project.

  • Allen Renear
    Graduate School of Library and Information Science, University of Illinois, Urbana-Champaign

    Allen Renear is Dean of the Graduate School of Library and Information Science (GSLIS) at the University of Illinois. Professor Renear has been a GSLIS faculty member since 2001, serving a three-year term as associate dean for research before becoming Dean. Prior to coming to GSLIS Renear was Director of the Scholarly Technology Group at Brown University. His other academic leadership roles include serving as president of the Association for Computers and the Humanities, Director of the Brown University Women Writers Project, Chair of the Open eBook Publication Structure Working Group (now ePUB/IDPF), and in various roles in the Text Encoding Initiative. His research and teaching are in the areas of data curation, scientific publishing, digital humanities, and the conceptual foundations of information systems. His research projects are associated with the GSLIS Center for Informatics Research in Science and Scholarship.

  • Sally Rumsey
    Bodleian Libraries, University of Oxford

    Sally Rumsey is the Digital Research Librarian at the Bodleian Libraries, University of Oxford. Sally manages the Oxford University Research Archive (ORA), a sustainable repository for research publications at the University of Oxford and is Senior Programme Manager for the University's Open Access Oxford Programme. She is leading the Bodleian team developing data archiving services to support research data management for Oxford. She liaises with colleagues across the University on matters related to digital scholarly outputs and matters of interest to the libraries around research information management.

  • Ralph Schroeder
    Oxford Internet Institute, University of Oxford

    Ralph Schroeder is Professor and director of the Master's degree in Social Science of the Internet at the Oxford Internet Institute. His recent books are Rethinking Science, Technology and Social Change (Stanford University Press, 2007) and, co-authored with Eric T. Meyer, Knowledge Machines: Digital Transformations of the Sciences and Humanities (MIT Press, 2015).

  • Megan Senseney
    Graduate School of Library and Information Science, University of Illinois, Urbana-Champaign

    Megan Senseney works as Senior Project Coordinator for the Center for Informatics Research in Science and Scholarship at the University of Illinois Graduate School of Library and Information Science where she also graduated with a Master of Science in 2008. Her recent projects and research interests focus on data curation issues in the digital humanities.

  • Andrea K. Thomer
    Graduate School of Library and Information Science, University of Illinois, Urbana-Champaign

    Andrea K. Thomer is a PhD student at the Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign, and a research associate at the Center for Informatics Research in Science and Scholarship. Before receiving her MLIS (Specialization in Data Curation) from Illinois in 2012, she worked as an excavator at the Page Museum at the La Brea Tar Pits. Her research interests include biodiversity and natural history museum informatics; long-term database curation, particularly in a research or museum setting; and bringing information science methods to the field of biology (and vice versa).

  • Jason Webber
    The British Library

    Jason is the Web Archiving Engagement and Liaison Manager at the British Library and manages the communication and partnerships liaison for the UK Web Archive on behalf of all of the UK Legal deposit libraries. He is also the Program and Communications officer for the IIPC (International Internet Preservation Consortium). Jason has previously managed various digital projects and websites at the Museum of London and the Natural History Museum.

  • Pip Willcox
    Bodleian Libraries, University of Oxford [Co-Director of DHOxSS]

    Pip Willcox is the Curator of Digital Special Collections at the Bodleian Libraries, University of Oxford. With a background in scholarly editing and book history, she is an advocate for engaging new audiences for multidisciplinary scholarship and library collections through digital media. She conceived and ran the Sprint for Shakespeare public campaign and the Bodleian First Folio project. Current projects include Early English Print in the HathiTrust (ElEPHãT), a linked semantic prototyping project, and SOCIAM: the theory and practice of social machines. Pip serves on the Text Encoding Initiative Board of Directors, and is Co-director of the annual Digital Humanities at Oxford Summer School, convening its introductory workshop strand.


Workshop Venue: All of your sessions will be in the Access Grid Room at the Oxford e-Research Centre (OeRC). We'll make sure you know how to get there.

AM and PM Refreshment Breaks: All breaks will be in the Atrium, OeRC. Please go directly to the OeRC after your lecture each morning.

Lunch Arrangements: Lunch each day will be in the Atrium, OeRC.

Group Colour: Yellow

Computers: As a registrant in the "Humanities Data: Curation, Analysis, Access, and Reuse" workshop, you'll need to do a few things in preparation for our various hands-on sessions and activities. Below is a list of software and data requirements.

Please plan to have downloaded and installed all required items before Monday 20 July.

If you need help troubleshooting, please contact Megan Senseney (mfsense2@illinois.edu).

General Requirements: All workshop participants are expected to bring their own laptops to the summer school. Please be sure that you have administrative privileges to install software on your machine, and that you are running an up-to-date web browser (preferably Firefox or Chrome). You will also need a PDF viewer (e.g., Adobe Reader or Preview) and a text editor (e.g., Notepad, TextWrangler, or Sublime Text).

Required Software Installations:

Optional Software Installations:

Required Datasets:

Optional Background Reading: For those of you who'd like to get a head start on thinking about humanities data, we've provided a few recommended readings below. We won't address these readings directly in any given session, but it's nice to have a shared conceptual framework at the start of the week.

NB: As we continue preparing for the workshop through the month of July, we may identify additional hands-on activities and demonstrations that require additional software and/or datasets. If there are any further requirements, we will post them here in the week before the summer school.

Site last updated: 2015-07-15 -- Contact: events@it.ox.ac.uk