Web archives as big data, 3 December 2014

The web is an integral part of our daily lives, whether we are shopping online, booking cinema tickets, registering to vote or checking whether or not it is going to rain today. It is also of enormous importance to arts and humanities researchers, as the site of digitised historical material, as a primary source in its own right, and as a means of promoting and communicating research to the widest possible audience. It is hard to imagine how you would write the history of the late 20th and early 21st centuries without access to all of this data.

But web archives pose unique challenges for researchers. The data is messy, collected irregularly and to varying degrees of completeness, and as the archive grows it will contain multiple duplicates of some web pages. The depth of collection is uneven: some important sites are crawled in their entirety, others only to a shallower level. Different media within a page may not be captured successfully, so that only partial content is preserved (the scale of the collection precludes manual checking). Web pages also contain many types of information beyond what might be viewed as their main textual content, such as links to other sites, advertisements and contact information, which makes analysis problematic. As things stand, we have neither the expertise nor the tools to exploit this invaluable resource reflectively.

The ‘Big UK Domain Data for the Arts and Humanities’ project is seeking to address these challenges, working with humanities researchers from a range of disciplines. This conference will showcase the ground-breaking work being undertaken using the archive of UK web space from 1996 to 2010, as well as exploring how the web is archived by institutions such as the British Library, and the ethical implications of working with this kind of data.


9.45 – Registration
10.20 – Introduction and welcome
10.30 – Keynote: Tobias Blanke, King’s College London – title tbc
11.30 – Research showcase panel I

  • Rona Cran, Beat literature in the contemporary imagination
  • Saskia Huc-Hepher, An ethnosemiotic study of London French habitus as displayed in blogs
  • Harry Raffal, The Ministry of Defence’s online development and strategy for recruitment, 1996-2013
  • Marta Musso, A history of the online presence of UK companies

12.30 – Lunch

1.30 – Research showcase panel II

  • Helen Taylor, Do online networks exist for the poetry community?
  • Lorna Richardson, Public archaeology: a digital perspective
  • Richard Deswarte, Revealing British Euroscepticism in the UK web domain and archive
  • Alison Kay, Capture, commemoration and the citizen-historian: Digital Shoebox archives relating to P.O.W.s in the Second World War

2.30 – James Baker, The British Library: a place full of data

3.00 – Tea/coffee

3.20 – Research showcase panel III

  • Gareth Millward, Digital barriers and the accessible web: disabled people, information and the internet
  • Rowan Aust, Responses to institutional crisis online
  • Chris Fryer, The UK Parliament Web Archive

4.00 – Peter Webster, What does the future hold? Archiving the UK web at the British Library

4.45 – Roundtable: The ethics of big data research

5.30 – Close

To register, go to the conference page on Eventbrite. Registration is free, but places are limited, so early booking is advised. The event will take place in the Wolfson Conference Suite, Institute of Historical Research, University of London. Lunch will be provided.