March 2014 - Big UK Domain Data for the Arts and Humanities

Shutterstock 185840144 © Evlakhov Valeriy

12 March 2014 marked the 25th birthday of the web. As you would expect, there was a great deal of coverage online, both in relatively formal reporting contexts (e.g. newspaper interviews with Sir Tim Berners-Lee) and in social media. The approach taken by Nominet (one of the major internet registry companies) was among the most interesting. It published a brief report (The Story of the Web: Celebrating 25 Years of the World Wide Web) and a rather nice timeline of the web’s defining moments. The report, written by Jack Schofield, reminds us that Yahoo! (with that exclamation mark) ‘became the first web giant’ (p. 5); that Netscape Navigator dominated web browsing in the early years, and indeed ‘almost became synonymous with the web’ (p. 5); and that Google has only been part of our lives since 1997, Wikipedia since 2001 (pp. 6, 7). It concludes that ‘The web is now so deeply engrained in modern life that the issue isn’t whether people will leave, but how long it will take for the next two billion to join us’.

All of this is not just nostalgia – it will be impossible for historians to understand life in the late 20th and early 21st centuries without studying how the internet and the web have shaped our lives, for better and worse. This analysis requires that the web – ephemeral by its very nature – be archived. We have already lost some of our web history. The web is 25 years old, but the Internet Archive only began to collect website snapshots in 1996, that is, 18 years ago. The Institute of Historical Research launched its first website (then described as a hypertext internet server) in August 1993, but it was first captured by the Wayback Machine only in December 1996. At the time of writing, it has been saved 192 times, with the last capture occurring on 30 October 2013. Without the work of the Internet Archive, and now national institutions such as the National Archives and the British Library in the UK, we would not have any of this data. Researchers and web archivists can work together to ensure that in 2039, we will have 50 years’ worth of primary source materials to work with.

On 26 February we held a very successful half-day workshop on web archiving and research. Despite the blue skies and sun over London – something almost lost to living memory – about 40 people took part in the event.

The Principal Investigator, Jane Winters, introduced the day by emphasising how keen we are to receive applications for our bursaries. The workshop as a whole set out to explain what a web archive can offer researchers as evidence (its pitfalls as well as its benefits), to describe the bursaries we are offering, and to answer any questions from potential applicants.

Peter Webster of the British Library then talked through the various incarnations of the UK web domain archive that he and his colleagues curate, as well as the progress made on tools and an interface to the ‘dark archive’ produced by a previous project, AADDA. Peter enlivened his talk with some examples from his own research, using the web evidence of the furore created by the former Archbishop of Canterbury’s 2008 comments on sharia law.

Josh Cowls of the Oxford Internet Institute gave a taste of some of the work the OII is doing on mapping the UK web domain’s history, by, for example, analysing the way links between domains such as ac.uk and co.uk have changed over time.

Many of the researchers who took part in the AADDA project attended the workshop and one of them, Richard Deswarte, described the research he had done for that project, looking at Euroscepticism. Richard then asked other members of the research group to describe their own experiences: all of these seemed to follow a similar trajectory of initial uncertainty, followed by great excitement about the possibilities of web archives as a research tool, and finally some recalibrating of expectations as the technical impediments became apparent.

Suzy Espley and Tom Storrar of the National Archives (TNA) gave an introduction to the work TNA is doing in archiving the UK government webspace. The contrast between this archive and the UK domain archive is interesting: we might think of the former as narrow and deep and the latter as broader and shallower. TNA has spent longer developing its archive and offering an interface to the public, and it was encouraging to learn that it is now accessed 20 million times a month: proof that there is a great appetite for web archives.

The final speaker was Niels Brügger, who had come all the way from Aarhus to give our keynote presentation. Niels is our consultant on the project and, as a founder member of the RESAW project (which seeks to foster the study of national web archives), was ideally placed to address a room full of researchers. Niels explained that a complete archive of a national web is an impossibility: the web is constantly changing in all dimensions. For example, if an archival copy of a web page is taken at a particular moment, is it necessary to have archival copies of everything that page linked to? And those linked pages, if archived, may have been captured at different times from the page of origin. Niels raised many other interesting questions, which represent not so much hindrances to web archive research as things that must be borne in mind by the researcher.

We finished the afternoon with a wide-ranging discussion, and a final encouragement from the organisers to consider applying for a bursary. The bursaries are by no means restricted to those who attended the workshop, so check out the link above if you are interested in applying. Applications close on 25 April 2014.

Twenty-five years of the web

Our first workshop