Project progress, an update - Big UK Domain Data for the Arts and Humanities

Josh Cowls reflects on recent developments and our goals towards the end of the project:

We are already well past the half-way mark of the project, and exciting new developments mean that our eleven researchers are well on their way to producing high-quality humanities research using the massive UK Web Domain Dataset.

The project team meets the researchers regularly, and these meetings always involve constructive dialogue between the researchers accessing and using the data and the development team at the British Library, who are continually improving the archive’s interface.

Our most recent meeting in September was no exception. We first got a brief update from all the researchers present about how their work was taking shape. This led seamlessly into a wider discussion of what researchers want from the interface. The top priority was the creation of accounts for each individual user, enabling users to save the often-complex search queries that they generate. Another high priority was the ability to search within result sets, enabling more iterative searching.

Among the other enhancements suggested by the researchers were a number of proposed tweaks to the interface. One suggestion to save researchers time was for a snippet view on the results page, showing the search term in context – meaning researchers could skip over pages clearly irrelevant to their interests. On the other hand, the researchers did not feel that URLs necessarily needed to appear on results pages.

Other requested tweaks to the interface included:

An option to choose the number of search results per page and to show more results per page by default
The ability to filter results from advanced as well as simple search queries
Tailoring of the ‘show more’ feature depending on the facet
A ‘show me a sample’ feature for large numbers of results, with a range of sampling methods, including a random sample option

As well as these interface issues, the conversation also focussed on more academic questions, especially how results from the dataset should be cited. A ‘cite me’ button was suggested, which would allow a quick way of citing results, and similarly, when viewing individual results on the Internet Archive, an outer frame could include citation details. But of course, exactly what form these citation details should take raised other questions: should the British Library be cited as the provider of the data, or should the Internet Archive, as the original collector? How should collections of results be cited, given that the British Library’s search functionality generated the results?

Inevitably, some of these questions couldn’t be answered definitively at the meeting, but the experience shows the value of involving researchers – who are able to raise vital questions from an academic perspective – while the development of the interface is still in progress. Since the meeting, many of the proposed changes have already been implemented – including, crucially, the introduction of log-ins for researchers, enabling the preservation and retrieval of search queries. The researchers are encouraged to bring more requests to our next meeting, at the British Library next week. From then, the pace of the project will accelerate still further, with a demo of the project to the general public at the AHRC’s Being Human Festival in November, and the ‘Web archives as big data’ conference in early December, when the researchers will present their findings.