Project meeting, 18 June 2014, minutes

Jonathan Blaney [minutes] (JB), Josh Cowls (JC), Ralph Schroeder (RS), Jason Webber (JMW), Peter Webster (PW), Jane Winters (JW)

Apologies: Helen Hockx-Yu, Eric Meyer

1 Minutes of the previous meeting

Accepted as a correct record.

2 Matters arising

2.1 (2)

Details of the collaboration agreement are being finalised.

2.2 (7)

Invitations to the advisory board have been sent; some have been accepted and some are pending.

3 Institutional updates

3.1 OII

JC attended a web archiving workshop in Aarhus. It was clear that the bursary outcomes from our project are going to be valuable as practical examples of work with web archives: the work of others is more theoretical. Our dataset is also very broad and it looks like the project will be well placed within a European context. There was a talk by Bill Thompson, Head of Partnership Development, BBC Archives.

Eric Meyer is currently at a Harvard conference. It was agreed that it would be good to know how our project relates to general web archiving issues.

JC said that the OII has submitted a chapter to a new Internet Studies Handbook, due in September 2015. They will find out if the chapter is accepted at the beginning of January.

3.2 BL

PW is talking to eight of the bursary holders this week; he has a set of user requirements and will go through this with them and try to prioritise development. Researcher expectations need to be managed: several things were raised at the induction meeting which would be impossible to do. There are two broad use cases: analytic work across the whole dataset and isolation of a corpus to analyse. We need a method for users to log in and search using filters and then to freeze that set, or to compile a list of hosts or domains they are interested in and freeze that – i.e. one is search driven and one is node based. Some researchers will be keen to share corpora with other users.
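As an illustration only, the two use cases could be modelled roughly as below. This is a minimal sketch and not the BL's interface or implementation: the names Record, FrozenCorpus, freeze_from_search and freeze_from_hosts, the record fields and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Record:
    """One archived resource (hypothetical minimal shape)."""
    url: str
    year: int
    text: str


@dataclass(frozen=True)
class FrozenCorpus:
    """An immutable, named set of record URLs that can be analysed or shared."""
    name: str
    urls: FrozenSet[str]


def freeze_from_search(
    name: str, records: Iterable[Record], predicate: Callable[[Record], bool]
) -> FrozenCorpus:
    """Search driven: keep every record that passes the user's filters."""
    return FrozenCorpus(name, frozenset(r.url for r in records if predicate(r)))


def freeze_from_hosts(
    name: str, records: Iterable[Record], hosts: Iterable[str]
) -> FrozenCorpus:
    """Node based: keep every record whose host is on the user's list."""
    wanted = {h.lower() for h in hosts}
    return FrozenCorpus(
        name,
        frozenset(
            r.url for r in records if (urlsplit(r.url).hostname or "") in wanted
        ),
    )


# Hypothetical sample data, for illustration only.
SAMPLE = [
    Record("http://www.example.co.uk/a", 2001, "election news"),
    Record("http://www.example.co.uk/b", 2005, "sports news"),
    Record("http://other.example.org.uk/", 2005, "election results"),
]

by_search = freeze_from_search(
    "elections", SAMPLE, lambda r: "election" in r.text
)
by_hosts = freeze_from_hosts("example-site", SAMPLE, ["www.example.co.uk"])
```

Both routes end in the same kind of frozen object, so sharing a corpus with another user would not depend on how it was built.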

3.3 IHR

JW and JB have a meeting planned with Chris Fryer and his manager, to discuss his research project and explore links with Parliament.

4 Indexing

The indexing is no longer being done on the Hadoop cluster; it is now being done on four servers. About 50% is now done, and 5% is being indexed at a time; if a chunk goes through, it takes 30-36 hours. This is a world-leading endeavour: no one before has done text indexing on this volume of WARC files.
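As a rough check on these figures, the remaining indexing time can be estimated as below. This is a sketch under stated assumptions, not a schedule: it assumes that chunks are indexed one after another, that every chunk succeeds first time, and that the 30-36 hour range holds for the remaining chunks. The function name remaining_hours is illustrative.

```python
def remaining_hours(done_pct: int, chunk_pct: int, hours_low: int, hours_high: int):
    """Return (chunks left, best-case hours, worst-case hours).

    Assumes sequential chunks with no failures or re-runs.
    """
    remaining_pct = 100 - done_pct
    chunks = -(-remaining_pct // chunk_pct)  # ceiling division
    return chunks, chunks * hours_low, chunks * hours_high


chunks, low, high = remaining_hours(50, 5, 30, 36)
print(chunks, low, high)    # 10 300 360
print(low / 24, high / 24)  # 12.5 15.0
```

On these assumptions the remaining half of the index needs ten more chunks, or roughly 12.5 to 15 days of continuous running; any failed chunk would add to this.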

5 Animated films

We committed to making explanatory films in our application. The budget is £3,000, which is not nearly enough to employ a professional animator. There have been talks with a former Central St. Martins student who is keen to do the work. She will story-board before we commit to paying her. Two films of two minutes and one of one minute are planned: what a web archive is, the different datasets the BL has, and a history of the web.

6 Communication and publicity

It was agreed that we keep a log of researcher contacts, so we don’t all think that someone else is in touch with a particular researcher.

It was agreed that we should encourage the researchers to use the group email to communicate between themselves if they would like to.

It was agreed that the project blog should be kept active and various future posts were discussed.

7 Conference attendance and travel budget

The group discussed an article by Jaspreet Singh about the methodology of the ALEXANDRIA project and the possibility of him meeting the team.

The group discussed the next IIPC, almost certainly to be held in San Francisco, and agreed that if possible we should send someone from the IHR and someone from the OII; if not, then one person from either institution.

8 Being Human festival

This is an AHRC-funded, SAS-run festival of the humanities that will take place over nine days in November. It is new and so the turnout is unknown. We could invite people to search for themselves in the web archives. The group agreed that this was a good idea and that it will touch on notions of the “right to be forgotten”. If a number of project team members can come (it will probably be on a Saturday) that would be great.

9 December conference

We will have a two-day event, 3-4 December. One day will be the RESAW event and the other will be the project workshop. The RESAW event is to remain open only to members but RESAW members can come to the workshop.

10 The book

It was reported that negotiations with prospective publishers are ongoing.

11 Meeting schedule

The group agreed that, as far as possible, we could coordinate project meetings with the bursary holders’ meetings, which are on the 6-week schedule envisaged for project meetings at this stage. If any of these are inconvenient they can be replaced by an ad hoc Skype meeting.

12 Any other business

There was none.