I was very grateful to make it to the 15th International Digital Curation Conference in Dublin from 17th-19th February, ahead of the current lockdown. A truly international spread of 300 researchers, Research Data Managers, specialist Librarians, Archivists and IT Specialists (see the Conference Twitter picture here) came together to discuss collective curation of data and how institutions support data management. The first day was for workshops, and I attended ‘Ten things you can do to support FAIR data culture’. The FAIR principles are that data should be Findable, Accessible, Interoperable and Reusable. The speakers discussed existing training resources and how we could share them, potentially through a Zenodo community.
After a quick walk around Dublin in the daylight, there was an excellent walking tour by night led by a very knowledgeable historian. We saw the key sights and I learnt many interesting facts about the city, including some surprising ones, such as how women were only allowed to study at Trinity College from 1904.
The second day was the start of the Conference, kicked off by a very interesting keynote from Francine Berman, which helped me understand the problems with the amount and privacy of data that is being collected from the estimated 20 billion ‘things’, with some real-world examples such as self-driving cars (all cars are expected to be self-driving by 2050!). It was then on to the sessions (programme here). An interesting statistic from Rebecca Grant’s talk was that an estimated 500,000 data stewards will be needed in the next decade to manage the volume of data that is being created. In the lightning talks session I gave my talk on ‘Providing Software Support to Enable Research: From Feral Parakeets to the Times Digital Archive’, where I discussed three case studies of how our Research Software Group helped researchers with their projects, from enabling data input by the public to helping with data analysis (slides available here). For more case studies, see their Annual Reports.
A very useful guide to data documentation was highlighted by Mari Elisa Kuusniemi, and an interesting approach to encouraging full data entry into a University repository, called the ‘Tombstone protocol’, was discussed by Alexander Bell (slides available here). It was interesting to hear from James Wilson from UCL that they are looking to develop anonymisation training for their researchers, as we at the University of Birmingham are too. Hopefully we can share resources.
After an excellent networking Conference dinner (we were separated according to our drinks preference :-)), it was on to Wednesday’s sessions. Kostas Glinos from the European Commission discussed the importance of open science in gaining trust from the public and how to appropriately reward researchers throughout their career for making science open. It has been estimated that the opportunity cost of having research data that is not FAIR is €10.2 billion; however, it is only seen as the 10th most important aspect of Academic work for a research career.
The lightning talks on Ethics and Appraisal were useful as they again highlighted the need for researchers to know more about how to anonymise. David Fearon’s talk on ‘Supporting Identifier Protection for Sharing Human Subject Data’ gave a useful description of how to anonymise data so that it can be released, and posed the question: what is an acceptable risk level? How to decide when to delete data is a tricky question but an increasingly important one, as highlighted by Paul Stokes from Jisc. It is predicted that we will run out of capacity to build data storage, with 175ZB of data expected to be produced by 2025 and only 11.7ZB of expected storage capacity. We therefore really need to assess what research data needs to be kept.
It was interesting to hear from Zosia Beckles at the University of Bristol on how they are extending their support for publishing sensitive research data by providing a 0.5FTE post dedicated to the role. Their bootcamp online training looks to be very useful and in the future will include videos from key stakeholders working with sensitive data. At this point I had to leave for my flight home, but after the Conference I managed to read the winning talk/paper on ‘Piloting a Community of Student Data Consultants’, which describes an innovative scheme at Virginia Tech called DataBridge. It aims to upskill undergraduates in data science and provide additional trained resource for projects requiring data science skills. See the DataBridge webpage for more information.
Another useful part of the Conference was finding out about the Research Data Alliance. There are some working groups that sound useful to us, and a positive (for me!) of the current coronavirus situation is that the RDA 15th Plenary meeting due to be held in Australia was held online instead, so I was able to attend one of the sessions on engaging with researchers. Because I left early, I didn’t find out until later that the next IDCC will be organised in collaboration with the RDA Plenary in Edinburgh in April 2021. I hope to make the next one to keep up to date with developments and meet more people working in this area.