A wolf among Librarians in Barcelona


It has taken me a while to write this because our team has been in demand recently, but back in February I attended the International Digital Curation Conference held in Barcelona. I was there to present our poster (produced in collaboration with Library Services) on how we help our researchers manage their data and provide computing infrastructure to enable their research.

The view from my room, 22 floors up in the hotel, was excellent if a little surprising. Surrounded by modern high-rise buildings, it was not what I had expected of Barcelona, but the city centre itself was a few miles away and accessible via the Metro.

The shadow is the hotel itself! The red asterisk is where I explored later on.

I was very pleased to see that the first keynote speaker, Sabina Leonelli (University of Exeter), was from the Life Sciences field, as I was previously a researcher in that field myself. Sabina discussed the importance of sharing data in plant sciences, as the results of the research influence crop production and food security throughout the world. She raised the important point that data infrastructures and repositories need to be trustworthy and user-orientated for researchers to want to use them. Curation of online data needs to be context-specific and maintained long-term to enable data re-use. Plant traits are often described differently depending on location, which can make it hard to identify linked data. Researchers (and funders) need to take account of data stewardship when budgeting in grant proposals and provide funds for maintaining data storage beyond the lifetime of the project. Sabina advocates for Data Stewards (data specialists within the field) to be part of the research, which then allows services to be developed that reflect the needs of the researchers. This is something we in Advanced Research Computing are keen to do, which is why we try to interact with our researchers as much as possible.

On the theme of data repositories, Amy Koshoffer and Amy Neeser’s advice was to use domain-specific repositories so that all specific metadata is included, but to keep a copy in your institutional repository as a backup. The University is just about to launch its own repository, eData (edata.bham.ac.uk), which will provide DOIs (Digital Object Identifiers). Dennis Wehrle from the University of Freiburg had investigated research datasets in 92 repositories and found a minimum of 145 file formats, of which 103 were defined as having a low probability of successful preservation, e.g. *.doc (plain text files have a high probability). The number of different file types depended on the discipline, but the challenge is going to be preserving these varying datasets for the future, when file formats can become obsolete.
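For anyone curious about what is lurking in their own datasets, a quick audit of file types is easy to script. The following is only a minimal sketch in Python: the function names and the `AT_RISK_EXTENSIONS` set are my own illustrative assumptions, seeded with the single *.doc example mentioned above, and are not the classification used in the Freiburg study.

```python
from collections import Counter
from pathlib import Path

# Assumption for illustration only: seeded with the one example from the talk.
# This is NOT the classification used by Wehrle & Rechert; extend it yourself.
AT_RISK_EXTENSIONS = {".doc"}


def count_extensions(root):
    """Count files under `root` (recursively) by lower-cased file extension.

    Files without an extension are counted under the empty string ''.
    """
    return Counter(
        path.suffix.lower()
        for path in Path(root).rglob("*")
        if path.is_file()
    )


def at_risk(counts, risky=AT_RISK_EXTENSIONS):
    """Return only the extensions from `counts` that are in the `risky` set."""
    return {ext: n for ext, n in counts.items() if ext in risky}
```

Calling `count_extensions("my_dataset")` on a (hypothetical) folder gives a tally per extension, and passing that tally to `at_risk(...)` shows which files might be worth converting to a more durable format before depositing them.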

Shelley Stall from the American Geophysical Union highlighted a scary story which emphasised the need for researchers to back up their data. Researchers had to retract a paper published in Science, whose conclusions on the effects of microplastic particles on fish had been widely publicised, because the raw data could not be produced and hence checked. The only copy of the data was stored on a laptop which had been stolen: http://science.sciencemag.org/content/354/6317/1242.1

I presented my poster (below) in the 1-minute madness session, a great way to work out which of the 31 posters to go and see but a little daunting when presenting to 200 or so attendees! There was quite a bit of interest during the poster session that followed, as the central European countries seem to be a couple of years behind us, with their funders only just starting to mandate the use of data management plans (DMPs). They were therefore interested to see how we developed our infrastructure to meet the demands of our researchers. We also seem to be ahead of the game in the UK regarding PhD students having to write DMPs, a policy which other Data Managers are currently trying to get their universities to enforce.

On the second day, Rebecca Grant from Nature journals discussed the data availability statements that are now required when publishing with them. The journals prefer data to be deposited in repositories rather than provided as supplementary information, and will chase up authors who don’t supply data on request. In the afternoon, I chose to go to a Birds of a Feather session on GDPR (General Data Protection Regulation), where we had to introduce ourselves and we Computing people made our apologies for our presence in the midst of all the Librarians! It was interesting to hear of the challenges that digital curators and researchers are facing throughout the world regarding GDPR. We heard that in the States you can ‘buy’ sensitive data/images from hospitals, leading to requests to ‘buy’ sensitive data from data repositories, but getting funding for research involving personal data is very difficult.

After the conference, I managed to get into the centre of Barcelona to discover what the interesting buildings I could see from the hotel were, and found that they were the Olympic Stadium from when Barcelona hosted the Games in 1992 (which I remember watching!).

Left: The red asterisk is where my hotel is. Right: The Olympic Stadium

So, to conclude, I learnt a lot about the field of data management. Most importantly, by going out to network with Data Managers from other countries, we found that the University of Birmingham is further ahead with respect to data management than we thought, which is always good to know 🙂

References

  • Leonelli et al. 2017. Data management and best practice for plant science. Nature Plants, 3, 17086.
  • Koshoffer et al. 2018. Giving datasets context: a comparison study of institutional repositories that apply varying degrees of curation. IDCC18 Research Paper.
  • Wehrle & Rechert 2018. Are research data sets FAIR in the long run? IDCC18 Research Paper.
  • Stall et al. 2018. Enabling FAIR Data in the Earth and Space Sciences. IDCC18 Research Paper.
  • Grant & Hrynaszkiewicz 2018. The impact on authors and editors of introducing Data Availability Statements at Nature journals. IDCC18 Research Paper.