Research Data Network York – Birmingham Environment for Academic Research

Last week I attended my first Research Data Management (RDM)-related event. It was the 4th in the Research Data Network series run by JISC, and it was held at the University of York, where I studied as an undergraduate, so I was keen to see how the University had changed. The answer was ‘a great deal!’ The event was held at the new campus, which was only being talked about when I was last there. The fields are now populated with buildings and some lakes that are much more pleasant than the original one on campus, as they have been naturalised with plants rather than having concrete edges.

The event was held over two days with many interesting talks and lots of opportunities for discussion within the sessions. It felt much more interactive than the academic conferences I have been to in the past (in my previous career as a post-doc).

I found the keynote talk by Mark Humphries (University of Manchester) particularly interesting, as he talked about being a researcher in neuroscience in a ‘Data lab’. The technical difficulty of getting data in the field means that researchers are reluctant to make their data open and usable by others, which was termed ‘data worship’. This certainly resonated with me, as I used technically demanding techniques myself in my research and remember the many hours spent in a dark room trying to get data. He also outlined the need for that data to produce papers: all academics are judged on papers as outputs, and they can’t make the data open until they have produced those papers. There really needs to be some way of judging research output other than via papers.

I’m not sure I agree with the proposition that ‘Researchers don’t care what research is out there already’ – I remember spending a lot of time as a PhD student going through the literature searching for previous relevant experiments and data. I signed up for zetoc alerts so that I could track all new relevant papers as they were published, but I was conscious that there is a lag in publication, so someone may be working on the same question at the same time. This is when communication between researchers, such as attending and presenting at conferences, is important. Some journals allow preprint papers to be made available, which reduces the lag between submission and journal publication and allows feedback to the authors to aid manuscript revision. The debate on preprints in biology is discussed in this WIRED article.

The volume of data produced now by techniques such as patch clamping, where thousands of neurons can be tested at once, means that there is too much data for one group to analyse. Therefore, there are benefits to sharing the data so that multiple groups can work on a large dataset. Indeed, Mark reported that one dataset had resulted in nine publications for different groups. The data creators are *usually* credited as middle authors. An interesting question from the audience was ‘Is there a time limit on data usability?’ This is an important point and one which depends on the field and how quickly improvements in techniques are made, but Mark specified they had worked on data that was 10 years old.

On to the parallel sessions, and I had trouble choosing which to attend as they all seemed relevant. I chose Jenny Mitcham’s talk on ‘Archivematica for research data’, where an interesting discussion arose on how to preserve Google Drive documents and what needs preserving. There doesn’t appear to be a Google export format that preserves everything, e.g. are comments preserved or not? If not, then how do you know if they have been removed? It can be important to keep a record of versioning as it indicates ownership of data and data discovery. See Jenny’s blog posts on the subject, including an update since the RDN event.

I found the session on ‘What I wish I’d known at the start! Lessons learnt the hard way when setting up RDM services’ very useful, as 5 different Universities were covered under the headings ‘What went well’, ‘What didn’t go so well’ and ‘What I’d do differently’. It was a good benchmarking exercise and there were lots of great ideas to follow up. One area where we are definitely ahead is coordination between IT Services and Library Services. A number of institutions seem to have problems communicating with IT and getting them interested in RDM, but our Research Computing team is heavily involved in RDM and we provide the resilient storage needed for researchers working on data-intensive projects.

An interesting point that arose was that Cambridge had researchers who saw ‘open data’ in a negative light, so RDM work had to be re-branded to avoid the association, whereas Glasgow found there was a positive association with open data – perhaps due to their repository service, which lets researchers deposit data. Therefore, we at the University of Birmingham will have to work out how the majority of our researchers feel about open data and how we can change the viewpoint if open access is seen in a negative way.

The Queen Bee networking event was an excellent way to mingle with lots of the attendees, and there were some interesting answers to two questions from our group. The first was ‘Do researchers prefer onsite or cloud storage?’ The answer seems to be that it depends on the data and whether it is sensitive. The other question was ‘How do you promote RDM?’ One institution sent out personal letters from the VC to all researchers advocating RDM and got a good response.

On Day 2, the ‘Research Data Business Cases’ talk highlighted the large number of RDM case studies that are readily available online; these cover a wide range of research areas and show the benefits of sharing data. The ‘Researcher Engagement Resources’ talk presented many great ideas to investigate, such as ‘Data Champions’ used in Cambridge and ‘Data Conversations’ used in Lancaster. Cambridge has 41 Data Champions, some of whom have created their own GitHub course, weekly tips on RDM and FAQs specific to their discipline, as well as spreading the word about RDM in 26 departments across the University (apart from Arts & Humanities). The Data Conversations seem a great way to engage researchers by discussing specific topics such as ‘Data security and confidentiality’ – a topic which we get a lot of questions on, so it would be good to have a similar session here.

Altogether it was a very well-organised event that served as an excellent introduction to the field of RDM for me. It’s a shame that the weather didn’t live up to expectations, as the lake around the Ron Cooke Hub would have been lovely to sit around. The demonstrations were held in pods around the lake, but there was too much of interest in the talks to tempt me away!

Slides and extensive notes are available at: https://research-data-network.readme.io/docs/4th-research-data-network-york-university

Comments welcome!