Data Tree – comprehensive data management training for free!


Back in July 2018, I attended an introductory event for the Data Tree data management course developed by NERC, which provides free online training for PhD students and early career researchers in the environmental sciences. I was previously a researcher in life sciences but now advise all disciplines in data management, so I was interested to see whether there was anything new for me to learn and whether the course is relevant to other disciplines. The course is available from this link:

For almost two years, completing the course has been on my to-do list, but I’m glad I waited as there is now a series of very useful, complementary video interviews with researchers. As well as covering data management throughout your research, the course discusses data analysis and then focuses on preserving your data and engaging with the beneficiaries of your research, such as business and the general public. The course is designed to be modular, comprising eight modules which are split into shorter topics:

You only complete the areas that are relevant to you, and there is no timetable, so there is no facilitation by course teachers, unlike some free courses such as those run by FutureLearn. The course is designed to take around 15-20 hours to complete and involves a series of interactive presentations, videos and quizzes to test understanding. You can download a PDF certificate for the completion of each module. I found it quite easy to dip in and out, as it records which sections of the module topics you have completed, and if you have to leave an interactive presentation halfway through, it remembers where you got up to.

So far, I have only completed the first introductory module on Data Management: Context. This took me around 3-4 hours to complete but you could spend longer if reading all the associated resources and especially if you’re new to the topics.

The topics covered in the first, introductory module are:

  • Key Concepts
  • The Research Data Lifecycle
  • Sharing Data & The Research Community
  • Reproducibility of research data and outputs
  • The Policy Environment
  • Data Ethics
  • Coding for Data Management

The key concepts topic takes you through what research data is and its different types. In the life sciences it is important to remember physical data, e.g. soil samples and specimens, in addition to quantitative and qualitative data. Later topics emphasise the importance of research data being for the public good: if you are a researcher working in a publicly funded area, then you have a duty to make the data accessible to others. The module also stresses the importance of planning to make your data open right from the start of a project and of providing sufficient accompanying metadata to give context.

I found the topic ‘Sharing Data & The Research Community’ particularly comprehensive, covering open and FAIR (Findable, Accessible, Interoperable and Reusable) data, including using a FAIR data checklist, types of repositories and licenses for open data. I also like the way that the importance of software curation and sharing of software is recognised throughout. There is an optional topic on ‘Coding for Data Management’ with some excellent short videos around the subject.

Overall, I would highly recommend all researchers in the sciences complete the first module as it really does give a comprehensive review of data management. I hope to finish the other modules as soon as time permits. The current period of lockdown is a great time to start looking more into data management, particularly as you may now have realised how important it is when adrift from your usual working environment!