The value of version control


The Research Software Group’s mission is rooted in the observation that most current research requires software, often written by researchers themselves, and that the application of a few core software engineering principles leads to more effective research. This is neatly captured by the Software Sustainability Institute’s tag line: “better software, better research”.

A core technology that underpins most of the good practices we apply is version control, which is, in essence, a way of tracking changes to source code. That description only hints at its versatility and power, though. Version control makes it much easier to identify when bugs were introduced and to record the state of the code at particular moments (e.g. release versions). It’s critical to collaborative development (e.g. on open source software), and it handles any other content that can be represented as plain text just as effectively as it handles source code (e.g. we collaborate on our RSG annual reports via version control). When we start collaborating with researchers on software, the first thing we usually do is put their code under version control.

This is not how we collaborate on our annual reports. (Comic by PHD Comics.)

Unlike programming languages, however, version control is something many researchers only encounter later in their careers, often when they start collaborating on software. This year marks 20 years since the first release of git, by far the most widely used version control system, and I thought it a good moment to discuss how valuable it is. From here on I’ll talk exclusively about git, but the principles apply to any version control system, of which there have been many over the years.

As mentioned, git essentially tracks changes. The set of files we’ve told git to track, together with their history, is called the “repository”. Each time we want to record some changes (to a code base, or really to whatever files are in the repository), we first add them to the list of changes we want to record (git calls this the “staging area”), then tell git to record that set of changes as a “commit”, along with a message describing them. If we make changes that we don’t want, we can easily discard them if we haven’t committed them yet, or roll back to a previous commit if we have. So we already have a local backup system with a rich set of tools for navigating the history of our files when necessary.
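To make that workflow a little more concrete, here is a toy model of it in Python. It is only a sketch under simplifying assumptions, not how git stores data internally: the class `ToyRepo` and its methods are hypothetical names chosen for illustration, files are plain strings in a dictionary, and deleting files isn’t modelled. The comments point to the roughly analogous git commands.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Commit:
    id: int
    message: str
    snapshot: dict              # file name -> contents when the commit was made
    parent: Optional["Commit"]  # previous commit; None for the very first one


class ToyRepo:
    """A hypothetical toy model of change tracking, not git's internals."""

    def __init__(self):
        self.working = {}  # files as they currently are on disk
        self.staged = {}   # changes chosen for the next commit (the "staging area")
        self.head = None   # the most recent commit
        self._next_id = 0

    def edit(self, name, contents):
        self.working[name] = contents

    def add(self, name):
        # Roughly `git add <name>`: put this change on the list to record.
        self.staged[name] = self.working[name]

    def commit(self, message):
        # Roughly `git commit -m <message>`: record only the staged changes.
        snapshot = dict(self.head.snapshot) if self.head else {}
        snapshot.update(self.staged)
        self.head = Commit(self._next_id, message, snapshot, self.head)
        self._next_id += 1
        self.staged = {}
        return self.head

    def discard(self):
        # Throw away all uncommitted edits by restoring the last commit
        # (in this toy, files that were never committed are dropped too).
        self.working = dict(self.head.snapshot) if self.head else {}
        self.staged = {}

    def rollback(self, steps=1):
        # Roughly `git reset --hard HEAD~<steps>`: walk back along parent links.
        target = self.head
        for _ in range(steps):
            if target is None or target.parent is None:
                raise ValueError("no earlier commit to roll back to")
            target = target.parent
        self.head = target
        self.discard()

    def log(self):
        # Commit messages from newest to oldest, like a bare-bones `git log`.
        messages, c = [], self.head
        while c is not None:
            messages.append(c.message)
            c = c.parent
        return messages


# Usage: two commits, then roll back the second one.
repo = ToyRepo()
repo.edit("analysis.py", "v1")
repo.add("analysis.py")
repo.commit("Add analysis script")
repo.edit("analysis.py", "v2")
repo.add("analysis.py")
repo.commit("Speed up analysis")
print(repo.log())    # ['Speed up analysis', 'Add analysis script']
repo.rollback()      # the speed-up broke something, so go back one commit
print(repo.working)  # {'analysis.py': 'v1'}
```

Note that an edited file only ends up in a commit if it was added first, which mirrors the two-step “add, then commit” process described above.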

Git’s real power lies in “branches”: separate lines of commits that share a common starting point. If you want to experiment with a new feature without risking the default version of the code, just create a branch. If the experiment doesn’t work, you can simply drop the branch. If it does, you can “merge” it back into the default version, and git’s merge algorithms usually make this seamless, even when other changes have been made in the meantime. If you’ve ever noticed open source projects encouraging contributions via “pull requests” or “merge requests”, branching is the basis of those processes.
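Branching and merging can be sketched in the same toy spirit. In this hypothetical Python model (again not git’s implementation), a branch is simply a name pointing at its latest commit, and `merge` performs a three-way merge: it compares both branch tips against their common ancestor, the “merge base”, where the branches diverged. Git follows a similar idea, but it merges line by line within files, so it can often combine non-overlapping edits to the same file. This sketch works at whole-file granularity and treats any file changed differently on both sides as a conflict.

```python
from dataclasses import dataclass


@dataclass
class Commit:
    message: str
    files: dict          # file name -> contents in this snapshot
    parents: tuple = ()  # one parent normally; two for a merge commit


def ancestors(commit):
    """Every commit reachable from `commit`, including itself."""
    seen, order, stack = set(), [], [commit]
    while stack:
        c = stack.pop()
        if id(c) in seen:
            continue
        seen.add(id(c))
        order.append(c)
        stack.extend(c.parents)
    return order


def merge_base(a, b):
    """A best common ancestor of two commits: where the branches diverged."""
    b_ids = {id(c) for c in ancestors(b)}
    common = [c for c in ancestors(a) if id(c) in b_ids]
    for c in common:
        # Skip c if it is an ancestor of some other common ancestor.
        if not any(d is not c and id(c) in {id(x) for x in ancestors(d)}
                   for d in common):
            return c
    return None


def merge(ours, theirs, message):
    """Three-way merge at whole-file granularity (git works line by line)."""
    base = merge_base(ours, theirs)
    base_files = base.files if base else {}
    merged, conflicts = {}, []
    for name in sorted(set(base_files) | set(ours.files) | set(theirs.files)):
        b = base_files.get(name)
        o, t = ours.files.get(name), theirs.files.get(name)
        if o == t or t == b:    # same on both sides, or only "ours" changed it
            result = o
        elif o == b:            # only "theirs" changed it
            result = t
        else:                   # both sides changed it, differently
            conflicts.append(name)
            continue
        if result is not None:  # None means the file was deleted
            merged[name] = result
    if conflicts:
        raise ValueError("merge conflict in: " + ", ".join(conflicts))
    return Commit(message, merged, (ours, theirs))


# Usage: experiment on a branch while the default branch moves on.
root = Commit("Initial code", {"model.py": "v1", "README": "draft"})
branches = {"main": root}
branches["new-solver"] = Commit(
    "Try a new solver", {"model.py": "v2", "README": "draft"}, (root,))
branches["main"] = Commit(
    "Update docs", {"model.py": "v1", "README": "final"}, (branches["main"],))

branches["main"] = merge(branches["main"], branches["new-solver"],
                         "Merge new-solver")
del branches["new-solver"]     # the experiment worked; the branch name can go
print(branches["main"].files)  # {'README': 'final', 'model.py': 'v2'}
```

Each branch changed a different file, so the merge combines both changes without any manual intervention; had the experiment failed, deleting `branches["new-solver"]` without merging would leave `main` untouched.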

Git has a reputation for being difficult to learn, but my recommendation is simply to start right now with something small. I only started using git after nearly eight years as a researcher, and I spent my first few months doing nothing more than committing, pushing and pulling in a single repository of my own. I immediately regretted that I hadn’t started earlier, and I don’t want you to repeat my mistake!

This is pretty much how I started, too. (Comic by xkcd.)

Our BEAR Training includes regular sessions on git using material from Software Carpentry’s Version Control with Git lesson. If you don’t already use version control, join us at the next git session on 10 December!

Author: Warrick Ball is a Senior Research Software Engineer and Manager at the University of Birmingham. He previously worked in the School of Physics and Astronomy and now leads one of the groups within the Research Software Group, part of the Advanced Research Computing (ARC) team. His background is in physics, specifically asteroseismology, and his current role involves supporting research by developing and improving research software. https://www.birmingham.ac.uk/research/arc/rsg/staff/warrick-ball