Baskerville is a GPU-focussed Tier 2 HPC cluster that attracts users from a wide range of disciplines and with varying levels of HPC experience. Since joining the ARC team, one of my roles has been helping to design a short training course called Baskerville Basics, which provides useful information regardless of a user’s research discipline or HPC experience. Every HPC system is different, so the information in Baskerville Basics is tailored to help users quickly discover Baskerville’s capabilities and learn how to use its resources as efficiently as possible.
Baskerville Basics can be found on the Baskerville Docs website (https://docs.baskerville.ac.uk/), which is the primary resource for technical information on using Baskerville. To differentiate Baskerville Basics from the rest of the docs website, I decided it should both pose questions and set users tasks to accomplish. Each task is accompanied by an explanation of why a particular method matters and how it benefits the user, along with references to further details in other parts of the docs website and in external resources. With this approach users learn by doing, and by the end they will have created their own example scripts that they can return to as a point of reference whenever they run into trouble.
Baskerville Basics covers several areas of Baskerville use: guiding users to the correct documentation on accessing Baskerville; module loading on Baskerville and the difference between apps in the test and live environments; job submission using the Slurm scheduler, with information on the key batch-script header lines and what they do; methods for both monitoring and cancelling jobs, with several questions to help users understand which commands are most appropriate in a given situation; and details and questions on QOSes, so that users can find out which QOS they can access. I found writing this material very helpful in reinforcing my own HPC knowledge, because it made me think about how to explain each step to a user and why they should do it in a particular way.
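To give a flavour of what the batch-script header lines look like, here is a minimal Python sketch that assembles a Slurm batch script as text. It is an illustration rather than material taken from the course: the function name `render_batch_script` is my own, the account and QOS values are hypothetical placeholders, and the exact set of header lines recommended for Baskerville should be taken from the docs website.

```python
def render_batch_script(job_name, account, qos, time_limit, gpus, commands):
    """Return the text of a Slurm batch script with common #SBATCH header lines."""
    header = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",   # name shown in the queue
        f"#SBATCH --account={account}",     # project the job is charged to
        f"#SBATCH --qos={qos}",             # quality of service to run under
        f"#SBATCH --time={time_limit}",     # wall-clock limit, HH:MM:SS
        f"#SBATCH --gpus={gpus}",           # number of GPUs requested
    ]
    # Blank line between the header and the commands the job will run.
    return "\n".join(header + [""] + list(commands)) + "\n"


script = render_batch_script(
    job_name="basics-demo",
    account="my-project",    # hypothetical placeholder, not a real project
    qos="my-qos",            # hypothetical placeholder, not a real QOS
    time_limit="00:10:00",
    gpus=1,
    commands=["nvidia-smi"],  # list the GPUs visible to the job
)
print(script)
```

Saving the printed text to a file gives a script that could be passed to `sbatch`, once the placeholder account and QOS are replaced with values the user actually has access to.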
Since Baskerville is a GPU-focussed HPC system, users are ultimately given the task of building a working submission script. It took a lot of trial and error to arrive at a script that covers GPU use and lets the user quickly see the effect of varying the amount of GPU resources. I found that asking users to run some NVIDIA example code (CudaOpenMP) while changing the amount of GPU resources they allocate enables them to see this effect. This is a great way to teach users about responsible and efficient resource allocation. Information on the methods we recommend for data transfer, running interactive jobs and using Baskerville Portal is also provided.
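The idea of rerunning the same job with different GPU allocations can be sketched in a few lines of Python. This is not the script from the course: the helper `sweep_commands` and the file name `cuda_openmp.sh` are hypothetical, and the GPU counts are only illustrative. It relies on the standard Slurm behaviour that options given on the `sbatch` command line override the matching `#SBATCH` lines in the script.

```python
import shlex


def sweep_commands(script_path, gpu_counts):
    """Build one sbatch command per GPU count, overriding the script's own setting."""
    return [
        f"sbatch --gpus={n} {shlex.quote(script_path)}"
        for n in gpu_counts
    ]


# Hypothetical script name and illustrative GPU counts.
commands = sweep_commands("cuda_openmp.sh", [1, 2, 4])
for command in commands:
    print(command)
```

The sketch only prints the commands, so they can be checked before anything is submitted. Comparing the output of each resulting job then shows how the example code responds to the number of GPUs it was given.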
With Baskerville Basics we have created a resource to help users get accustomed to using Baskerville, regardless of their HPC experience and research discipline. The best way to consolidate your own knowledge is to teach others, and writing the course has improved my own HPC usage. We hope it will be a valuable resource for researchers. As more people use the Baskerville system and provide feedback, the course will grow to cover other areas and better meet the needs of a growing user base.