Understanding multi-agent learning through large-scale simulations – Birmingham Environment for Academic Research

In this case study we hear from Tuo Zhang, a researcher working in Machine Learning, who has been using BlueBEAR to run large-scale simulations of multi-agent reinforcement learning (MARL) algorithms. His research focuses on understanding how learning agents behave when interacting with each other in complex and changing environments.

In many real-world systems, decisions are not made by a single agent but emerge from the interaction of multiple agents that adapt simultaneously. These interactions can lead to a wide range of behaviours, from stable convergence to persistent oscillations or even divergence. Understanding these dynamics is particularly challenging when the environment is non-stationary, meaning that the underlying system itself changes over time.

To study this, Tuo runs large numbers of simulations where agents repeatedly interact under different conditions. These include varying the structure of the game, introducing noise into the learning process, and gradually modifying the environment over time. Because theoretical analysis alone is often insufficient, simulation becomes the primary tool for understanding how these systems behave.

Each experiment involves simulating long learning trajectories, often across hundreds of independent runs. This is necessary to ensure that the observed behaviour is robust and not due to randomness. However, this also makes the computational cost extremely high. Running these experiments on a local machine would be prohibitively slow.
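To make the setup concrete, here is a minimal sketch of this kind of experiment. It is not Tuo's actual code or algorithm: the game (a two-action matching game with a slowly growing bias), the noisy gradient update, and all names and parameter values (`run_once`, `summarise`, `lr`, `noise`, `drift`) are assumptions chosen purely for illustration. It shows one long learning trajectory per random seed, followed by aggregation across independent runs so that conclusions do not rest on a single random outcome.

```python
import math
import random
import statistics


def sigmoid(x):
    """Logistic function, with the input clamped to avoid overflow."""
    x = max(-30.0, min(30.0, x))
    return 1.0 / (1.0 + math.exp(-x))


def run_once(seed, steps=2000, lr=0.05, noise=0.1, drift=0.0005):
    """Simulate one learning trajectory of two agents in a drifting 2x2 game.

    Illustrative assumptions: agent A's expected payoff is
    (2p - 1)(2q - 1) + bias * p, agent B's is -(2p - 1)(2q - 1),
    where p and q are the probabilities of each agent playing action 0
    and bias grows linearly with time (the non-stationary part).
    Returns the list of p values over time.
    """
    rng = random.Random(seed)  # independent, reproducible stream per run
    theta_a = rng.gauss(0.0, 0.1)  # each agent's policy is a single logit
    theta_b = rng.gauss(0.0, 0.1)
    trajectory = []
    for t in range(steps):
        p = sigmoid(theta_a)
        q = sigmoid(theta_b)
        trajectory.append(p)
        bias = drift * t  # the environment changes gradually over time
        # Payoff gradients with respect to each logit (chain rule).
        grad_a = (2.0 * (2.0 * q - 1.0) + bias) * p * (1.0 - p)
        grad_b = (-2.0 * (2.0 * p - 1.0)) * q * (1.0 - q)
        # Gaussian noise is injected into each learning update.
        theta_a += lr * (grad_a + noise * rng.gauss(0.0, 1.0))
        theta_b += lr * (grad_b + noise * rng.gauss(0.0, 1.0))
    return trajectory


def summarise(seeds, **kwargs):
    """Mean and standard error of agent A's final policy across runs."""
    finals = [run_once(seed, **kwargs)[-1] for seed in seeds]
    mean = statistics.mean(finals)
    sem = statistics.stdev(finals) / math.sqrt(len(finals))
    return mean, sem


if __name__ == "__main__":
    for noise_level in (0.0, 0.1, 0.5):
        mean, sem = summarise(range(100), noise=noise_level)
        print(f"noise={noise_level}: final p = {mean:.3f} +/- {sem:.3f}")
```

Even this toy version performs 100 runs of 2,000 steps for each noise level. Realistic games, longer horizons and larger parameter grids multiply that cost quickly.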

Using BlueBEAR, Tuo is able to run many simulations in parallel, significantly reducing the time required to complete each study. This allows him to explore a much wider range of parameter settings and experimental conditions than would otherwise be possible.
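One common way to organise such a sweep on an HPC cluster is to give each parallel task a single combination of settings. The case study does not describe how Tuo's jobs are submitted, so the sketch below is an assumption: it supposes the runs are launched as a Slurm job array, in which Slurm sets the `SLURM_ARRAY_TASK_ID` environment variable for each task. The function name `config_for_task` and the example parameter values are hypothetical.

```python
import itertools
import os


def config_for_task(task_id, noise_levels, drift_rates, n_seeds):
    """Map a flat task index to one (noise, drift, seed) combination.

    Hypothetical helper: the grid is the Cartesian product of the
    settings, so every index from 0 to len(grid) - 1 identifies
    exactly one independent simulation.
    """
    grid = list(itertools.product(noise_levels, drift_rates, range(n_seeds)))
    if not 0 <= task_id < len(grid):
        raise IndexError(f"task_id {task_id} outside 0..{len(grid) - 1}")
    noise, drift, seed = grid[task_id]
    return {"noise": noise, "drift": drift, "seed": seed}


if __name__ == "__main__":
    # Inside a Slurm array job this variable holds the task's index;
    # when run locally it is absent, so fall back to task 0.
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    config = config_for_task(
        task_id,
        noise_levels=[0.0, 0.1, 0.5],  # illustrative values only
        drift_rates=[0.0, 0.001],
        n_seeds=100,
    )
    print(config)
```

With these illustrative values the grid has 3 × 2 × 100 = 600 entries, so an array of 600 tasks would cover every setting and seed once, with each task writing its own result file.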

“Running these experiments locally would take an impractical amount of time. BlueBEAR allows me to scale up my experiments and get results in a reasonable timeframe.”

*Kuhn’s learning curve under different noise conditions*

In addition to compute, access to the Research Data Store (RDS) has been an important part of the workflow. As someone working primarily on a laptop, managing large volumes of simulation data can be difficult. Expanding local storage is expensive, and external hard drives are often inconvenient for day-to-day use.

With support from the BEAR team, Tuo was able to set up RDS Windows mapping, allowing the storage to be accessed directly as a network drive. This makes it straightforward to store and organise large datasets without worrying about local storage limits.

“RDS has made a big difference to how I manage my data. I can store everything centrally and access it easily from my laptop, which is much more convenient than relying on external drives.”

By combining large-scale compute with secure and flexible data storage, BlueBEAR and RDS together provide a powerful infrastructure for supporting data-intensive research in machine learning.

We were so pleased to hear how Tuo has been able to make use of what is on offer from Advanced Research Computing. If you have any examples of how it has helped your research, please get in contact with us at bearinfo@contacts.bham.ac.uk.

We are always looking for good examples of the use of High Performance Computing to nominate for the HPCwire Awards – see our recent winner for more details.