MARL Cluster Training
TL;DR: Let’s dive into MARL training with BenchMARL and how to scale up your experiments to a cluster. We’ll cover environment and model customisation, as well as how to run your experiments on a cluster even without internet access.
# Context
As a part of my PhD, I’m working on Multi-Agent Reinforcement Learning (MARL), more precisely on the interpretability of MARL, as I outlined in this article. Yet to be able to interpret any agent, it’s important to master MARL training. My primary goal is to spend as little time as possible training MARL agents. I also want to use classical MARL algorithms and environments to be able to contextualize my results with the state of the art. After testing a few tools, I chose to go with BenchMARL.
BenchMARL
BenchMARL is a specialized library designed to ease MARL training. It provides a standardized interface that enables reproducibility and fair benchmarking across various MARL algorithms and environments.
Extra: BenchMARL is really well packaged, easily extensible, and already embeds config defaults.
BenchMARL’s backends:
- TorchRL: provides a standardized interface for MARL algorithms and environments
- Hydra: provides a flexible and modular configuration system
- marl-eval: provides a standardized evaluation system
I encourage you to check out the BenchMARL documentation for more details.
# MARL Training
Basic Setup
Along with this blog post, I have prepared a repository with a basic setup of BenchMARL. Please refer to the README for installation instructions and details.
Feedback: Feel free to open an issue if you have any questions, problems or remarks you want to share.
The first thing you might want to do is train supported algorithms on a supported environment. Let’s say you want to compare MAPPO and IPPO on Multiwalker. Such a benchmark is made of 2 independent experiments, which you can test individually using the script experiments/run_local.py, inspired by the run.py script provided by benchmarl.
- IPPO & Multiwalker:
uv run -m scripts.experiments.run_local algorithm=ippo task=pettingzoo/multiwalker
- MAPPO & Multiwalker:
uv run -m scripts.experiments.run_local algorithm=mappo task=pettingzoo/multiwalker
These scripts are based on Hydra’s configuration system, which allows you to easily modify the configuration of your experiment in a YAML file or through the command line. This is especially important when you want to run multiple experiments with different configurations, e.g. for hyperparameter search. Additionally, the defaults can be loaded directly from BenchMARL since the script’s config (exp:run_local.yaml) adds BenchMARL to the Hydra search path.
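For reference, such a run script stays very close to benchmarl’s own run.py. Here is a minimal sketch; the config_path and config_name used by the repo’s run_local.py are assumptions:

```python
import hydra
from benchmarl.hydra_config import load_experiment_from_hydra
from hydra.core.hydra_config import HydraConfig
from omegaconf import DictConfig


# config_path/config_name are placeholders; the repo's script points them at its own config.
@hydra.main(version_base=None, config_path="conf", config_name="run_local")
def run(cfg: DictConfig) -> None:
    # Recover the chosen task name (e.g. "pettingzoo/multiwalker") from Hydra's runtime choices.
    task_name = HydraConfig.get().runtime.choices.task
    experiment = load_experiment_from_hydra(cfg, task_name=task_name)
    experiment.run()


if __name__ == "__main__":
    run()
```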
Tweaking: You can easily tweak an algorithm by creating a new file in the algorithm config group and iterating on it manually or with hyperparameter search. You can similarly tweak experiments and models.
But the strength of BenchMARL is to run benchmarks, i.e. groups of reproducible experiments with similar configs. You can start by running the benchmarks/multiwalker.py script, which is equivalent to running the previous experiments (with multiple seeds) but fully baked with the powerful plots from marl-eval to compare the experiments.
- Run the benchmark:
uv run -m scripts.benchmarks.multiwalker
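Under the hood, such a benchmark script builds a Benchmark object from the packaged defaults and runs the experiments one after the other. Here is a sketch following BenchMARL’s documented API; the repo’s scripts/benchmarks/multiwalker.py may load its configs differently:

```python
from benchmarl.algorithms import IppoConfig, MappoConfig
from benchmarl.benchmark import Benchmark
from benchmarl.environments import PettingZooTask
from benchmarl.experiment import ExperimentConfig
from benchmarl.models.mlp import MlpConfig

# Load the packaged defaults and sweep both algorithms over several seeds.
benchmark = Benchmark(
    algorithm_configs=[MappoConfig.get_from_yaml(), IppoConfig.get_from_yaml()],
    tasks=[PettingZooTask.MULTIWALKER.get_from_yaml()],
    seeds={0, 1, 2},
    experiment_config=ExperimentConfig.get_from_yaml(),
    model_config=MlpConfig.get_from_yaml(),
    critic_model_config=MlpConfig.get_from_yaml(),
)
benchmark.run_sequential()
```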
As this becomes tedious to run on a personal computer, the next step is to run it on a cluster. Jump to the Cluster Training section for more details.
Benchmark Config: I proposed a config for the multiwalker benchmark based on object packing, similar to how layers are composed. You can use it as a template to create your own benchmark configs.
Custom Task
One of BenchMARL’s strengths is its ability to integrate custom tasks. First, let’s say I want to create a variation of a supported task, e.g. multiwalker with a shared reward.
Composition bug: Unfortunately, unlike algorithms, tasks are nested in the framework folder (e.g. pettingzoo). Because of this and a bug in Hydra, it is not possible to compose defaults from nested tasks, see this issue.
But it is still possible to derive a custom task in a few steps:
- Create a config file multiwalker/shared.yaml (a sketch of its content follows after this list)
- Register the task before creating the experiment object (see the registration sketch after the note below)
- Run the experiment:
uv run -m scripts.experiments.run_local algorithm=mappo task=multiwalker/shared
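For the first step, the config can copy the default multiwalker task config and flip the reward-sharing flag. A sketch, assuming the task exposes PettingZoo’s multiwalker_v9 kwargs (key names and values here are illustrative, not BenchMARL’s exact defaults):

```yaml
# conf/task/multiwalker/shared.yaml (hypothetical path)
# Copied rather than composed from the default task, because of the Hydra
# composition bug mentioned above.
n_walkers: 3
shared_reward: true      # the variation: one common reward for all walkers
terminate_on_fall: true
remove_on_fall: true
max_cycles: 500
```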
Hack: Contrary to other groups (algorithms, experiments or models), tasks need to be registered as they are not spawned directly; they are spawned through their factored environment class.
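For the registration step, one option is to reuse the existing multiwalker task entry under the new name, assuming BenchMARL exposes the task_config_registry dict consulted by its Hydra loader:

```python
from benchmarl.environments import PettingZooTask, task_config_registry

# Map the new Hydra task name onto the existing multiwalker task so the
# loader can build it with the overridden config.
task_config_registry["multiwalker/shared"] = PettingZooTask.MULTIWALKER
```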
Another kind of custom task is an unsupported task from a supported environment class, for example KAZ (Knights Archers Zombies) from PettingZoo. First you need to create a custom task class:
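A plausible shape for such a class, modeled on BenchMARL’s extending-task example and torchrl’s PettingZoo wrapper (treat the method list and signatures as assumptions and check them against your BenchMARL version):

```python
from typing import Callable, Optional

from benchmarl.environments.common import Task
from torchrl.envs import EnvBase
from torchrl.envs.libs.pettingzoo import PettingZooEnv


class KazTask(Task):
    # One member per scenario; self.config is filled from the task's YAML file.
    KAZ = None

    def get_env_fun(
        self, num_envs: int, continuous_actions: bool, seed: Optional[int], device: str
    ) -> Callable[[], EnvBase]:
        return lambda: PettingZooEnv(
            task="knights_archers_zombies_v10",
            parallel=True,
            seed=seed,
            device=device,
            **self.config,
        )

    def supports_continuous_actions(self) -> bool:
        return False

    def supports_discrete_actions(self) -> bool:
        return True

    def max_steps(self, env: EnvBase) -> int:
        return self.config.get("max_cycles", 900)

    @staticmethod
    def env_name() -> str:
        return "kaz"

    # The remaining methods (group_map, observation_spec, action_spec,
    # state_spec, action_mask_spec, info_spec, has_render, ...) follow the
    # same pattern as BenchMARL's extending-task example.
```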
Which is used in the task:
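For illustration, the matching task config can simply forward KAZ’s environment kwargs to the wrapper above (key names taken from PettingZoo’s knights_archers_zombies_v10 and used here as assumptions):

```yaml
# conf/task/kaz/kaz.yaml (hypothetical path)
spawn_rate: 20
num_archers: 2
num_knights: 2
max_zombies: 10
max_cycles: 900
vector_state: true
```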
Then you can register the task:
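Registration then follows the same pattern as before, now pointing at the custom class (again assuming the task_config_registry dict is the right hook):

```python
from benchmarl.environments import task_config_registry

from environments.kaz import KazTask  # hypothetical module holding the class above

task_config_registry["kaz/kaz"] = KazTask.KAZ
```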
And additionally you can validate the task using a dataclass to serve as schema, which you would add to a ConfigStore:
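A sketch of what that could look like; the field names mirror the hypothetical KAZ task config above and the ConfigStore group/name are assumptions:

```python
from dataclasses import dataclass

from hydra.core.config_store import ConfigStore
from omegaconf import MISSING


@dataclass
class KazTaskConfig:
    # Schema mirroring the task's YAML keys, so typos are caught at load time.
    spawn_rate: int = MISSING
    num_archers: int = MISSING
    num_knights: int = MISSING
    max_zombies: int = MISSING
    max_cycles: int = MISSING
    vector_state: bool = True


cs = ConfigStore.instance()
cs.store(name="kaz_config", group="task/kaz", node=KazTaskConfig)
```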
For truly custom environments see the examples. You might also want to dig into the torchrl documentation first, to understand how to create your own environments.
Question: How to handle the KAZ vector state (B, N, F)?
Answer: As N is not an agent dimension, you cannot directly use the Mlp ((B, F) inputs) model nor the Cnn ((B, F) or (B, H, W, C) inputs) model. You’ll need either to modify the environment to output the correct shape or use a custom model based on a Cnn, Mlp or a flattening layer.
Custom Model
As noted in the previous section, the KAZ vector state requires a custom model. The easiest option is to modify the Mlp model, flattening any extra dimensions. The basic idea is to introduce a new num_extra_dims parameter in the model config, which will be used to flatten the input tensor.
This parameter will first be used in the _perform_checks method to check that the input tensor has the correct shape, and finally in the _forward method to simplify the input tensor:
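The flattening logic itself can be sketched outside BenchMARL’s Model API as follows (an illustration of the idea only, not the actual Mlp subclass):

```python
import torch


def flatten_extra_dims(x: torch.Tensor, num_extra_dims: int) -> torch.Tensor:
    # Collapse the last `num_extra_dims + 1` dimensions into a single feature
    # dimension, e.g. (B, N, F) -> (B, N * F) for num_extra_dims=1.
    if num_extra_dims <= 0:
        return x
    return x.flatten(start_dim=x.dim() - num_extra_dims - 1)


# KAZ-like vector state: batch of 32, 5 entities, 8 features each.
x = torch.rand(32, 5, 8)
assert flatten_extra_dims(x, num_extra_dims=1).shape == (32, 40)
```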
It then needs to come with a config file extra_mlp.yaml and should be registered in the model_config_registry:
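Registration might then look like this, assuming a hypothetical ExtraMlpConfig dataclass next to the custom model and that benchmarl.models exposes the model_config_registry dict mentioned above:

```python
from benchmarl.models import model_config_registry

from models.extra_mlp import ExtraMlpConfig  # hypothetical module for the custom model

# Make "extra_mlp" resolvable from Hydra/YAML model configs.
model_config_registry["extra_mlp"] = ExtraMlpConfig
```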
And that’s it! You can now use your custom model in your experiments.
Extra Dims: See the refined PR here.
# Cluster Training
Now that we can run the experiments we want locally, let’s scale up to a cluster.
Tweaks and tricks: The more optimisations you want on the training process, the more you’ll need to dig into the torchrl backend for more control over the environments and objects, and into the torch backend for more control over the models and tensors.
Cluster Setup
After testing a few different setups, I ended up settling on a full uv config. Here is my opinionated setup:
- [Optional] Set up git in your project
- [Optional] Link a GitHub repository
- Sync your project to the cluster (I prefer using a git remote for easy bidirectional edits, but rsync or scp can be simpler)
- Use uv sync to install the dependencies on the cluster (this step can be delegated to the job should you have an internet connection, see No Internet)
- Run your jobs or notebooks on the cluster
You should do this on a Work or Draft partition; avoid installing in your home directory, which might have limited space (in terms of disk space and inodes). Always be aware of your cluster configuration and check with your admin in case of doubt.
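As an illustration, a first-time setup can boil down to a few commands (paths, remote names and the $WORK variable are placeholders, adapt them to your cluster):

```bash
# On the cluster, from a work/scratch partition rather than $HOME
cd $WORK
git clone git@github.com:<user>/<repo>.git
cd <repo>
uv sync   # resolves and installs the locked dependencies into .venv
```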
Pros
- It’s super easy to use once you’re used to uv
- The configuration is the same as the local one
- Fully compatible with slurm jobs and jupyter notebooks/hubs
- Works without internet on the nodes (see No Internet)
Cons
- You might miss package optimisations tailored for your cluster
- It can consume a lot of inodes (you might need to remove old virtual environments)
- Not super compatible with the classical cuda ecosystem
Running Experiments
The nice thing about the setup I presented is that the same script can be used to run your experiments locally or on a cluster. So apart from the slurm arguments, which might differ depending on your cluster, you can use the same script.
A typical slurm script, which you can launch using sbatch launch/bench:multiwalker-jz.sh, would look like this:
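Here is a hedged sketch of such a script; the resource directives and paths are placeholders, and cluster-specific options are omitted:

```bash
#!/bin/bash
#SBATCH --job-name=bench-multiwalker
#SBATCH --output=logs/%x-%j.out
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --gres=gpu:1
#SBATCH --time=10:00:00
# Add your cluster-specific directives (account, partition, QoS) here.

# The uv environment was synced beforehand on the head node (see No Internet).
uv run -m scripts.benchmarks.multiwalker
```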
And to make use of the GPU, you can just switch the experiment config by adding the argument experiment=gpu. It will simply load the default experiment config (base_experiment) and override it with the gpu config:
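A minimal sketch of what such an override could look like, assuming the experiment configs live in a conf/experiment group and that your BenchMARL version exposes the sampling_device and train_device fields:

```yaml
# conf/experiment/gpu.yaml (hypothetical path)
defaults:
  - base_experiment
  - _self_

# Move both data collection and training onto the GPU.
sampling_device: cuda
train_device: cuda
```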
Now you’re ready to launch a bunch of jobs doing wild hyperparameter search, with bigger models, bigger batch sizes, etc.
JupyterHub: You can find an example of how to run a notebook on a cluster JupyterHub in my project template; this can vary depending on your cluster (example made for JeanZay).
No Internet
Some clusters don’t allow internet access on the nodes for security reasons (e.g. JeanZay in France). This can be a problem if you want to set up your environment (e.g. with uv) directly on the nodes (which can be easier). So let’s see how we can easily transpose what we’ve seen so far to a cluster without an internet connection.
As noted in the Cluster Setup section, you should install your dependencies before starting your jobs. If you don’t have an internet connection on the head node (I’ve never seen this, though), you might try to transfer your local setup or, if allowed, use a Docker image.
Then the only thing you need to do is to remove or disable the tools that require internet access. In our experiments you just need to use wandb in “offline” mode (e.g. using experiment=gpu_offline in the script):
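The exact config key depends on your BenchMARL and W&B versions; a portable alternative (not necessarily what gpu_offline does) is to force offline mode through wandb’s environment variable:

```bash
# Runs are written locally and can be uploaded later from a machine with
# internet access using `wandb sync`.
export WANDB_MODE=offline
uv run --no-sync -m scripts.benchmarks.multiwalker
```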
And when running a script with uv, you should use the --no-sync flag to avoid syncing your dependencies again. Depending on your use case, you might need to download your datasets to a special partition beforehand.
# Further Customisation
As you can see in the previous sections, BenchMARL is a really powerful tool to train MARL agents. However, there are still some things you might want to customise to fit your experiments needs. Feel free to open an issue or a PR if you want to add or suggest a customisation.
# Resources
To learn more about BenchMARL and MARL in general, here are some valuable resources:
- BenchMARL Documentation
- BenchMARL GitHub Repository
- TorchRL Documentation
- Hydra Configuration Framework
BenchMARL’s Discord community is also a great place to ask questions and share experiences with other users.