26 June 2025 | last update: 26 June 2025 | 10 min read | Tags: MARL, Cluster

MARL Cluster Training

TL;DR

Let’s dive into MARL training with BenchMARL and how to scale up your experiments to a cluster. We’ll cover environment and model customisation, as well as how to run your experiments on a cluster even without internet access.

# Context

As part of my PhD, I’m working on Multi-Agent Reinforcement Learning (MARL), more precisely on the interpretability of MARL, as I outlined in this article. But to be able to interpret any agent, it is important to first master MARL training. My primary goal is to spend as little time as possible training MARL agents, and I also want to use classical MARL algorithms and environments so I can contextualize my results against the state of the art. After testing a few tools, I chose to go with BenchMARL.

BenchMARL

BenchMARL is a specialized library designed to ease MARL training. It provides a standardized interface that enables reproducibility and fair benchmarking across various MARL algorithms and environments.

Extra

BenchMARL is really well packaged, easily extensible, and already embeds sensible configuration defaults.

BenchMARL’s backends:

  • TorchRL: provides a standardized interface for MARL algorithms and environments
  • Hydra: provides a flexible and modular configuration system
  • marl-eval: provides a standardized evaluation system

I encourage you to check out the BenchMARL documentation for more details.

# MARL Training

Basic Setup

Along with this blog post, I have prepared a repository with a basic setup of BenchMARL. Please refer to the README for installation instructions and details.

Feedback

Feel free to open an issue if you have any question, problem or remark you want to share.

The first thing you might want to do is train supported algorithms on a supported environment. Let’s say you want to compare MAPPO and IPPO on Multiwalker. Such a benchmark is made of two independent experiments, which you can test individually using the script experiments/run_local.py, inspired by the run.py script provided by benchmarl.

  • IPPO & Multiwalker:
    uv run -m scripts.experiments.run_local algorithm=ippo task=pettingzoo/multiwalker
  • MAPPO & Multiwalker:
    uv run -m scripts.experiments.run_local algorithm=mappo task=pettingzoo/multiwalker

These scripts are based on Hydra’s configuration system, which allows you to easily modify the configuration of your experiment in a YAML file or through the command line. This is especially useful when you want to run multiple experiments with different configurations, e.g. for hyperparameter search. Additionally, the defaults can be loaded directly from BenchMARL since the script’s config (exp:run_local.yaml) adds it to the Hydra search path.
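For example, any experiment field can be overridden from the command line, and Hydra’s multirun mode gives you a simple sweep. The dotted paths below follow BenchMARL’s config layout (experiment.lr, experiment.max_n_frames); double-check them against your installed version:

```bash
# Override experiment fields from the CLI
uv run -m scripts.experiments.run_local algorithm=mappo task=pettingzoo/multiwalker \
    experiment.lr=0.0005 experiment.max_n_frames=2000000

# Hydra multirun (-m) sweeps over comma-separated values
uv run -m scripts.experiments.run_local -m algorithm=mappo,ippo \
    task=pettingzoo/multiwalker experiment.lr=0.0005,0.001
```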

Tweaking

You can easily tweak an algorithm by creating a new file in the algorithm config group and iterating on it manually or with hyperparameter search. You can similarly tweak experiments and models.

But the strength of BenchMARL is to run benchmarks, i.e. a group of reproducible experiments with similar configs. You can start by running the benchmarks/multiwalker.py script, which is equivalent to running the previous experiments (with multiple seeds), but fully baked with the powerful plots from marl-eval to compare the experiments.

  • Run the benchmark: uv run -m scripts.benchmarks.multiwalker
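Under the hood, such a script typically builds a BenchMARL Benchmark object, patterned after the benchmark example in BenchMARL’s documentation. Treat this as a sketch of the idea rather than the exact content of benchmarks/multiwalker.py:

```python
from benchmarl.algorithms import IppoConfig, MappoConfig
from benchmarl.benchmark import Benchmark
from benchmarl.environments import PettingZooTask
from benchmarl.experiment import ExperimentConfig
from benchmarl.models.mlp import MlpConfig

benchmark = Benchmark(
    algorithm_configs=[MappoConfig.get_from_yaml(), IppoConfig.get_from_yaml()],
    tasks=[PettingZooTask.MULTIWALKER.get_from_yaml()],
    seeds={0, 1, 2},
    experiment_config=ExperimentConfig.get_from_yaml(),
    model_config=MlpConfig.get_from_yaml(),
    critic_model_config=MlpConfig.get_from_yaml(),
)
benchmark.run_sequential()  # runs every (algorithm, task, seed) combination
```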

As this becomes tedious to run on a personal computer, the next step is to run it on a cluster. Jump to the Cluster Training section for more details.

Benchmark Config

I propose a config for the multiwalker benchmark based on object packing, similar to how layers are composed. You can use it as a template to create your own benchmark configs.
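For reference, here is a purely illustrative sketch of what such a packed benchmark config could look like, using Hydra’s `group@package: option` defaults syntax. The actual file in the repo is the authoritative template; names and structure here are assumptions:

```yaml
# scripts/conf/benchmark/multiwalker.yaml (illustrative sketch)
defaults:
  - experiment: base_experiment
  - algorithm@algorithms.mappo: mappo
  - algorithm@algorithms.ippo: ippo
  - task@tasks.multiwalker: pettingzoo/multiwalker
  - _self_

seeds: [0, 1, 2]
```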

Custom Task

One of BenchMARL’s strengths is its ability to integrate custom tasks. First, let’s say I want to create a variation of a supported task, e.g. multiwalker with a shared reward.

Composition bug

Unfortunately, unlike algorithms, tasks are nested in the framework folder (e.g. pettingzoo). Because of this and a bug in Hydra, it is not possible to compose defaults from nested tasks; see this issue.

But it is still possible to derive a custom task in a few steps:

  • Create a config file: multiwalker/shared.yaml (see the sketch after this list)
  • Register the task before creating the experiment object (see the sketch after this list)
  • Run the experiment:
    uv run -m scripts.experiments.run_local algorithm=mappo task=multiwalker/shared
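Here is a plausible sketch of the first two steps. Because of the composition bug mentioned above, shared.yaml cannot inherit from pettingzoo/multiwalker, so it has to repeat the task fields; the field names and the task_config_registry lookup are assumptions based on BenchMARL’s defaults, so check them against your installed version:

```yaml
# conf/task/multiwalker/shared.yaml (sketch)
shared_reward: True
n_walkers: 3
max_cycles: 500
# ... copy the remaining fields from BenchMARL's pettingzoo/multiwalker.yaml
```

```python
from benchmarl.environments import PettingZooTask, task_config_registry

# Map the new Hydra config name to the existing multiwalker task,
# before the experiment object is created.
task_config_registry["multiwalker/shared"] = PettingZooTask.MULTIWALKER
```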
Hack

Contrary to other groups (algorithms, experiments or models), tasks need to be registered because they are not spawned directly: they are spawned through their environment class, which acts as a factory.

Another kind of custom task is an unsupported task from a supported environment class, for example KAZ (Knights Archers Zombies) from PettingZoo. First you need to create a custom task class:
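The original snippet is not reproduced here, so as a hedged stand-in, think of this class as the factory that actually spawns the environment. The class name, its arguments and the KAZ parameters are assumptions; PettingZooWrapper and knights_archers_zombies_v10 are the real TorchRL and PettingZoo entry points:

```python
from pettingzoo.butterfly import knights_archers_zombies_v10
from torchrl.envs.libs.pettingzoo import PettingZooWrapper


class KazEnv:
    """Hypothetical factory spawning the KAZ environment for TorchRL."""

    def __init__(self, **task_config):
        # Parameters coming from the Hydra task config
        # (e.g. num_archers, num_knights, vector_state, max_cycles).
        self.task_config = task_config

    def __call__(self, seed=None, device="cpu"):
        env = knights_archers_zombies_v10.parallel_env(**self.task_config)
        wrapped = PettingZooWrapper(env=env, device=device)
        if seed is not None:
            wrapped.set_seed(seed)
        return wrapped
```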

This class is then used in the task:
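Here is a sketch of what that task can look like, written as a Task enum in the style of BenchMARL’s built-in tasks and reusing the KazEnv factory from the previous sketch. The exact abstract methods required by benchmarl.environments.common.Task (and the spec helpers available on the wrapped env) vary between versions, so compare with the built-in PettingZooTask before relying on this:

```python
from typing import Callable, Dict, List, Optional

from benchmarl.environments.common import Task
from torchrl.envs import EnvBase


class KazTask(Task):
    """Sketch of a custom task enum for Knights Archers Zombies."""

    KNIGHTS_ARCHERS_ZOMBIES = None

    def get_env_fun(
        self,
        num_envs: int,
        continuous_actions: bool,
        seed: Optional[int],
        device: str,
    ) -> Callable[[], EnvBase]:
        # self.config holds the values loaded from the task YAML file.
        return lambda: KazEnv(**self.config)(seed=seed, device=device)

    def supports_continuous_actions(self) -> bool:
        return False

    def supports_discrete_actions(self) -> bool:
        return True

    def max_steps(self, env: EnvBase) -> int:
        return self.config.get("max_cycles", 900)

    def group_map(self, env: EnvBase) -> Dict[str, List[str]]:
        # PettingZooWrapper already groups the agents for us.
        return env.group_map

    def observation_spec(self, env: EnvBase):
        return env.observation_spec

    def action_spec(self, env: EnvBase):
        return env.full_action_spec

    def state_spec(self, env: EnvBase):
        return None

    def action_mask_spec(self, env: EnvBase):
        return None

    def info_spec(self, env: EnvBase):
        return None

    @staticmethod
    def env_name() -> str:
        return "kaz"
```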

Then you can register the task:
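Registration again goes through the task registry; the "<env>/<task>" key format is an assumption mirroring how built-in tasks are looked up:

```python
from benchmarl.environments import task_config_registry

task_config_registry["kaz/knights_archers_zombies"] = KazTask.KNIGHTS_ARCHERS_ZOMBIES
```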

Additionally, you can validate the task config with a dataclass that serves as a schema, which you would add to a ConfigStore:
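A sketch using Hydra’s ConfigStore; the field names are assumptions about KAZ’s parameters rather than an exhaustive schema:

```python
from dataclasses import dataclass

from hydra.core.config_store import ConfigStore
from omegaconf import MISSING


@dataclass
class KazTaskConfig:
    num_archers: int = MISSING
    num_knights: int = MISSING
    max_zombies: int = MISSING
    vector_state: bool = MISSING
    max_cycles: int = MISSING


cs = ConfigStore.instance()
# The schema can then be referenced from the task's defaults list for validation.
cs.store(name="kaz_schema", group="task/kaz", node=KazTaskConfig)
```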

For truly custom environments, see the examples. You might also want to dig into the torchrl documentation first, to understand how to create your own environments.

Question

How do you handle the KAZ vector state (B, N, F)?

Answer

As N is not an agent dimension, you cannot directly use the Mlp model ((B, F) inputs) nor the Cnn model ((B, F) or (B, H, W, C) inputs). You’ll either need to modify the environment to output the correct shape or use a custom model based on a Cnn, an Mlp or a flattening layer.

Custom Model

As noted in the previous section, the KAZ vector state requires a custom model. The easiest option is to modify the Mlp model to flatten any extra dimensions. The basic idea is to introduce a new num_extra_dims parameter in the model config, which will be used to flatten the input tensor.

This parameter will first be used in the _perform_checks method to check that the input tensor has the correct shape, and then in the _forward method to flatten the input tensor:
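The actual changes live in the PR referenced below. As a framework-agnostic illustration of the idea (plain PyTorch, not the BenchMARL Model subclass), the check and the flattening boil down to this:

```python
import torch
from torch import nn


class ExtraDimMlp(nn.Module):
    """Toy model: flatten `num_extra_dims` trailing entity dimensions
    into the feature dimension before a plain MLP."""

    def __init__(self, in_features: int, out_features: int, num_extra_dims: int = 1):
        super().__init__()
        self.num_extra_dims = num_extra_dims
        self.mlp = nn.Sequential(
            nn.Linear(in_features, 128), nn.Tanh(), nn.Linear(128, out_features)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The "_perform_checks" part: the input must carry the extra dims.
        assert x.dim() >= 2 + self.num_extra_dims, "missing extra dimensions"
        # The "_forward" part: (B, N, F) -> (B, N * F).
        x = x.flatten(start_dim=-1 - self.num_extra_dims)
        return self.mlp(x)


# A KAZ-like vector state of shape (batch, n_entities, features):
obs = torch.randn(32, 10, 5)
model = ExtraDimMlp(in_features=10 * 5, out_features=8, num_extra_dims=1)
out = model(obs)  # -> (32, 8)
```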

The model then needs to come with a config file, extra_mlp.yaml, and should be registered in the model_config_registry:
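A plausible sketch of both pieces; the YAML fields follow BenchMARL’s mlp.yaml layout, and ExtraMlpConfig is the (hypothetical) dataclass config of the custom model:

```yaml
# conf/model/layers/extra_mlp.yaml (sketch)
name: extra_mlp

num_extra_dims: 1
num_cells: [256, 256]
layer_class: torch.nn.Linear
activation_class: torch.nn.Tanh
```

```python
from benchmarl.models import model_config_registry

model_config_registry["extra_mlp"] = ExtraMlpConfig
```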

And that’s it! You can now use your custom model in your experiments.

Extra Dims

See the refined PR here.

# Cluster Training

Now that we can run the experiments we want locally let’s scale up our experiments to a cluster.

Tweaks and tricks

The more you want to optimise the training process, the more you’ll need to dig into the torchrl backend for more control over the environments and objects, and into the torch backend for more control over the models and tensors.

Cluster Setup

After testing a few different setups, I settled on a full uv config. Here is my opinionated setup:

  • [Optional] Setup git in your project
  • [Optional] Link a GitHub repository
  • Sync your project to the cluster (I prefer using a git remote for easy bidirectional edits but rsync or scp can be simpler)
  • Use uv sync to install the dependencies on the cluster (this step can be delegated to the job should you have an internet connection, see No Internet)
  • Run your jobs or notebooks on the cluster

You should do this on a Work or Draft partition; avoid installing in your home directory, which might have limited space (in terms of disk space and inodes). Always be aware of your cluster configuration and check with your admin in case of doubt.
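Sketched as shell commands, under the assumption of a git remote and with placeholder paths, the workflow above looks like this:

```bash
# On the cluster, from a work/scratch partition (paths are placeholders)
git clone git@github.com:<user>/<repo>.git marl-project && cd marl-project

# Install the locked dependencies into a local .venv
uv sync

# Smoke-test an experiment before submitting jobs
uv run -m scripts.experiments.run_local algorithm=mappo task=pettingzoo/multiwalker
```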

Pros
  • It’s super easy to use once you’re used to uv
  • The configuration is the same as the local one
  • Fully compatible with slurm jobs and jupyter notebooks/hubs
  • Works without internet on the nodes (see No Internet)
Cons
  • You might miss package optimisations tailored for your cluster
  • It can consume a lot of inodes (you might need to remove old virtual environments)
  • Not super compatible with the classical CUDA ecosystem

Running Experiments

The nice thing about the setup I presented is that the same script can be used to run your experiments locally or on a cluster. So, except for the slurm arguments, which might differ depending on your cluster, everything stays the same.

A typical slurm script, which you can launch using sbatch launch/bench:multiwalker-jz.sh, would look like this:
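The directives below are generic Slurm; adapt partitions, accounts and modules to your cluster (the project path is a placeholder):

```bash
#!/bin/bash
#SBATCH --job-name=bench-multiwalker
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --gres=gpu:1
#SBATCH --time=10:00:00
#SBATCH --output=logs/%x-%j.out

cd "$WORK/marl-project"  # placeholder project path on a work partition

# --no-sync: reuse the environment installed beforehand (no internet on the nodes)
uv run --no-sync -m scripts.benchmarks.multiwalker experiment=gpu
```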

To make use of the GPU, you can simply switch the experiment config by adding the argument experiment=gpu. It loads the default experiment config (base_experiment) and overrides it with the gpu config:
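A sketch of what that config can contain; sampling_device and train_device are real ExperimentConfig fields, but the exact content of the repo’s gpu.yaml may differ:

```yaml
# conf/experiment/gpu.yaml (sketch)
defaults:
  - base_experiment
  - _self_

sampling_device: cuda
train_device: cuda
```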

Now you’re ready to launch a bunch of jobs doing wild hyperparameter search, with bigger models, bigger batch sizes, etc.

JupyterHub

You can find an example of how to run a notebook on a cluster’s JupyterHub in my project template; this can vary depending on your cluster (the example was made for JeanZay).

No Internet

For security reasons, some clusters don’t allow internet access on the nodes (e.g. JeanZay in France). This can be a problem if you want to set up your environment (e.g. with uv) directly on the nodes (which can be easier). So let’s see how we can transpose what we’ve seen so far to a cluster without an internet connection.

As noted in the Cluster Setup section, you should install your dependencies before starting your jobs. If you don’t have an internet connection on the head node (I’ve never seen this, though), you might try to transfer your local setup or, if allowed, use a Docker image.

Then the only thing you need to do is remove or disable the tools that require internet access. In our experiments, you just need to use wandb in “offline” mode (e.g. using experiment=gpu_offline in the script):
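I don’t reproduce the repo’s gpu_offline config here; it presumably toggles the offline behaviour through the logger settings. A generic way to get the same result, independent of BenchMARL’s config keys, is wandb’s own offline mode:

```bash
# Run offline on the compute node...
export WANDB_MODE=offline
uv run --no-sync -m scripts.benchmarks.multiwalker experiment=gpu_offline

# ...then sync the runs later from a machine with internet access
# (the exact path depends on where the logger writes its files).
wandb sync path/to/wandb/offline-run-*
```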

When running a script with uv, you should use the --no-sync flag to avoid syncing your dependencies again. Depending on your use case, you might also need to download your datasets to a dedicated partition beforehand.

# Further Customisation

As you can see in the previous sections, BenchMARL is a really powerful tool to train MARL agents. However, there are still some things you might want to customise to fit your experiments’ needs. Feel free to open an issue or a PR if you want to add or suggest a customisation.

# Resources

To learn more about BenchMARL and MARL in general, here are some valuable resources:

BenchMARL’s Discord community is also a great place to ask questions and share experiences with other users.