A multi-agent reinforcement learning framework for Unity environments. Provides implementations for training and evaluating MARL algorithms on collaborative and competitive tasks.
This repository implements four multi-agent reinforcement learning (MARL) algorithms for Unity environments. The framework has been validated on two environments and can be extended to additional Unity ML-Agents environments, so you can train, evaluate, and compare MARL algorithms on collaborative and competitive tasks.
- Tennis - Collaborative 2-agent environment where agents control rackets to keep a ball in play. Success requires achieving a +0.5 average score over 100 episodes.
- Soccer - Competitive 4-agent environment with 2v2 teams (goalie and striker roles with different action sizes). Agents learn to score goals while defending. Success is measured by win rate against previous model versions.
All algorithms follow a CTDE (Centralized Training, Decentralized Execution) architecture: agents learn with global information during training while acting independently on local observations at execution time. Agents engage in self-play, continuously improving by playing against previous versions of themselves.
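In code, the CTDE split roughly amounts to decentralized actors that condition only on local observations and a centralized critic that scores the joint observation-action. The following is a minimal PyTorch sketch of that pattern; class names and layer sizes are placeholders, not the repository's actual modules in `networks/`:

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Decentralized actor: conditions only on the agent's own observation."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

class CentralizedCritic(nn.Module):
    """Centralized critic: conditions on all agents' observations and actions."""
    def __init__(self, joint_obs_dim: int, joint_act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(joint_obs_dim + joint_act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, joint_obs: torch.Tensor, joint_act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([joint_obs, joint_act], dim=-1))

# Training: critics use the joint (global) information.
# Execution: each agent acts from its own Actor and local observation only.
```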
Key Implementation Features:
- Self-play training: agents improve by competing against their own evolving strategies
- Shared critic optimization (for the shared-critic variants; see the sketch after this list):
  - Global observation ordering: current agent → teammates → opponents, for a consistent input layout
  - Action concatenation: all agents' actions are appended in the same order for the centralized critic
  - Consistent input structure: improves and speeds up shared-critic and agent training
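A minimal sketch of this ordering convention (illustrative only; the actual buffer and critic code in the repository may differ):

```python
import torch

def build_critic_input(agent_id, teammate_ids, opponent_ids, obs, actions):
    """Concatenate per-agent observations and actions in a fixed order:
    current agent first, then teammates, then opponents.

    obs and actions are dicts mapping agent id -> tensor of shape (batch, dim).
    """
    order = [agent_id, *teammate_ids, *opponent_ids]
    joint_obs = torch.cat([obs[i] for i in order], dim=-1)
    joint_act = torch.cat([actions[i] for i in order], dim=-1)
    return torch.cat([joint_obs, joint_act], dim=-1)

# Because every agent builds its critic input with the same layout,
# a single shared critic network can be reused across agents.
```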
- MAPPO (Multi-Agent Proximal Policy Optimization)
  - All Shared: shared policy and critic networks across all agents
  - Critic Shared: individual policies with a shared centralized critic
  - Independent: individual policies and critics per agent
- MATD3 (Multi-Agent Twin Delayed Deep Deterministic Policy Gradient)
- MASAC (Multi-Agent Soft Actor-Critic)
  - Independent: individual actors and critics per agent
  - Shared Critic: individual actors with a shared centralized critic
- MADDPG (Multi-Agent Deep Deterministic Policy Gradient)
| Environment | Algorithm | Average Score (100-ep) | Training Steps | Agents | Notes |
|---|---|---|---|---|---|
| Tennis | MATD3 | 2.483 | ~199k steps | 2 | Best performer |
| Tennis | MASAC | 2.450 | ~199k steps | 2 | Fastest to succeed |
| Tennis | MAPPO (All Shared) | 1.490 | ~501k steps | 2 | Sample inefficient |
| Tennis | MADDPG | 0.796 | ~199k steps | 2 | Successful |
| Tennis | MAPPO (Critic Shared) | 0.765 | ~501k steps | 2 | Slowest to succeed |
| Soccer | MAPPO (Shared Critic) | 97.2% vs random | ~1M steps | 4 | Excellent |
| Soccer | MASAC (Shared Critic) | 84.4% vs random | ~200k steps | 4 | Very good |
```bash
# Train MASAC on the Tennis environment
python train.py --env_id Tennis --algo masac --max_steps 200000

# Train MAPPO on the Soccer environment with a custom config
python train.py --env_id Soccer --algo mappo --config configs/env_tuned/mappo_soccer.yaml

# Render trained models
python render.py --config results/Tennis/masac/config.yaml --model_path results/Tennis/masac/final-torch.model --worker_id 5 --render_episodes 5

# Generate algorithm comparison plots
python render_results.py

# Create competitive evaluation plots (for Soccer)
python render_competitive_results.py
```
```
├── algos/                         # Algorithm implementations (MAPPO, MATD3, MASAC, MADDPG)
├── networks/                      # Neural network architectures (actors, critics, modules)
├── envs/                          # Environment wrappers for Unity ML-Agents
├── buffers/                       # Experience replay and trajectory storage
├── runners/                       # Training loop implementations
├── evals/                         # Evaluation metrics and competitive analysis
├── configs/                       # Configuration files and hyperparameters
├── app/                           # Unity environment executables (download separately)
├── results/                       # Training outputs and saved models
├── figures/                       # Generated plots and visualizations
├── utils/                         # Utility functions and helpers
├── python/                        # Unity ML-Agents Python API
├── train.py                       # Main training script
├── render_results.py              # Training results visualization
├── render_competitive_results.py  # Competitive evaluation plots
└── render.py                      # Model rendering script
```
- Python 3.11+
- Git
- Unity environments (download separately - see below)
```bash
# Clone the repository
git clone https://github.com/legalaspro/unity_multiagent_rl.git
cd unity_multiagent_rl

# Create and activate the conda environment
conda env create -f environment.yaml
conda activate unity_multiagent_rl
```
```bash
# Clone the repository
git clone https://github.com/legalaspro/unity_multiagent_rl.git
cd unity_multiagent_rl

# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
pip install -e ./python
```
Note: These Unity environments use an older ML-Agents version (0.4.0 or earlier), loaded from the `python/` folder. The algorithms can be adapted to work with newer Unity ML-Agents versions.
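For reference, environments of that ML-Agents generation are typically driven through the legacy `UnityEnvironment`/brain API. The loop below is a rough sketch assuming the `unityagents` package from `python/` and a local Tennis build (the file name is an assumption, and the framework's `envs/` wrappers handle this internally):

```python
import numpy as np
from unityagents import UnityEnvironment  # legacy ML-Agents (<= 0.4) API shipped in python/

# Point file_name at the extracted build in app/ (platform-specific name).
env = UnityEnvironment(file_name="app/Tennis.app", worker_id=0)
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

env_info = env.reset(train_mode=True)[brain_name]
num_agents = len(env_info.agents)
action_size = brain.vector_action_space_size

for _ in range(100):
    # Random continuous actions in [-1, 1], one row per agent.
    actions = np.random.uniform(-1, 1, (num_agents, action_size))
    env_info = env.step(actions)[brain_name]
    rewards, dones = env_info.rewards, env_info.local_done
    if np.any(dones):
        env_info = env.reset(train_mode=True)[brain_name]

env.close()
```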
The Unity environment executables are not included in this repository and must be downloaded separately:
Download the environment that matches your operating system:
- Linux: Tennis_Linux.zip
- Mac OSX: Tennis.app.zip
- Windows (32-bit): Tennis_Windows_x86.zip
- Windows (64-bit): Tennis_Windows_x86_64.zip
- AWS/Headless: Tennis_Linux_NoVis.zip
Download the environment that matches your operating system:
- Linux: Soccer_Linux.zip
- Mac OSX: Soccer.app.zip
- Windows (32-bit): Soccer_Windows_x86.zip
- Windows (64-bit): Soccer_Windows_x86_64.zip
- AWS/Headless: Soccer_Linux_NoVis.zip
- Download the appropriate environment file(s) for your operating system
- Extract the downloaded file(s) to the `app/` directory in the project root
- Ensure the extracted files have the correct names:
  - Tennis: `app/Tennis.app` (macOS), `app/Tennis.exe` (Windows), or `app/Tennis` (Linux)
  - Soccer: `app/Soccer.app` (macOS), `app/Soccer.exe` (Windows), or `app/Soccer` (Linux)
```bash
# Test the installation
python train.py --help

# Run a short training sanity check
python train.py --env_id Tennis --algo masac --max_steps 1000
```
```bash
# Basic training
python train.py --env_id Tennis --algo masac --max_steps 200000

# Use pre-configured settings
python train.py --config configs/env_tuned/mappo_tennis.yaml

# Render trained models
python render.py --config results/Tennis/masac/config.yaml --model_path results/Tennis/masac/final-torch.model --worker_id 5 --render_episodes 5
```
All training results are stored in `results/Tennis/` and `results/Soccer/`, including trained models, training data, videos, GIFs, and performance graphs.
Shows training progress for all algorithms on the Tennis environment. MATD3 and MASAC achieve the highest scores (~2.5), while the MAPPO variants show steady improvement over longer training periods.
Displays the win rate against random opponents over training. MAPPO (shared critic) reaches a 97%+ win rate, demonstrating superior performance in competitive multi-agent scenarios.
Edit files in `configs/algos/` or `configs/env_tuned/` to customize (a small override sketch follows the list):
- Learning rates
- Network architectures
- Training parameters
- Evaluation settings
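As an example of this kind of customization, a tuned config could be loaded and tweaked programmatically before a run. This is a hypothetical sketch: the key names are illustrative, and the real ones live in the YAML files under `configs/`:

```python
import yaml

# Load a tuned config and override a few values before training.
# Key names below are hypothetical; check the actual YAML files for the real ones.
with open("configs/env_tuned/mappo_tennis.yaml") as f:
    cfg = yaml.safe_load(f)

cfg["actor_lr"] = 3e-4            # hypothetical key: policy learning rate
cfg["hidden_sizes"] = [128, 128]  # hypothetical key: network architecture
cfg["max_steps"] = 500_000        # hypothetical key: training length

with open("configs/env_tuned/mappo_tennis_custom.yaml", "w") as f:
    yaml.safe_dump(cfg, f)
```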
This project is licensed under the MIT License - see the LICENSE file for details.
- Unity ML-Agents Team for the environments and Python API
- OpenAI for algorithm implementations and research
- PyTorch team for the deep learning framework