Unity Multi-Agent Reinforcement Learning

Badges: Python version, CI, tests, coverage status

A multi-agent reinforcement learning framework for Unity environments. Provides implementations for training and evaluating MARL algorithms on collaborative and competitive tasks.

🎯 Project Overview

This repository implements four multi-agent reinforcement learning algorithms for Unity environments. The framework has been validated on two environments and can be extended to support additional Unity ML-Agents environments. Train, evaluate, and compare MARL algorithms on collaborative and competitive tasks.

Demo GIFs: MASAC on Tennis and MASAC on Soccer
  • Tennis - Collaborative 2-agent environment where agents control rackets to keep a ball in play. The environment counts as solved when the average score over 100 consecutive episodes reaches +0.5 (checked in the sketch below this list).

  • Soccer - Competitive 4-agent environment with 2v2 teams (goalie and striker roles with different action sizes). Agents learn to score goals while defending. Success measured by win rate against previous model versions.
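
For reference, the Tennis success criterion can be checked with a short rolling-average computation. The sketch below is illustrative only; episode_scores is assumed to hold one score per episode (the max over the two agents' returns) and is not code from this repository.

import numpy as np

def tennis_solved(episode_scores, window=100, threshold=0.5):
    """Check whether the rolling mean over `window` episodes reaches `threshold`."""
    scores = np.asarray(episode_scores, dtype=np.float64)
    if len(scores) < window:
        return False, None
    rolling = np.convolve(scores, np.ones(window) / window, mode="valid")
    if rolling.max() < threshold:
        return False, None
    # Episode number (1-based) at which the rolling window first clears the threshold.
    first_solved = int(np.argmax(rolling >= threshold)) + window
    return True, first_solved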

πŸ€– Algorithms

All algorithms implement CTDE (Centralized Training, Decentralized Execution) architecture, enabling agents to learn with global information during training while acting independently during execution. Agents engage in self-play, continuously improving by playing against previous versions of themselves.
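
Self-play can be pictured as keeping a pool of frozen snapshots of earlier policies and sampling opponents from it. The sketch below is a minimal illustration with assumed names (OpponentPool, snapshot, sample_opponent), not the repository's actual training code.

import copy
import random

class OpponentPool:
    """Frozen copies of past policies used as self-play opponents."""

    def __init__(self, max_size=10):
        self.snapshots = []
        self.max_size = max_size

    def snapshot(self, policy):
        # Store a frozen copy of the current policy; drop the oldest if the pool is full.
        self.snapshots.append(copy.deepcopy(policy))
        if len(self.snapshots) > self.max_size:
            self.snapshots.pop(0)

    def sample_opponent(self, latest_prob=0.5):
        # Favor the most recent snapshot, otherwise pick an older one uniformly.
        if not self.snapshots:
            raise ValueError("opponent pool is empty")
        if len(self.snapshots) == 1 or random.random() < latest_prob:
            return self.snapshots[-1]
        return random.choice(self.snapshots[:-1])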

Key Implementation Features:

  • Self-play training: Agents improve by competing against their own evolving strategies

  • Shared critic optimization: For shared-critic variants:

    • Global observation ordering: current agent → teammates → opponents, giving the critic a consistent input layout from every agent's perspective
    • Action concatenation: all agents' actions are appended in the same order for the centralized critic
    • Consistent input structure: stabilizes and speeds up training of the shared critic and the agents (see the sketch after this list)
  • MAPPO (Multi-Agent Proximal Policy Optimization)

    • All Shared: Shared policy and critic networks across all agents
    • Critic Shared: Individual policies with shared centralized critic
    • Independent: Individual policies and critics per agent
  • MATD3 (Multi-Agent Twin Delayed Deep Deterministic Policy Gradient)

  • MASAC (Multi-Agent Soft Actor-Critic)

    • Independent: Individual actors and critics per agent
    • Shared Critic: Individual actors with shared centralized critic
  • MADDPG (Multi-Agent Deep Deterministic Policy Gradient)
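
The per-agent critic-input ordering described above can be sketched as follows. This is an illustration under assumed names (critic_input, obs, acts, teams), not the repository's exact code.

import numpy as np

def critic_input(agent_id, obs, acts, teams):
    """Build a centralized critic input: own obs/action first, then teammates', then opponents'.

    obs, acts : lists of per-agent numpy arrays
    teams     : per-agent team labels, e.g. [0, 0, 1, 1] for 2v2 Soccer
    """
    mates = [i for i in range(len(obs)) if i != agent_id and teams[i] == teams[agent_id]]
    foes = [i for i in range(len(obs)) if teams[i] != teams[agent_id]]
    order = [agent_id] + mates + foes
    global_obs = np.concatenate([obs[i] for i in order])
    global_act = np.concatenate([acts[i] for i in order])
    return np.concatenate([global_obs, global_act])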

πŸ“Š Results Summary

| Environment | Algorithm | Result (100-ep avg score / win rate) | Training Steps | Agents | Notes |
|---|---|---|---|---|---|
| Tennis | MATD3 | 2.483 | ~199k | 2 | Best performer |
| Tennis | MASAC | 2.450 | ~199k | 2 | Fastest to solve |
| Tennis | MAPPO (All Shared) | 1.490 | ~501k | 2 | Sample inefficient |
| Tennis | MADDPG | 0.796 | ~199k | 2 | Solved |
| Tennis | MAPPO (Critic Shared) | 0.765 | ~501k | 2 | Slowest to solve |
| Soccer | MAPPO (Shared Critic) | 97.2% win rate vs random | ~1M | 4 | Excellent |
| Soccer | MASAC (Shared Critic) | 84.4% win rate vs random | ~200k | 4 | Very good |

πŸš€ Quick Start

Training

# Train MASAC on Tennis environment
python train.py --env_id Tennis --algo masac --max_steps 200000

# Train MAPPO on Soccer environment with custom config
python train.py --env_id Soccer --algo mappo --config configs/env_tuned/mappo_soccer.yaml

# Render trained models
python render.py --config results/Tennis/masac/config.yaml --model_path results/Tennis/masac/final-torch.model --worker_id 5 --render_episodes 5

Visualization & Analysis

# Generate algorithm comparison plots
python render_results.py

# Create competitive evaluation plots (for Soccer)
python render_competitive_results.py

πŸ“ Project Structure

├── algos/                   # Algorithm implementations (MAPPO, MATD3, MASAC, MADDPG)
├── networks/                # Neural network architectures (actors, critics, modules)
├── envs/                    # Environment wrappers for Unity ML-Agents
├── buffers/                 # Experience replay and trajectory storage
├── runners/                 # Training loop implementations
├── evals/                   # Evaluation metrics and competitive analysis
├── configs/                 # Configuration files and hyperparameters
├── app/                     # Unity environment executables (download separately)
├── results/                 # Training outputs and saved models
├── figures/                 # Generated plots and visualizations
├── utils/                   # Utility functions and helpers
├── python/                  # Unity ML-Agents Python API
├── train.py                 # Main training script
├── render_results.py        # Training results visualization
├── render_competitive_results.py  # Competitive evaluation plots
└── render.py                # Model rendering script

πŸ› οΈ Installation

Prerequisites

  • Python 3.11+
  • Git
  • Unity environments (download separately - see below)

Option 1: Using Conda (Recommended)

# Clone the repository
git clone https://github.com/legalaspro/unity_multiagent_rl.git
cd unity_multiagent_rl

# Create and activate environment
conda env create -f environment.yaml
conda activate unity_multiagent_rl

Option 2: Using Pip

# Clone the repository
git clone https://github.com/legalaspro/unity_multiagent_rl.git
cd unity_multiagent_rl

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
pip install -e ./python

Download Unity Environments

Note: These Unity environments use an older ML-Agents version (0.4.0 or earlier) loaded from the python/ folder. The algorithms can be adapted to work with newer Unity ML-Agents versions.

The Unity environment executables are not included in this repository and must be downloaded separately:

Tennis Environment

Download the environment that matches your operating system:

Soccer Environment

Download the environment that matches your operating system:

Installation Steps

  1. Download the appropriate environment file(s) for your operating system
  2. Extract the downloaded file(s) to the app/ directory in the project root
  3. Ensure the extracted files have the correct names:
    • Tennis: app/Tennis.app (macOS) or app/Tennis.exe (Windows) or app/Tennis (Linux)
    • Soccer: app/Soccer.app (macOS) or app/Soccer.exe (Windows) or app/Soccer (Linux)

Verify Installation

# Test the installation
python train.py --help

# Test with a short training run
python train.py --env_id Tennis --algo masac --max_steps 1000

πŸ“Š Training & Evaluation

Training

# Basic training
python train.py --env_id Tennis --algo masac --max_steps 200000

# Use pre-configured settings
python train.py --config configs/env_tuned/mappo_tennis.yaml

Rendering

# Render trained models
python render.py --config results/Tennis/masac/config.yaml --model_path results/Tennis/masac/final-torch.model --worker_id 5 --render_episodes 5

πŸ“ˆ Results and Visualization

All training results are stored under results/Tennis/ and results/Soccer/, including trained models, training data, videos, GIFs, and performance graphs.

Tennis Algorithm Comparison

Tennis Rewards Comparison

Shows training progress for all algorithms on the Tennis environment. MATD3 and MASAC reach the highest scores (~2.5), while the MAPPO variants improve steadily over longer training runs.

Soccer Competitive Evaluation

Soccer Win Rate vs Random

Displays the win rate against random opponents over the course of training. MAPPO (shared critic) reaches a 97%+ win rate, the strongest competitive result among the evaluated algorithms.
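
The win-rate metric above can be computed with a simple evaluation loop. The sketch below uses an assumed helper play_episode (returning 0 if the trained team wins, 1 if the opponent wins, and None for a draw); it is illustrative, not the repository's evaluation code.

def win_and_draw_rate(policy, opponent, play_episode, episodes=100):
    """Fraction of evaluation episodes won and drawn by `policy`'s team."""
    results = [play_episode(policy, opponent) for _ in range(episodes)]
    wins = sum(r == 0 for r in results)
    draws = sum(r is None for r in results)
    return wins / episodes, draws / episodes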

πŸ”§ Configuration

Algorithm Hyperparameters

Edit files in configs/algos/ or configs/env_tuned/ to customize (a loading sketch follows this list):

  • Learning rates
  • Network architectures
  • Training parameters
  • Evaluation settings
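
Configs can be loaded and selectively overridden before training. Below is a minimal sketch using PyYAML; the override keys are illustrative and may not match the repository's exact schema.

import yaml

def load_config(path, **overrides):
    """Load a YAML config file and apply keyword overrides."""
    with open(path) as f:
        cfg = yaml.safe_load(f) or {}
    cfg.update(overrides)
    return cfg

# Example (hypothetical key): lower the actor learning rate for a Soccer run.
cfg = load_config("configs/env_tuned/mappo_soccer.yaml", actor_lr=3e-4)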

πŸ“ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

  • Unity ML-Agents Team for the environments and Python API
  • OpenAI for algorithm implementations and research
  • PyTorch team for the deep learning framework
