[CVPR 2024] Domain generalization by interpolating original feature styles with styles obtained using random descriptions in natural language

astra-vision/FAMix

A Simple Recipe for Language-guided Domain Generalized Segmentation

Mohammad Fahes1, Tuan-Hung Vu1,2, Andrei Bursuc1,2, Patrick Pérez3, Raoul de Charette1
1 Inria, 2 valeo.ai, 3 Kyutai

Project page: https://astra-vision.github.io/FAMix/
Paper: https://arxiv.org/abs/2311.17922

TL;DR: FAMix (for Freeze, Augment, and Mix) is a simple method for domain-generalized semantic segmentation, based on minimal fine-tuning, language-driven patch-wise style augmentation, and patch-wise mixing of original and augmented styles.
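The patch-wise style mixing described above can be sketched as follows. This is a minimal illustration, not the repository's implementation: it assumes a "style" is the channel-wise mean and standard deviation of a low-level feature patch (as in AdaIN), that augmented styles have already been mined, and that mixing interpolates the two sets of statistics with a random per-patch coefficient.

```python
import numpy as np

def style_stats(feat):
    """Channel-wise mean and std over the spatial dims of a (C, H, W) patch."""
    mu = feat.mean(axis=(1, 2), keepdims=True)
    sigma = feat.std(axis=(1, 2), keepdims=True) + 1e-6
    return mu, sigma

def mix_styles(feat, aug_mu, aug_sigma, rng):
    """Re-normalize a patch with a random interpolation of the original and
    augmented (e.g., language-mined) style statistics."""
    mu, sigma = style_stats(feat)
    lam = rng.uniform()                        # per-patch mixing coefficient
    mixed_mu = lam * mu + (1 - lam) * aug_mu
    mixed_sigma = lam * sigma + (1 - lam) * aug_sigma
    return mixed_sigma * (feat - mu) / sigma + mixed_mu

def patchwise_mix(feat, aug_styles, grid=2, rng=None):
    """Split a (C, H, W) feature map into a grid of patches and mix each
    patch's style with one of the mined styles."""
    if rng is None:
        rng = np.random.default_rng(0)
    C, H, W = feat.shape
    out = feat.copy()
    ph, pw = H // grid, W // grid
    for i in range(grid):
        for j in range(grid):
            sl = (slice(None), slice(i * ph, (i + 1) * ph),
                  slice(j * pw, (j + 1) * pw))
            aug_mu, aug_sigma = aug_styles[(i * grid + j) % len(aug_styles)]
            out[sl] = mix_styles(feat[sl], aug_mu, aug_sigma, rng)
    return out
```

The grid size, style-selection rule, and mixing distribution here are placeholders; see the paper for the actual design.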

Citation

@InProceedings{fahes2024simple,
  title={A Simple Recipe for Language-guided Domain Generalized Segmentation},
  author={Fahes, Mohammad and Vu, Tuan-Hung and Bursuc, Andrei and P{\'e}rez, Patrick and de Charette, Raoul},
  booktitle={CVPR},
  year={2024}
}

Demo

Test on unseen YouTube videos from different cities
Training dataset: GTA5
Backbone: ResNet-50
Segmenter: DeepLabv3+

Watch the full video on YouTube

⚠️⚠️Note 1: For test datasets with a higher resolution than the training one, downscaling the images by a factor of 2 (i.e., scale=0.5) and then upsampling the predictions back to the original resolution speeds up inference and can improve results. Thanks to tpy001 for raising this point in the issues. The scale parameter can be customized when running Evaluation by adding --scale <value>.

⚠️⚠️Note 2: One more trick to improve performance at inference: (1) predict at scale=1 (i.e., the original size of the input image), (2) predict on the downsampled image (scale=0.5), (3) ensemble the two predictions. The code for this is included; activate it by adding --scale <value> and --ensemble to the Evaluation command.
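The two-scale ensembling above can be sketched as follows. This is a minimal illustration of the idea, not the repository's code: `model` is assumed to return per-class logits for a (C, H, W) image, and the resize helper is a crude nearest-neighbor stand-in for the bilinear interpolation a real implementation would use.

```python
import numpy as np

def resize_nn(x, h, w):
    """Nearest-neighbor resize of a (C, H, W) array (stand-in for bilinear
    interpolation, e.g. torch.nn.functional.interpolate)."""
    C, H, W = x.shape
    rows = np.arange(h) * H // h
    cols = np.arange(w) * W // w
    return x[:, rows][:, :, cols]

def ensemble_predict(model, image):
    """Average full-scale and half-scale logits, then take the argmax."""
    C, H, W = image.shape
    logits_full = model(image)                      # (num_classes, H, W)
    small = resize_nn(image, H // 2, W // 2)        # scale = 0.5
    logits_half = resize_nn(model(small), H, W)     # upsample logits back
    avg = (logits_full + logits_half) / 2.0
    return avg.argmax(axis=0)                       # (H, W) class-index map
```

Averaging logits before the argmax lets the two scales compensate for each other: the full-scale pass keeps fine detail while the half-scale pass captures larger context.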

Results with RN50 backbone and DLv3+ decoder trained on GTA5:

| Backbone | Decoder | Scale | Cityscapes | Mapillary | ACDC night | ACDC snow | ACDC rain | ACDC fog |
|----------|---------|-------|------------|-----------|------------|-----------|-----------|----------|
| RN50 | DLv3+ | 1 | 48.51 | 52.39 | 15.02 | 37.38 | 39.56 | 40.99 |
| RN50 | DLv3+ | 0.5 | 48.02 | 54.00 | 21.58 | 38.27 | 39.53 | 44.94 |
| RN50 | DLv3+ | ensemble (1 & 0.5) | 50.80 | 56.04 | 20.05 | 40.40 | 42.10 | 44.93 |

Results with RN101 backbone and DLv3+ decoder trained on GTA5:

| Backbone | Decoder | Scale | Cityscapes | Mapillary | ACDC night | ACDC snow | ACDC rain | ACDC fog |
|----------|---------|-------|------------|-----------|------------|-----------|-----------|----------|
| RN101 | DLv3+ | 1 | 49.13 | 53.41 | 21.28 | 41.49 | 42.19 | 44.30 |
| RN101 | DLv3+ | 0.5 | 50.06 | 55.31 | 23.97 | 40.34 | 42.41 | 44.98 |
| RN101 | DLv3+ | ensemble (1 & 0.5) | 51.46 | 56.95 | 24.53 | 43.33 | 44.77 | 47.39 |

Table of Contents

- Installation
- Datasets
- Trained models
- Running FAMix
- License
- Acknowledgement

Installation

Dependencies

First create a new conda environment with the required packages:

conda env create --file environment.yml

Then activate environment using:

conda activate famix_env

Datasets

  • ACDC: Download ACDC images and labels from ACDC. Please follow the dataset directory structure:

    <ACDC_DIR>/                   % ACDC dataset root
    ├── rgb_anon/                 % input image (rgb_anon_trainvaltest.zip)
    └── gt/                       % semantic segmentation labels (gt_trainval.zip)
  • BDD100K: Download BDD100K images and labels from BDD100K. Please follow the dataset directory structure:

    <BDD100K_DIR>/              % BDD100K dataset root
    ├── images/                 % input image
    └── labels/                 % semantic segmentation labels
  • Cityscapes: Follow the instructions in Cityscapes to download the images and semantic segmentation labels. Please follow the dataset directory structure:

    <CITYSCAPES_DIR>/             % Cityscapes dataset root
    ├── leftImg8bit/              % input image (leftImg8bit_trainvaltest.zip)
    └── gtFine/                   % semantic segmentation labels (gtFine_trainvaltest.zip)
  • GTA5: Download GTA5 images and labels from GTA5. Please follow the dataset directory structure:

    <GTA5_DIR>/                   % GTA5 dataset root
    ├── images/                   % input image 
    └── labels/                   % semantic segmentation labels
  • Mapillary: Download Mapillary images and labels from Mapillary. Please follow the dataset directory structure:

    <MAPILLARY_DIR>/              % Mapillary dataset root
    ├── training/                 % Training subset
    │   ├── images/               % input image
    │   └── labels/               % semantic segmentation labels
    └── validation/               % Validation subset
        ├── images/               % input image
        └── labels/               % semantic segmentation labels
  • Synthia: Download Synthia images and labels from SYNTHIA-RAND-CITYSCAPES and split it following SPLIT-DATA. Please follow the dataset directory structure:

    <SYNTHIA>/                 % Synthia dataset root
    ├── RGB/                   % input image 
    └── GT/                    % semantic segmentation labels

Trained models

The trained models are available here.

Running FAMix

Style mining

python3 patch_PIN.py \
  --dataset <dataset_name> \
  --data_root <dataset_root> \
  --resize_feat \
  --save_dir <path_for_learnt_parameters_saving>

Training

python3 main.py \
--dataset <dataset_name> \
--data_root <dataset_root> \
--total_itrs  40000 \
--batch_size 8 \
--val_interval 750 \
--transfer \
--data_aug \
--ckpts_path <path_to_save_checkpoints> \
--path_for_stats <path_for_mined_styles>

Evaluation

python3 main.py \
--dataset <dataset_name> \
--data_root <dataset_root> \
--ckpt <path_to_tested_model> \
--test_only \
--ACDC_sub <ACDC_subset_if_tested_on_ACDC>   

Inference & Visualization

To test any model on any image and visualize the output, add the images to the predict_test directory and run:

python3 predict.py \
--ckpt <ckpt_path> \
--save_val_results_to <directory_for_saved_output_images>
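Visualizing a segmentation prediction amounts to a palette lookup that maps each class index to a color. A minimal sketch is below; the 3-class palette is a made-up example for illustration, whereas the repository uses the 19-class Cityscapes palette.

```python
import numpy as np

# Hypothetical 3-class palette (class index -> RGB); the actual code uses
# the 19-class Cityscapes color scheme.
PALETTE = np.array([
    [128,  64, 128],   # road
    [ 70,  70,  70],   # building
    [107, 142,  35],   # vegetation
], dtype=np.uint8)

def colorize(pred):
    """Turn an (H, W) class-index map into an (H, W, 3) RGB image via
    advanced indexing into the palette."""
    return PALETTE[pred]
```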

License

FAMix is released under the Apache 2.0 license.

Acknowledgement

The code is based on this implementation of DeepLabv3+, and uses code from CLIP, PODA and RobustNet.

