LEGION: Learning to Ground and Explain for Synthetic Image Detection

Paper PDF | Website

📢 Latest News

  • [2025.3.21]: 🔥 The SynthScars dataset is now available on HuggingFace. Click here to download it.
  • [2025.3.20]: 🔥 We have released our paper on arXiv. Check out the paper here.

 Overview

The rapid advancements in generative technology have emerged as a double-edged sword. While offering powerful tools that enhance convenience, they also pose significant social concerns. As defenders, current synthetic image detection methods often lack artifact-level textual interpretability and are overly focused on image manipulation detection, and current datasets usually suffer from outdated generators and a lack of fine-grained annotations. In this paper, we introduce SynthScars, a high-quality and diverse dataset consisting of 12,236 fully synthetic images with human-expert annotations. It features 4 distinct image content types, 3 categories of artifacts, and fine-grained annotations covering pixel-level segmentation, detailed textual explanations, and artifact category labels. Furthermore, we propose LEGION (LEarning to Ground and explain for Synthetic Image detectiON), a multimodal large language model (MLLM)-based image forgery analysis framework that integrates artifact detection, segmentation, and explanation. Building upon this capability, we further explore LEGION as a controller, integrating it into image refinement pipelines to guide the generation of higher-quality and more realistic images. Extensive experiments show that LEGION outperforms existing methods across multiple benchmarks, particularly surpassing the second-best traditional expert on SynthScars by 3.31% in mIoU and 7.75% in F1 score. Moreover, the refined images generated under its guidance exhibit stronger alignment with human preferences. The code, model, and dataset will be released.

framework

🏆 Contribution

  • We introduce SynthScars, a challenging dataset for synthetic image detection, featuring high-quality synthetic images with diverse content types, as well as fine-grained pixel-level artifact annotations with detailed textual explanations.
  • We propose LEGION, a comprehensive image forgery analysis framework for artifact localization, explanation generation, and forgery detection, which effectively aids human experts in detecting and understanding image forgeries.
  • Extensive experiments demonstrate that LEGION achieves exceptional performance on 4 challenging benchmarks. Comparisons with 19 existing methods show that it achieves SOTA performance on the majority of metrics, exhibiting strong robustness and generalization ability.
  • We position LEGION not only as a defender against ever-evolving generative technologies but also as a controller that guides higher-quality and more realistic image generation.

🛠️ Installation

Please clone our repository and change into the project directory:

git clone git@github.com:opendatalab/LEGION.git
cd LEGION

Our model is based on the GLaMM environment. Please follow the steps below to configure the environment.

conda create -n legion python=3.10
conda activate legion
pip install -r requirements/default.txt

# Install mmcv
git clone https://github.com/open-mmlab/mmcv
cd mmcv
git checkout v1.4.7
MMCV_WITH_OPS=1 pip install -e .

📦 Dataset

framework

SynthScars is a high-quality and diverse dataset consisting of 12,236 fully synthetic images with human-expert annotations. It features 4 distinct image content types, 3 categories of artifacts, and fine-grained annotations covering pixel-level segmentation, detailed textual explanations, and artifact category labels. Please download it from HuggingFace, unzip it into the ./data folder, and ensure it follows the structure below:

./data
└── SynthScars
    ├── train
    │   ├── images
    │   │   ├── xxx.jpg
    │   │   └── ...
    │   └── annotations
    │       └── train.json
    └── test
        ├── images
        │   ├── xxx.jpg
        │   └── ...
        └── annotations
            └── test.json
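Before training, it may help to confirm that the unpacked dataset actually matches this layout. A minimal sketch (the helper names `expected_paths`/`check_layout` are ours, and the annotation-folder spelling is assumed to be `annotations`):

```python
import os

def expected_paths(root="./data"):
    """List the paths the SynthScars tree above should contain."""
    base = os.path.join(root, "SynthScars")
    return [
        os.path.join(base, split, sub)
        for split in ("train", "test")
        for sub in ("images", os.path.join("annotations", f"{split}.json"))
    ]

def check_layout(root="./data"):
    """Return the expected paths that are missing on disk."""
    return [p for p in expected_paths(root) if not os.path.exists(p)]

if __name__ == "__main__":
    missing = check_layout()
    if missing:
        print("Missing dataset paths:")
        for p in missing:
            print("  ", p)
    else:
        print("SynthScars layout looks complete.")
```

Running the script before `bash scripts/loc_exp/train.sh` surfaces a mis-unzipped archive early instead of mid-training.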

📌 Usage

As a defender, we provide the training and evaluation code for the image forgery analysis task as follows:

  • Localization and Explanation

    If you want to train LEGION to perform artifact localization and explanation generation, please first download the pretrained weights of GLaMM & SAM, and then run:
      bash scripts/loc_exp/train.sh
    After training, you need to merge the weights as follows:
      bash scripts/merge_weights/step1.sh
      bash scripts/merge_weights/step2.sh
    If you want to run inference on your own data, please place the images in a single folder and then run:
      bash scripts/loc_exp/infer.sh
  • Detection

    If you want to train LEGION to conduct deepfake detection, please replace the data path and model path in the bash script and run:
      bash scripts/cls/train.sh
    If you want to evaluate on your own data, please run:
      bash scripts/cls/eval.sh

As a controller, we provide pipelines for both image regeneration and inpainting, aimed at higher-quality and more realistic image generation:

  • Regeneration

    You can run iterative prompt revision and image regeneration via:
      conda create -n regenerator python=3.10 -y
      conda activate regenerator
      pip install -r requirements/regenerator.txt
      bash scripts/refine/regenerator.sh
  • Inpainting

    You can run iterative regional inpainting via:
      bash scripts/refine/inpainter.sh

📊 Performance

As a defender, we evaluate LEGION's performance on three key subtasks: artifact localization, explanation generation, and deepfake detection.

  • Localization

    We evaluate the artifact localization task using the SynthScars, RichHF-18K, and LOKI datasets, which contain region-level mask annotations.
result
  • Explanation

    To assess the model's natural language interpretability, we use the SynthScars and LOKI datasets, which provide detailed artifact explanations.
result
  • Detection

    For the deepfake detection task, we adopt the classic UniversalFakeDetect benchmark.
result
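For reference, the localization metrics reported above (mIoU and pixel-level F1) follow standard per-pixel definitions. A minimal sketch over binary masks — the mask format, the positive class, and the per-image averaging convention are our assumptions, not necessarily the paper's exact evaluation protocol:

```python
import numpy as np

def iou(pred, gt):
    """Intersection over Union between two binary artifact masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:  # both masks empty: treat as perfect agreement
        return 1.0
    return np.logical_and(pred, gt).sum() / union

def pixel_f1(pred, gt):
    """Pixel-level F1, with artifact pixels as the positive class."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom

def mean_iou(preds, gts):
    """mIoU as a simple mean of per-image IoUs (one common convention)."""
    return float(np.mean([iou(p, g) for p, g in zip(preds, gts)]))
```

With these definitions, a predicted mask that covers the ground-truth artifact plus an equal-sized spurious region scores IoU 0.5 and F1 2/3, which is why F1 is the more forgiving of the two metrics.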

As a controller, we explore LEGION's capability to guide image generation through two pipelines: image regeneration and inpainting, based on its analytical feedback.

result

Note: For more visualization results and case studies, please refer to our Project Page.

😄 Acknowledgement

This repository is built upon the work of GLaMM. We appreciate their contributions and insights that have provided a strong foundation for our research.

📜 Citation

If you find our work interesting and helpful, please consider giving our repo a star. Additionally, if you would like to cite our work, please use the following format:

@article{kang2025legion,
  title={Legion: Learning to ground and explain for synthetic image detection},
  author={Kang, Hengrui and Wen, Siwei and Wen, Zichen and Ye, Junyan and Li, Weijia and Feng, Peilin and Zhou, Baichuan and Wang, Bin and Lin, Dahua and Zhang, Linfeng and others},
  journal={arXiv preprint arXiv:2503.15264},
  year={2025}
}

About

The official implementation of the paper "LEGION: Learning to Ground and Explain for Synthetic Image Detection"
