Hengrui Kang*, Siwei Wen*, Zichen Wen*, Junyan Ye, Weijia Li†
Peilin Feng, Baichuan Zhou, Bin Wang, Dahua Lin, Linfeng Zhang, Conghui He†
- [2025.3.21]: 🔥 The SynthScars dataset is now available on HuggingFace. Click here to download it.
- [2025.3.20]: 🔥 We have released our paper on arXiv. Check out the paper here.
The rapid advancements in generative technology have emerged as a double-edged sword. While offering powerful tools that enhance convenience, they also pose significant social concerns. As defenders, current synthetic image detection methods often lack artifact-level textual interpretability and are overly focused on image manipulation detection, and current datasets usually suffer from outdated generators and a lack of fine-grained annotations. In this paper, we introduce SynthScars, a high-quality and diverse dataset consisting of 12,236 fully synthetic images with human-expert annotations. It features 4 distinct image content types, 3 categories of artifacts, and fine-grained annotations covering pixel-level segmentation, detailed textual explanations, and artifact category labels. Furthermore, we propose LEGION (LEarning to Ground and explain for Synthetic Image detectiON), a multimodal large language model (MLLM)-based image forgery analysis framework that integrates artifact detection, segmentation, and explanation. Building upon this capability, we further explore LEGION as a controller, integrating it into image refinement pipelines to guide the generation of higher-quality and more realistic images. Extensive experiments show that LEGION outperforms existing methods across multiple benchmarks, particularly surpassing the second-best traditional expert on SynthScars by 3.31% in mIoU and 7.75% in F1 score. Moreover, the refined images generated under its guidance exhibit stronger alignment with human preferences. The code, model, and dataset will be released.
- We introduce SynthScars, a challenging dataset for synthetic image detection, featuring high-quality synthetic images with diverse content types, as well as fine-grained pixel-level artifact annotations with detailed textual explanations.
- We propose LEGION, a comprehensive image forgery analysis framework for artifact localization, explanation generation, and forgery detection, which effectively helps human experts detect and understand image forgeries.
- Extensive experiments demonstrate that LEGION achieves exceptional performance on 4 challenging benchmarks. Comparisons with 19 existing methods show that it achieves SOTA performance on the majority of metrics, exhibiting strong robustness and generalization ability.
- We position LEGION not only as a defender against ever-evolving generative technologies but also as a controller that guides higher-quality and more realistic image generation.
Please clone our repository and change directory to the folder:
git clone [email protected]:opendatalab/LEGION.git
cd LEGION
Our model is based on the GLaMM environment. Please follow the steps below to configure the environment.
conda create -n legion python=3.10
conda activate legion
pip install -r requirements/default.txt
# Install mmcv
git clone https://github.com/open-mmlab/mmcv
cd mmcv
git checkout v1.4.7
MMCV_WITH_OPS=1 pip install -e .
SynthScars is a high-quality and diverse dataset consisting of 12,236 fully synthetic images with human-expert annotations. It features 4 distinct image content types, 3 categories of artifacts, and fine-grained annotations covering pixel-level segmentation, detailed textual explanations, and artifact category labels. Please download it from HuggingFace, unzip it into the ./data
folder, and ensure it follows the structure below:
./data
└── SynthScars
├── train
│ ├── images
│ │ ├── xxx.jpg
│ │ └── ...
│   └── annotations
│ └── train.json
└── test
├── images
│ ├── xxx.jpg
│ └── ...
└── annotations
└── test.json
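After unzipping, a quick shell check can confirm the layout matches the tree above. This is a minimal sketch (`check_synthscars` is just an illustrative helper name, not part of the repo); run it from the repository root and adjust the directory names if your unzipped folders differ:

```shell
# Sanity-check the SynthScars layout after unzipping (run from the repo root).
# "check_synthscars" is an illustrative helper, not part of the repository.
check_synthscars() {
  for split in train test; do
    for sub in images annotations; do
      if [ -e "./data/SynthScars/${split}/${sub}" ]; then
        echo "ok: ${split}/${sub}"
      else
        echo "missing: ${split}/${sub}"
      fi
    done
  done
}
check_synthscars
```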
As a defender, we provide the training and evaluation code for the image forgery analysis task as follows:
- **Artifact Localization & Explanation Generation**
If you want to train LEGION to perform artifact localization and explanation generation, please first download the pretrained weights of GLaMM & SAM, and then run:
bash scripts/loc_exp/train.sh
After training, you need to merge the weights as follows:
bash scripts/merge_weights/step1.sh
bash scripts/merge_weights/step2.sh
If you want to run inference on your own data, please put it into one folder and then run:
bash scripts/loc_exp/infer.sh
- **Deepfake Detection**
If you want to train LEGION to conduct deepfake detection, please replace the data path and model path in the bash file and run:
bash scripts/cls/train.sh
If you want to evaluate on your own data, please run:
bash scripts/cls/eval.sh
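Taken together, the defender-side commands above can be chained into a single wrapper. The sketch below only writes the script rather than running it, since each step assumes the data and model paths inside the individual bash files have been configured first; the name `run_defender.sh` is our own, not part of the repo:

```shell
# Sketch: chain the defender steps above in order (train -> merge -> infer -> classify).
# This writes a wrapper script instead of executing it, because each underlying
# bash file needs its data/model paths configured first.
cat > run_defender.sh <<'EOF'
#!/usr/bin/env bash
set -e
bash scripts/loc_exp/train.sh        # 1. train artifact localization + explanation
bash scripts/merge_weights/step1.sh  # 2. merge the trained weights (step 1)
bash scripts/merge_weights/step2.sh  #    merge the trained weights (step 2)
bash scripts/loc_exp/infer.sh        # 3. run inference on your image folder
bash scripts/cls/train.sh            # 4. train the deepfake detection classifier
bash scripts/cls/eval.sh             # 5. evaluate the classifier
EOF
chmod +x run_defender.sh
echo "wrote run_defender.sh"
```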
As a controller, we provide the pipelines of both image regeneration and inpainting for high-quality and more realistic image generation:
- **Image Regeneration**
You can run iterative prompt revision and image regeneration via:
conda create -n regenerator python=3.10 -y
conda activate regenerator
pip install -r requirements/regenerator.txt
bash scripts/refine/regenerator.sh
- **Regional Inpainting**
You can run iterative regional inpainting via:
bash scripts/refine/inpainter.sh
As a defender, we evaluate LEGION's performance on three key subtasks: artifact localization, explanation generation, and deepfake detection.
- We evaluate the artifact localization task using the SynthScars, RichHF-18K, and LOKI datasets, which contain region-level mask annotations.
- To assess the model's natural language interpretability, we use the SynthScars and LOKI datasets, which provide detailed artifact explanations.
- For the deepfake detection task, we adopt the classic UniversalFakeDetect benchmark.
As a controller, we explore LEGION's capability to guide image generation through two pipelines: image regeneration and inpainting, based on its analytical feedback.
Note: For more visualization results and case studies, please refer to our Project Page.
This repository is built upon the work of GLaMM. We appreciate their contributions and insights that have provided a strong foundation for our research.
If you find our work interesting and helpful, please consider giving our repo a star. Additionally, if you would like to cite our work, please use the following format:
@article{kang2025legion,
title={Legion: Learning to ground and explain for synthetic image detection},
author={Kang, Hengrui and Wen, Siwei and Wen, Zichen and Ye, Junyan and Li, Weijia and Feng, Peilin and Zhou, Baichuan and Wang, Bin and Lin, Dahua and Zhang, Linfeng and others},
journal={arXiv preprint arXiv:2503.15264},
year={2025}
}