Semantic Visual Anomaly Detection and Reasoning in AI-Generated Images

Chuangchuang Tan; Xiang Ming; Jinglu Wang; Renshuai Tao; Bin Li; Yunchao Wei; Yao Zhao; Yan Lu

ICLR 2026 | October 2025

The rapid advancement of AI-generated content (AIGC) has enabled the synthesis of visually convincing images; however, many such outputs exhibit subtle semantic anomalies, including unrealistic object configurations, violations of physical laws, and commonsense inconsistencies, which compromise the overall plausibility of the generated scenes. Detecting these semantic-level anomalies is essential for assessing the trustworthiness of AIGC media, especially in AIGC image analysis, explainable deepfake detection, and semantic authenticity assessment. In this paper, we formalize semantic anomaly detection and reasoning for AIGC images and introduce AnomReason, a large-scale benchmark whose annotations are structured as quadruples (Name, Phenomenon, Reasoning, Severity). Annotations are produced by a modular multi-agent pipeline (AnomAgent) with lightweight human-in-the-loop verification, enabling scale while preserving quality. At construction time, AnomAgent processed approximately 4.17B GPT-4o tokens, which indicates the scale of the resulting structured annotations. We further show that models fine-tuned on AnomReason achieve consistent gains over strong vision-language baselines under our proposed semantic matching metrics (SemAP and SemF1). Applications to explainable deepfake detection and semantic reasonableness assessment of image generators demonstrate practical utility. In summary, AnomReason and AnomAgent serve as a foundation for measuring and improving the semantic plausibility of AI-generated images. We will release code, metrics, data, and task-aligned models to support reproducible research on semantic authenticity and interpretable AIGC forensics.
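
To make the quadruple format concrete, the following is a minimal sketch of how one (Name, Phenomenon, Reasoning, Severity) annotation might be represented and how predicted anomalies could be scored against reference ones. Everything beyond the four field names is an assumption made for illustration: the `Anomaly` class, the integer severity scale, the `token_overlap` similarity, the greedy matching, and the 0.5 threshold are hypothetical, and `matched_f1` is a simple stand-in, not the paper's SemF1 or SemAP definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Anomaly:
    """One annotation quadruple: (Name, Phenomenon, Reasoning, Severity)."""
    name: str
    phenomenon: str
    reasoning: str
    severity: int  # assumption: integer scale; the abstract does not specify one


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercase tokens (hypothetical stand-in for a semantic matcher)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def matched_f1(pred, gold, threshold=0.5):
    """Greedily match each prediction to one unused reference, then compute F1."""
    unused = list(gold)
    hits = 0
    for p in pred:
        # Pick the most similar reference that has not been matched yet.
        best = max(
            unused,
            key=lambda g: token_overlap(p.phenomenon, g.phenomenon),
            default=None,
        )
        if best is not None and token_overlap(p.phenomenon, best.phenomenon) >= threshold:
            unused.remove(best)
            hits += 1
    if hits == 0:
        return 0.0
    precision, recall = hits / len(pred), hits / len(gold)
    return 2 * precision * recall / (precision + recall)
```

In this sketch, a model that reports one of two reference anomalies with closely matching wording obtains precision 1 and recall 0.5. A real evaluation would presumably replace `token_overlap` with a learned or model-based semantic similarity, which is the part the abstract leaves unspecified.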