Text-to-image models such as Stable Diffusion (SD) require comprehensive, fine-grained, and high-precision methods for evaluating text–image alignment. A prior method, the text–image alignment metric (TIAM), employs a template-based approach for fine-grained, high-precision evaluation; however, it is restricted to objects and colors, limiting its comprehensiveness. This study extends TIAM by incorporating attention maps and vision–language models to deliver a fine-grained, high-precision evaluation framework that goes beyond colors and objects to cover attributes, actions, and positions. In our experiments, we analyze the scores that the proposed method assigns to generated images and compare them with human judgments. The results demonstrate that the proposed method outperforms existing methods, exhibiting a stronger correlation with human judgments (r = 0.853, p < 10⁻⁴⁸). In addition, we apply the proposed method to evaluate the generation abilities of three SD models (SD1.4, SD2, and SD3.5). Each experiment uses over 900 images, totaling 9,858 images across all experiments to ensure statistical significance. The results indicate that SD3.5 exhibits superior expressiveness compared with SD1.4 and SD2. Nevertheless, for more complex tasks such as multi-attribute or multi-action generation, limitations in text–image alignment remain evident.
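As a minimal sketch of the kind of correlation analysis reported above (not the authors' evaluation code), the snippet below computes a Pearson correlation between per-image metric scores and human ratings. The data are synthetic placeholders; only the use of `scipy.stats.pearsonr` reflects the statistic (r, p) cited in the abstract.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical data: per-image human alignment ratings in [0, 1]
# (e.g., averaged annotator scores) and an automatic metric that
# tracks them with some noise. Both arrays are placeholders.
rng = np.random.default_rng(0)
human_ratings = rng.uniform(0.0, 1.0, size=200)
metric_scores = np.clip(human_ratings + rng.normal(0.0, 0.15, size=200), 0.0, 1.0)

# Pearson correlation between the automatic metric and human judgments;
# a strong r with a small p indicates good agreement with humans.
r, p = pearsonr(metric_scores, human_ratings)
print(f"Pearson r = {r:.3f}, p = {p:.2e}")
```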