This Zenodo archive contains the code accompanying the manuscript "Sentiment-aware image captioning with visually consistent sentiment calibration." The repository provides a consolidated, reproducible implementation of the proposed multimodal approach for generating sentiment-aware image captions. The framework integrates a Vision Transformer (ViT) for visual feature extraction, a GPT-2-based decoder for caption generation, and multiple sentiment classification models, including LSTM, GRU, CNN, Transformer, and fine-tuned BERT. In addition, the archive implements the proposed Visually Consistent Sentiment Calibration (VCSC) mechanism, which aligns textual sentiment with visual context through a lightweight post-hoc calibration strategy that requires no joint retraining.

The package includes:

- A unified Python implementation of the full pipeline (caption generation, sentiment classification, and calibration)
- Baseline models for comparative evaluation (TF-IDF + classical ML, deep learning, and transformer-based methods)
- Experimental configurations aligned with the manuscript
- Documentation and usage instructions
- Original notebooks for transparency and reproducibility

The implementation supports five-class sentiment labeling with emoji-based representation and reproduces the experimental setup reported in the manuscript, including the evaluation metrics (BLEU, ROUGE, METEOR, and CIDEr for captioning; accuracy, precision, recall, and F1-score for sentiment classification) and the ablation analysis of the VCSC component. This archive is intended to ensure full reproducibility of the results and to facilitate further research in multimodal sentiment-aware image captioning.
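To give a rough sense of what a lightweight, post-hoc sentiment calibration step can look like, the sketch below blends a caption's textual sentiment distribution with a visual sentiment distribution over the five classes and renormalizes. Note that the function name, the convex-combination rule, and the `alpha` weight are illustrative assumptions, not the actual VCSC formulation from the manuscript; refer to the code in the archive for the implemented mechanism.

```python
import numpy as np

def vcsc_calibrate(text_probs, visual_probs, alpha=0.5):
    """Illustrative post-hoc calibration (hypothetical, not the paper's formula):
    blend the textual sentiment distribution with the visual one using a
    convex combination weighted by alpha, then renormalize to a distribution."""
    text_probs = np.asarray(text_probs, dtype=float)
    visual_probs = np.asarray(visual_probs, dtype=float)
    blended = (1.0 - alpha) * text_probs + alpha * visual_probs
    return blended / blended.sum()

# Five-class example (very negative ... very positive):
text_probs = [0.05, 0.10, 0.20, 0.40, 0.25]    # from the text-side classifier
visual_probs = [0.02, 0.08, 0.10, 0.30, 0.50]  # from the visual-side model
calibrated = vcsc_calibrate(text_probs, visual_probs, alpha=0.5)
print(int(calibrated.argmax()))  # → 4 (the most positive class wins after calibration)
```

Because the calibration operates only on the two output distributions, it can be applied after both models are trained, which is what makes a post-hoc strategy attractive: no joint retraining of the captioner and the sentiment classifiers is needed.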