Objective: This dataset is designed to benchmark and improve Visual Question Answering (VQA) systems in the context of Vietnamese tourism and cultural heritage. It addresses the lack of high-quality, regionally specific multimodal data for Southeast Asia.

Data Content: The dataset comprises thousands of images sourced from Wikimedia Commons, paired with human-verified question-answer sets in Vietnamese. The questions span five levels of complexity, from basic object identification to deep cultural reasoning.

Methodology:
1. Sourcing: Legally compliant images were filtered from Wikimedia Commons.
2. Annotation: Expert annotators generated QA pairs, focusing on architectural details, historical significance, and spatial reasoning.
3. Validation: The data was cleaned using automated scripts to ensure 100% synchronization between the metadata (JSON) and image files, with factual auditing via Large Multimodal Models (LMMs).

Usage: The data is split into train and test sets (.json). It is intended for training, fine-tuning, and evaluating Vision-Language Models (VLMs) on localized cultural contexts.
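The validation step described above (keeping the JSON metadata and the image files in sync) can be sketched as a small Python check. The JSON field name `image` and the `.jpg` extension are assumptions for illustration, not the dataset's documented schema:

```python
import json
from pathlib import Path


def load_qa(json_path):
    """Load QA annotations from a split file (e.g. train/test .json).

    Assumes a list of records; the exact schema (image, question,
    answer, level) is hypothetical.
    """
    with open(json_path, encoding="utf-8") as f:
        return json.load(f)


def check_sync(records, image_dir):
    """Return (referenced-but-missing, on-disk-but-unreferenced) sets.

    Both sets being empty corresponds to the "100% synchronization"
    between metadata and image files mentioned in the methodology.
    """
    referenced = {rec["image"] for rec in records}
    on_disk = {p.name for p in Path(image_dir).glob("*.jpg")}
    return referenced - on_disk, on_disk - referenced
```

A typical workflow would load a split with `load_qa("train.json")`, run `check_sync` against the image directory, and refuse to train unless both returned sets are empty.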