The fusion of low-resolution hyperspectral images (LR-HSI) and high-spatial-resolution multispectral images (HR-MSI) combines the advantages of both modalities to generate high-spatial-resolution hyperspectral images (HR-HSI). However, existing methods still struggle to balance global-local feature modeling against computational efficiency, and they face a core challenge: spectral distortion during upsampling due to the lack of cross-modal guidance. To address these issues, this paper proposes a cross-modal token selection network, termed CTSNet. First, a novel cross-modal guided spatial implicit upsampling pyramid (SIUP) structure is introduced. Unlike conventional implicit neural representation (INR) methods, SIUP directly incorporates MSI features as conditional inputs during the local multilayer perceptron (MLP) prediction stage, providing precise spatial priors for the HSI upsampling process. This design enables early-stage, deep fusion of cross-modal information, effectively resolving spatial detail blurring and spectral distortion during upsampling. Second, a token selection Transformer block (TSTB) is proposed to collaboratively extract global-local spatial and spectral features through a parallel dual-branch structure; its token selection attention mechanism (TSAM) significantly reduces computational complexity by employing an adjustable token selection rate. Finally, a multi-scale hybrid fusion (MSHF) module is designed to achieve deep feature reconstruction. Experiments on four public hyperspectral datasets demonstrate that CTSNet outperforms current state-of-the-art (SOTA) methods in both qualitative and quantitative evaluations.
Published in: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
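To make the cost argument behind the token selection attention mechanism concrete, the sketch below implements a generic token-selection attention step in plain Python. It is an illustration of the general technique, not the paper's TSAM: the function name, the L2-norm token scorer (standing in for whatever learned saliency measure CTSNet uses), and the single-head, unprojected attention are all assumptions. The point it shows is that attending only among a kept fraction r of the n tokens shrinks the quadratic attention cost from O(n²) to O((rn)²), with the keep ratio playing the role of the paper's adjustable token selection rate.

```python
import math

def token_selection_attention(tokens, keep_ratio=0.5):
    """Sketch of token-selection attention (hypothetical, not the paper's TSAM).

    tokens: list of n feature vectors (lists of floats), all of dimension d.
    keep_ratio: fraction of tokens that participate in attention; the rest
    pass through unchanged, so the pairwise cost scales with (keep_ratio*n)^2.
    """
    n, d = len(tokens), len(tokens[0])
    k = max(1, int(n * keep_ratio))
    # Score tokens by L2 norm -- a simple stand-in for a learned scorer.
    scores = [math.sqrt(sum(v * v for v in t)) for t in tokens]
    keep = sorted(range(n), key=lambda i: scores[i])[-k:]  # k most salient tokens
    sel = [tokens[i] for i in keep]
    scale = math.sqrt(d)
    out = [list(t) for t in tokens]  # unselected tokens are left as-is
    for row, i in enumerate(keep):
        # Scaled dot-product self-attention restricted to the selected subset.
        logits = [sum(a * b for a, b in zip(sel[row], s)) / scale for s in sel]
        m = max(logits)
        weights = [math.exp(x - m) for x in logits]  # numerically stable softmax
        z = sum(weights)
        out[i] = [sum(weights[j] * sel[j][c] for j in range(k)) / z
                  for c in range(d)]
    return out

tokens = [[float(i + j) for j in range(4)] for i in range(8)]
refined = token_selection_attention(tokens, keep_ratio=0.5)
```

With keep_ratio=0.5 on 8 tokens, only the 4 highest-scoring tokens attend to one another and are updated; the other 4 are returned untouched, which is how the adjustable rate trades accuracy for compute.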