DeepIM: Integrating Channel-Spatial Attention with Transformer for DNA i-Motif Folding Status Prediction

2026 · 0 citations · Journal Article

Authors

Rui Wu · University of Electronic Science and Technology of China

Hui Zhang · Inner Mongolia University

Li-Rong Zhang · Inner Mongolia University

Zheng Zhang · Murray State University

Quan Zou · University of Electronic Science and Technology of China

Li Liu · University of Electronic Science and Technology of China

Abstract

The i-motif (iM), a quadruplex structure formed by C-rich DNA sequences under acidic conditions, plays a significant role in gene expression regulation, telomere stability, and cancer development. Traditional experimental methods for detecting iMs, such as circular dichroism (CD) spectroscopy and nuclear magnetic resonance (NMR), are limited by high cost and low throughput. Existing computational models rely on manual feature extraction and struggle to capture the complex sequence-structure relationships underlying iM formation. We introduce DeepIM, a novel deep learning model that integrates a channel-spatial attention (CSA) mechanism with a Transformer architecture to predict iM folding status with high accuracy and interpretability. DeepIM encodes DNA sequences as k-mers, using embedding and positional encoding layers to retain semantic and spatial sequence information. The CSA mechanism, in which channel attention focuses on C-tracts and spatial attention targets flanking regions, extracts local features, while the Transformer models long-range dependencies. Trained and tested on a data set of over 750,000 sequences, DeepIM achieves 92.6% accuracy, outperforming traditional methods such as XGBoost (86.0%) and random forest (87.0%), as well as the state-of-the-art computational tool iM-Seeker (90.3%). DeepIM also demonstrates strong cross-cell-line generalization and the ability to identify distinctive iM sequence patterns, as shown by attention weight analysis and ablation experiments. Overall, DeepIM advances DNA secondary structure prediction by leveraging deep learning to capture complex sequence-structure relationships.
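The k-mer encoding step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the value of k, the stride, or how the vocabulary is built, so the choices below (k = 3, stride 1, an A/C/G/T vocabulary with id 0 reserved for unknown k-mers) and the helper names `kmer_tokens`, `build_vocab`, and `encode` are assumptions for illustration only, not DeepIM's actual preprocessing.

```python
from itertools import product


def kmer_tokens(seq, k=3):
    """Split a DNA sequence into overlapping k-mers (stride 1, an assumption)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_vocab(k=3, alphabet="ACGT"):
    """Map every possible k-mer to an integer id; id 0 is reserved for unknown k-mers."""
    return {"".join(p): i + 1 for i, p in enumerate(product(alphabet, repeat=k))}


def encode(seq, vocab, k=3):
    """Convert a sequence to the list of integer ids an embedding layer would consume."""
    return [vocab.get(tok, 0) for tok in kmer_tokens(seq, k)]


# Hypothetical C-rich example sequence (not taken from the DeepIM data set).
vocab = build_vocab(k=3)
example = "CCCTAACCCTAACCC"
tokens = kmer_tokens(example, k=3)
ids = encode(example, vocab, k=3)
print(tokens)
print(ids)
```

In a model of the kind described, integer ids like these would typically be passed to an embedding layer and combined with positional encodings before the attention and Transformer stages.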

Topics & Keywords

DNA and Nucleic Acid Chemistry · Genomics and Chromatin Dynamics · Machine Learning in Bioinformatics

UN Sustainable Development Goals

Life on Land

Publication Details

Published in: Journal of Chemical Information and Modeling

DOI: 10.1021/acs.jcim.6c00023

Field-Weighted Citation Impact: 0.00