The i-motif (iM), a quadruplex structure formed by C-rich DNA sequences under acidic conditions, plays significant roles in gene expression regulation, telomere stability, and cancer development. Traditional experimental methods for detecting iMs, such as circular dichroism (CD) spectroscopy and nuclear magnetic resonance (NMR), are limited by high cost and low throughput. Existing computational models rely on manual feature extraction and struggle to capture the complex sequence-structure relationships underlying iM formation. We introduce DeepIM, a novel deep learning model that integrates a channel-spatial attention (CSA) mechanism with a Transformer architecture to predict iM folding status with high accuracy and interpretability. DeepIM encodes DNA sequences as k-mers, using embedding and positional encoding layers to retain semantic and spatial sequence information. The CSA mechanism, in which channel attention focuses on C-tracts and spatial attention targets flanking regions, extracts local features, while the Transformer models long-range dependencies. Trained and tested on a data set of over 750,000 sequences, DeepIM achieves 92.6% accuracy, outperforming traditional methods such as XGBoost (86.0%) and random forest (87.0%), as well as the state-of-the-art computational tool iM-Seeker (90.3%). DeepIM also demonstrates strong cross-cell-line generalization and the ability to identify distinctive iM sequence patterns, as shown by attention weight analysis and ablation experiments. Overall, DeepIM advances DNA secondary structure prediction by leveraging deep learning to capture complex sequence-structure relationships.
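The k-mer encoding step mentioned in the abstract can be illustrated with a minimal sketch. This is not DeepIM's implementation: the value of k, the stride, the vocabulary layout, and the function names `kmer_tokens`, `build_vocab`, and `encode` are all assumptions made for illustration. It only shows how a DNA sequence might be turned into (position, token id) pairs that embedding and positional encoding layers could then consume.

```python
from itertools import product


def kmer_tokens(seq: str, k: int = 3, stride: int = 1) -> list[str]:
    """Split a DNA sequence into overlapping k-mers (k and stride are assumed values)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def build_vocab(k: int = 3, alphabet: str = "ACGT") -> dict[str, int]:
    """Map every possible k-mer to an integer id; id 0 is reserved for unknown k-mers."""
    return {"".join(p): i + 1 for i, p in enumerate(product(alphabet, repeat=k))}


def encode(seq: str, k: int = 3, stride: int = 1) -> list[tuple[int, int]]:
    """Return (position, token id) pairs for a sequence.

    The position index is what a positional encoding layer would use;
    the token id is what an embedding layer would look up.
    k-mers containing symbols outside the alphabet (e.g. N) map to id 0.
    """
    vocab = build_vocab(k)
    return [(pos, vocab.get(tok, 0)) for pos, tok in enumerate(kmer_tokens(seq, k, stride))]


if __name__ == "__main__":
    # A C-rich toy sequence; not taken from the DeepIM data set.
    print(kmer_tokens("CCCTAACCC"))
    print(encode("CCCTAACCC"))
```

With overlapping k-mers (stride 1), each C-tract contributes several consecutive tokens, which is one plausible way for local C-tract patterns to remain visible to downstream attention layers.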