Regression-Based Modeling of Antisense Oligonucleotide Efficacy Using Sequence, Structural, and Off-Target Features

2026 · 0 citations · Journal Article

Authors

Yu Bai · California State University, Fullerton

Jialin Tang · California State University, Fullerton

Abstract

Antisense oligonucleotides (ASOs) are a promising class of nucleic acid-based therapeutics that regulate gene expression by binding target mRNAs, with applications in genetic and rare diseases. However, designing effective ASOs remains difficult because of the vast combinatorial space of sequences, secondary structures, and chemical modifications. Recent work has leveraged deep learning and graph neural networks to address these challenges. Building on this foundation, the present project explores a complementary pipeline that uses classical machine learning and statistical methods for ASO design and evaluation. The workflow integrates multiple computational stages: retrieval of target mRNA sequences from NCBI, interaction prediction using the miRanda algorithm, and structural analysis via ViennaRNA. Off-target interactions were systematically assessed, and custom Python scripts were developed to merge the outputs into a unified dataset. Feature engineering incorporated both numeric and categorical predictors, such as cell line and density, to support modeling of inhibitory efficiency. Sequence, structure, and off-target interaction features were used to train multiple regressors, including Linear Regression, Ridge, Lasso, Random Forest, and Gradient Boosting. Models were evaluated using nested cross-validation with group-aware splits to prevent leakage. Random Forest achieved the highest performance in predicting inhibition outcomes ($R^{2} \approx 0.627$, $\text{MAE} \approx 9.47$). These results highlight both the feasibility and the challenges of applying interpretable machine learning techniques to ASO design, particularly in the presence of substantial missing data. Future directions include exploring a normalized hybridization energy gradient with relative energy per nucleotide. This work demonstrates the potential for combining bioinformatics tools, structural modeling, and machine learning to advance the rational design of therapeutic ASOs.
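The group-aware evaluation mentioned in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' code: the grouping key (target gene), the toy inhibition values, and the helper names `group_kfold`, `mae`, and `r2` are all assumptions made for illustration. The idea is that every ASO sharing a group label lands on the same side of a split, so a model is never scored on a group it saw during training. In a nested setup, the same splitter would be applied again inside each training fold to tune hyperparameters.

```python
from collections import defaultdict


def group_kfold(groups, n_splits):
    """Yield (train_idx, test_idx) pairs so that no group spans both sides."""
    members = defaultdict(list)
    for idx, g in enumerate(groups):
        members[g].append(idx)
    if n_splits > len(members):
        raise ValueError("n_splits cannot exceed the number of distinct groups")
    # Greedy balancing: place the largest groups first into the smallest fold.
    folds = [[] for _ in range(n_splits)]
    for g in sorted(members, key=lambda k: (-len(members[k]), str(k))):
        smallest = min(range(n_splits), key=lambda f: len(folds[f]))
        folds[smallest].extend(members[g])
    for f in range(n_splits):
        test = sorted(folds[f])
        train = sorted(i for o in range(n_splits) if o != f for i in folds[o])
        yield train, test


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy data: the group label is assumed to be each ASO's target gene.
groups = ["geneA", "geneA", "geneB", "geneB", "geneB", "geneC", "geneC", "geneD"]
inhibition = [62.0, 58.0, 35.0, 41.0, 38.0, 80.0, 77.0, 50.0]

for train, test in group_kfold(groups, n_splits=2):
    # Mean-only baseline standing in for a fitted regressor.
    baseline = sum(inhibition[i] for i in train) / len(train)
    y_true = [inhibition[i] for i in test]
    y_pred = [baseline] * len(test)
    held_out = sorted({groups[i] for i in test})
    print(f"held-out groups={held_out} MAE={mae(y_true, y_pred):.2f}")
```

With scikit-learn available, `GroupKFold` plays the role of the hand-written splitter, and a hyperparameter search run inside each outer training fold gives the nested structure. The mean-only baseline above is only a placeholder for the regressors named in the abstract.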

Topics & Keywords

Machine Learning in Bioinformatics · DNA and Nucleic Acid Chemistry · Genomics and Chromatin Dynamics

UN Sustainable Development Goals

Zero hunger

Publication Details

DOI: 10.1109/ccwc67433.2026.11393870

Field-Weighted Citation Impact: 0.00