Molecular mimicry — structural or sequence similarity between pathogen-derived and host self-peptides sufficient to trigger cross-reactive immune responses — has been proposed as a mechanism of autoimmune triggering across rheumatoid arthritis, systemic lupus erythematosus, ankylosing spondylitis, systemic sclerosis, antiphospholipid syndrome, dermatomyositis, and Guillain-Barré syndrome. Computational identification of mimicry candidates has historically relied on sequence-based metrics, resting on the untested assumption that sequence similarity predicts structural similarity at the MHC-presented peptide level. We present MimicryDB-Auto, to our knowledge the first curated, labelled multi-pathogen dataset integrating MHC epitope prediction, sequence alignment, and atomic structural validation at the individual epitope level across both MHC class I and II presentations, comprising 399 pathogen-host peptide pairs spanning 32 organisms constructed through a reproducible seven-step pipeline. Following structural validation using TM-align with RMSD < 2.0 Å, 262 pairs were classified as confirmed unbound structural mimics and 137 as non-mimics. Within the confirmed mimic pool, sequence identity explained at most 1.6% of variance in structural RMSD at both the 2.0 Å threshold (r = −0.127, p = 0.036, n = 272) and the stricter 1.0 Å threshold (r = −0.046, p = 0.562, n = 159) — a relationship of no practical predictive utility across threshold definitions. A Random Forest classifier trained exclusively on sequence and immunological features achieved AUC-ROC = 0.958 (95% CI: 0.886–0.999), confirming a multivariate sequence signal exists but is insufficient as a standalone substitute for structural validation. Cross-pairing validation further confirmed that 99.2% of structurally equivalent non-matched pairs had zero detectable sequence similarity, quantifying the scope of sequence-dissimilar structural mimicry invisible to conventional screening. 
All structural comparisons were performed on unbound peptide conformations, representing a proxy for MHC-presented structure rather than direct immunological validation. MimicryDB-Auto and the complete pipeline are publicly available at https://github.com/minbaku/molecular-mimicry-RA-pipeline.
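The "at most 1.6% of variance" figure follows from squaring the reported Pearson coefficients. The short sketch below reproduces that arithmetic from the values stated in the abstract. It is an illustration only, not part of the published pipeline, and the names `reported` and `variance_explained` are hypothetical.

```python
# Pearson correlations between sequence identity and structural RMSD,
# copied from the abstract (r and n as reported; nothing is recomputed
# from the underlying dataset here).
reported = {
    "RMSD < 2.0 Å": {"r": -0.127, "n": 272},
    "RMSD < 1.0 Å": {"r": -0.046, "n": 159},
}


def variance_explained(r: float) -> float:
    """Return r squared, the share of variance a linear fit explains."""
    return r * r


for label, stats in reported.items():
    pct = 100 * variance_explained(stats["r"])
    print(f"{label}: r = {stats['r']:+.3f}, r^2 = {pct:.1f}% (n = {stats['n']})")
```

Squaring gives roughly 1.6% at the 2.0 Å threshold and 0.2% at the 1.0 Å threshold, so the 1.6% upper bound quoted above comes from the looser threshold.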