Processed Data and Reproducible Workflow for a Transcriptomics-Driven Chemoinformatics Pipeline in Bladder Cancer

2026 · Dataset · Green Open Access

Authors

Koguchi Tomoyuki · Ijinkai Takeda General Hospital

Abstract

This Zenodo record contains processed data files, supplementary result tables, analysis scripts, docking validation files, machine-learning validation files, and environment information associated with the manuscript “A Transcriptomics-Driven Chemoinformatics Workflow for In Silico Drug Discovery: A Case Study of CCR5 and NNMT in Bladder Cancer”. The archive is intended to support transparency and computational reproducibility of the workflow described in the manuscript.

The workflow integrates:
(1) resistance signature construction from public datasets,
(2) projection of resistance signatures onto TCGA-BLCA transcriptomic profiles,
(3) resistance-axis scoring and quadrant assignment,
(4) Hallmark ssGSEA analysis,
(5) effect-size-based pathway and gene prioritization,
(6) target-specific compound prioritization for CCR5 and NNMT,
(7) molecular docking,
(8) docking protocol validation by redocking and retrospective benchmark analyses,
(9) target-specific machine-learning validation, including Y-scrambling for the CCR5 random forest model,
(10) Pareto-based multi-objective prioritization, and
(11) shared chemical-space visualization.

The record includes:
- scripts for resistance score calculation
- scripts for effect-size prioritization
- scripts for docking-related prioritization workflows
- scripts for docking protocol validation
- scripts for machine-learning sanity checks
- scripts for chemical-space visualization
- processed transcriptomic and prioritization datasets
- supplementary screening and validation tables
- redocking, retrospective benchmark, and Y-scrambling summary files
- Conda environment information
- archive-level README files

CCR5 prioritization in this archive is based on an internally validated, target-specific machine-learning workflow combined with docking analyses supported by redocking and retrospective benchmark evaluation. For CCR5, the final docking protocol used a revised search space, obtained by recentering the docking box based on the redocking results.
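Step (10), Pareto-based multi-objective prioritization, keeps every compound that no other compound beats on all criteria at once, instead of collapsing the criteria into one weighted score. The sketch below illustrates non-dominated filtering under stated assumptions: two hypothetical objectives per compound, a machine-learning activity probability (higher is better) and a docking score in kcal/mol (lower, i.e. more negative, is better). The class, field names, compound IDs, and values are illustrative only and are not taken from the archived tables or scripts, which may define objectives and tie handling differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    # Hypothetical fields for illustration; not the archive's column names.
    compound_id: str
    ml_probability: float  # higher is better
    docking_score: float   # kcal/mol, lower (more negative) is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is no worse than `b` on every objective and strictly better on at least one."""
    no_worse = a.ml_probability >= b.ml_probability and a.docking_score <= b.docking_score
    strictly_better = a.ml_probability > b.ml_probability or a.docking_score < b.docking_score
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Return the non-dominated candidates in input order (simple O(n^2) pairwise check).

    A candidate never dominates itself, so no self-exclusion is needed.
    """
    return [c for c in candidates if not any(dominates(other, c) for other in candidates)]


if __name__ == "__main__":
    pool = [
        Candidate("cpd_A", 0.91, -8.2),
        Candidate("cpd_B", 0.75, -9.6),
        Candidate("cpd_C", 0.70, -8.0),  # dominated by both cpd_A and cpd_B
        Candidate("cpd_D", 0.95, -6.1),
    ]
    for c in pareto_front(pool):
        print(c.compound_id)  # cpd_A, cpd_B, cpd_D
```

Compounds on the front are mutually non-comparable without further preferences, so ordering within the front (for example by a secondary criterion or a crowding measure) is a separate design choice.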
For CCR5, an additional Y-scrambling analysis indicated that the observed performance of the random forest model was unlikely to arise from chance correlation. NNMT prioritization is retained as an exploratory workflow view based on archived screening scores from the original NNMT case-study pipeline. In the revised manuscript, NNMT docking is interpreted as exploratory structural support rather than as strongly validated ranking evidence. During peer review, the record metadata are publicly available, whereas the files are under restricted access. Access to the restricted files can be provided to editors and reviewers upon request or via a private review link.
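Y-scrambling (response permutation) refits the model after randomly shuffling the activity labels and compares the scrambled-label performance with the performance obtained on the true labels; if the true-label score lies well above the scrambled distribution, chance correlation is an unlikely explanation. The sketch below shows the procedure using the standard library only. It does not reproduce the archived CCR5 analysis: the random forest is replaced by a hypothetical leave-one-out 1-nearest-neighbour classifier so the example runs without third-party packages, and the toy descriptors, labels, number of permutations, and the +1-corrected empirical p-value are assumptions.

```python
import random
from typing import Callable, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def loo_1nn_accuracy(X: Matrix, y: Sequence[int]) -> float:
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier (stand-in for a random forest)."""
    correct = 0
    for i, xi in enumerate(X):
        # Nearest other sample by squared Euclidean distance.
        best_j = min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(xi, X[j])),
        )
        correct += int(y[best_j] == y[i])
    return correct / len(X)


def y_scramble(
    X: Matrix,
    y: Sequence[int],
    score_fn: Callable[[Matrix, Sequence[int]], float],
    n_permutations: int = 200,
    seed: int = 0,
) -> Tuple[float, List[float], float]:
    """Return (observed score, scrambled scores, empirical p-value)."""
    rng = random.Random(seed)
    observed = score_fn(X, y)
    scrambled: List[float] = []
    for _ in range(n_permutations):
        y_perm = list(y)
        rng.shuffle(y_perm)  # break the descriptor-activity link
        scrambled.append(score_fn(X, y_perm))
    # Fraction of scrambled runs that match or beat the true-label model (+1 correction).
    n_ge = sum(s >= observed for s in scrambled)
    p_value = (n_ge + 1) / (n_permutations + 1)
    return observed, scrambled, p_value


if __name__ == "__main__":
    # Toy data: two well-separated clusters of 2-D descriptors (illustrative only).
    data_rng = random.Random(42)
    X = [[data_rng.gauss(0, 0.5), data_rng.gauss(0, 0.5)] for _ in range(20)]
    X += [[data_rng.gauss(3, 0.5), data_rng.gauss(3, 0.5)] for _ in range(20)]
    y = [0] * 20 + [1] * 20
    obs, perm, p = y_scramble(X, y, loo_1nn_accuracy, n_permutations=100)
    print(f"observed={obs:.2f} mean_scrambled={sum(perm) / len(perm):.2f} p={p:.3f}")
```

Any model-fitting routine with the same `(X, y) -> score` shape can be passed as `score_fn`; with a real random forest, the scoring step would typically be cross-validated so that the true-label and scrambled-label scores are estimated the same way.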

Publication Details

Published in: Zenodo (CERN European Organization for Nuclear Research)

DOI: 10.5281/zenodo.18917295
