Query Matters: How Selection Strategies Influence Active Learning in Drug Discovery

2026 · 0 citations · Journal Article · hybrid Open Access

Authors

Huw Williams · University of Strathclyde

Stephen Pickett · GlaxoSmithKline (United Kingdom)

Andrew W. Baxter · GlaxoSmithKline (United Kingdom)

David Scott Palmer · University of Strathclyde

Abstract

We present SimDMTA, an in silico framework designed to simulate the Design-Make-Test-Analyze (DMTA) cycle used in preclinical drug discovery. Using docking scores as a proxy for biological assays, the simulations allow the factors controlling the efficiency of the DMTA cycle to be explored in a way that would not be feasible with traditional experiments because of time and cost constraints. In each iteration of the workflow, a machine learning model predicts docking scores, compounds are selected using one of several query strategies, the selected molecules are docked, and the model is retrained on the results. Starting from a broad chemical space, the model actively samples molecules derived from a 3,5-dimethyl-4-phenylisoxazole scaffold, an active warhead for the bromodomain-containing protein 4 (BRD4) BD1 binding site, to refine its predictions. Our results show that uncertainty-based sampling significantly outperforms greedy and hybrid approaches, both in hit discovery and in the ability of the docking-score model to generalize beyond its training set. Notably, by the final iteration, 37 of the top 50 ranked compounds were within the top 1% of all evaluated compounds in the chemical space. Strategies that include some random selection correct systematic biases more rapidly but are less effective at predicting top-performing molecules. These findings underscore the value of incorporating molecular diversity and uncertainty into design strategies. Although such strategies may deprioritize the molecules with the highest absolute predictions in early rounds, they markedly accelerate model refinement, ultimately leading to more effective hit identification in active-learning-driven discovery.
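The iterative loop summarized above (predict, select, dock, retrain) can be sketched in a few dozen lines. This is a minimal, hypothetical illustration under stated assumptions, not SimDMTA itself: `toy_dock` is an invented stand-in for a docking program (lower scores assumed better), the surrogate model is a bootstrapped k-nearest-neighbour ensemble whose spread serves as the uncertainty estimate, the 50/50 greedy/random "hybrid" rule is an assumption rather than the paper's definition, and every function name is illustrative.

```python
import math
import random
import statistics
from typing import Dict, List, Sequence, Tuple


# Hypothetical stand-in for a docking run (NOT a real scoring function).
# Lower scores are assumed to mean better predicted binding.
def toy_dock(x: Sequence[float]) -> float:
    return sum((xi - 0.7) ** 2 for xi in x) - 1.0


def ensemble_predict(labelled: Dict[int, float], X: List[List[float]],
                     n_models: int, k: int, rng: random.Random
                     ) -> Tuple[Dict[int, float], Dict[int, float]]:
    """Bootstrapped k-NN ensemble: mean prediction and spread (uncertainty)."""
    ids = list(labelled)
    # Each ensemble member is "trained" on a bootstrap resample of docked compounds.
    members = [[rng.choice(ids) for _ in ids] for _ in range(n_models)]
    means: Dict[int, float] = {}
    stds: Dict[int, float] = {}
    for j, x in enumerate(X):
        preds = []
        for boot in members:
            nearest = sorted(boot, key=lambda i: math.dist(X[i], x))[:k]
            preds.append(statistics.fmean(labelled[i] for i in nearest))
        means[j] = statistics.fmean(preds)
        stds[j] = statistics.pstdev(preds)  # disagreement between members
    return means, stds


def select(strategy: str, pool: List[int], means: Dict[int, float],
           stds: Dict[int, float], batch: int, rng: random.Random) -> List[int]:
    """Query strategy: choose the next batch of undocked compounds."""
    if strategy == "greedy":        # exploit: best (lowest) predicted score
        return sorted(pool, key=lambda i: means[i])[:batch]
    if strategy == "uncertainty":   # explore: where ensemble members disagree most
        return sorted(pool, key=lambda i: stds[i], reverse=True)[:batch]
    if strategy == "random":
        return rng.sample(pool, batch)
    if strategy == "hybrid":        # ASSUMED rule: half greedy, half random
        greedy = sorted(pool, key=lambda i: means[i])[:batch // 2]
        chosen = set(greedy)
        rest = [i for i in pool if i not in chosen]
        return greedy + rng.sample(rest, batch - len(greedy))
    raise ValueError(f"unknown strategy: {strategy}")


def run_campaign(strategy: str, n_compounds: int = 150, dim: int = 3,
                 n_init: int = 10, batch: int = 10, n_iter: int = 5,
                 seed: int = 0):
    rng = random.Random(seed)
    # Hypothetical "chemical space": random feature vectors in [0, 1)^dim.
    X = [[rng.random() for _ in range(dim)] for _ in range(n_compounds)]
    labelled = {i: toy_dock(X[i]) for i in rng.sample(range(n_compounds), n_init)}
    for _ in range(n_iter):
        # Design: predict scores and uncertainty with the current model.
        means, stds = ensemble_predict(labelled, X, n_models=5, k=3, rng=rng)
        pool = [i for i in range(n_compounds) if i not in labelled]
        # Make/Test: "dock" the chosen batch; Analyze: results join the training set.
        for i in select(strategy, pool, means, stds, batch, rng):
            labelled[i] = toy_dock(X[i])
    # Final model, retrained on every compound docked so far.
    means, _ = ensemble_predict(labelled, X, n_models=5, k=3, rng=rng)
    return X, labelled, means


def hits_in_top_fraction(pred: Dict[int, float], true: Dict[int, float],
                         top_k: int, frac: float) -> int:
    """How many of the model's top_k ranked compounds are in the true top `frac`."""
    n_top = max(1, int(len(true) * frac))
    true_top = set(sorted(true, key=true.get)[:n_top])
    pred_top = sorted(pred, key=pred.get)[:top_k]
    return sum(1 for i in pred_top if i in true_top)


if __name__ == "__main__":
    for strategy in ("greedy", "uncertainty", "random", "hybrid"):
        X, labelled, means = run_campaign(strategy)
        true = {i: toy_dock(x) for i, x in enumerate(X)}
        hits = hits_in_top_fraction(means, true, top_k=15, frac=0.1)
        print(f"{strategy:12s} docked={len(labelled)} hits={hits}")
```

Only `select` differs between strategies, which isolates the question the study asks: greedy picks what the current model already rates best, whereas uncertainty sampling spends the docking budget where the ensemble is least sure, trading early hits for a better-calibrated model. The `hits_in_top_fraction` helper is one plausible reading of the "top 50 ranked compounds within the top 1%" measure; the toy numbers it produces say nothing about the paper's BRD4 results.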

Topics & Keywords

Machine Learning and Algorithms · Computational Drug Discovery Methods · Receptor Mechanisms and Signaling

Publication Details

Published in: Journal of Chemical Information and Modeling

Volume 66, Issue 6, pp. 3288-3301

DOI: 10.1021/acs.jcim.5c02504

Field-Weighted Citation Impact: 0.00
