Development of Reaction-Centered Encoders and Benchmarking of Enzyme-Reaction Pair Models

2026 · 0 citations · Journal Article

Authors

Stefan C. Pate · Northwestern University

Eric H. Wang · C4 Therapeutics (United States)

Linda J. Broadbelt · Northwestern University

Keith E. J. Tyo · Northwestern University

Abstract

Uncharacterized functions of enzymes represent an untapped opportunity to develop therapeutics, unlock the sustainable synthesis of materials, and understand the evolution of life-sustaining metabolic networks. Uncharacterized enzymes and reactions, generated by protein language models and computer-aided synthesis tools, respectively, make up a large part of this opportunity. Given the technical complexity of high-throughput enzymatic activity screens, predictive models are needed that can prescreen enzyme-reaction pairs in silico. We present (1) a high-quality data set of enzyme-reaction pairs, (2) a rigorous battery of model evaluations varying in their approaches to data splitting and negative sampling, (3) a comprehensive benchmarking of enzyme-reaction models, and (4) a pair of parameter-efficient, data-efficient, high-performing models called Reaction-Center Graph Neural Networks (RC-GNNs) capable of predicting whether an enzyme, represented by an amino acid sequence, can significantly catalyze a given reaction, represented by its full set of reactants and products. In the most difficult conditions, where the query reactions were highly dissimilar from those present in the training data set, our models achieved 0.88 and 0.84 ROC-AUC on classification tasks featuring globally selected and synthetic negatives, respectively. On a time-based split, an RC-GNN achieved 0.91 ROC-AUC. The ability to successfully make predictions on enzymes and reactions distinct from those used during training makes the RC-GNNs especially useful for both metabolic engineers and evolutionary biologists who need to reason about uncharacterized enzymatic reactions.
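
As a rough illustration of the evaluation setting the abstract describes, the sketch below pairs a simple form of negative sampling with a ROC-AUC computation. It is not the authors' code: the function names `sample_global_negatives` and `roc_auc` are hypothetical, and the sampling rule (any enzyme-reaction combination absent from the known positives is treated as a negative) is an assumption about what "globally selected" negatives might look like, not a description of the paper's procedure.

```python
import random
from itertools import product


def sample_global_negatives(positives, n, seed=0):
    """Draw up to n enzyme-reaction pairs that are not known positives.

    Assumption: every unobserved (enzyme, reaction) combination counts as
    a negative. The paper's actual negative-sampling rules may differ.
    """
    enzymes = sorted({enzyme for enzyme, _ in positives})
    reactions = sorted({reaction for _, reaction in positives})
    candidates = [
        pair for pair in product(enzymes, reactions) if pair not in positives
    ]
    rng = random.Random(seed)
    return rng.sample(candidates, min(n, len(candidates)))


def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive outscores a
    random negative, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data: identifiers and scores are made up for illustration only.
known_positives = {("E1", "R1"), ("E2", "R2")}
negatives = sample_global_negatives(known_positives, n=2)
auc = roc_auc([1, 1, 0, 0], [0.9, 0.2, 0.3, 0.1])
```

Read this way, a ROC-AUC of 0.88 means a model ranks a true enzyme-reaction pair above a sampled negative about 88% of the time, so the reported number depends on how the negatives are chosen as well as on the model.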

Topics & Keywords

Microbial Metabolic Engineering and Bioproduction · Biochemical Acid Research Studies · Gene Regulatory Network Analysis

Publication Details

Published in: Journal of Chemical Information and Modeling

DOI: 10.1021/acs.jcim.5c02755

Field-Weighted Citation Impact: 0.00
