Uncharacterized functions of enzymes represent an untapped opportunity to develop therapeutics, unlock the sustainable synthesis of materials, and understand the evolution of life-sustaining metabolic networks. Uncharacterized enzymes and reactions, generated by protein language models and computer-aided synthesis tools, respectively, make up a large part of this opportunity. Given the technical complexity of high-throughput enzymatic activity screens, predictive models are needed that can prescreen enzyme-reaction pairs <i>in silico</i>. We present (1) a high-quality data set of enzyme-reaction pairs, (2) a rigorous battery of model evaluations varying in their approaches to data splitting and negative sampling, (3) a comprehensive benchmarking of enzyme-reaction models, and (4) a pair of parameter-efficient, data-efficient, high-performing models called Reaction-Center Graph Neural Networks (RC-GNNs) capable of predicting whether an enzyme, represented by an amino acid sequence, can significantly catalyze a given reaction, represented by its full set of reactants and products. In the most difficult conditions, where the query reactions were highly dissimilar from those present in the training data set, our models achieved 0.88 and 0.84 ROC-AUC on classification tasks featuring globally selected and synthetic negatives, respectively. On a time-based split, an RC-GNN achieved 0.91 ROC-AUC. The ability to successfully make predictions on enzymes and reactions distinct from those used during training makes the RC-GNNs especially useful for both metabolic engineers and evolutionary biologists who need to reason about uncharacterized enzymatic reactions.
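To make the evaluation setup concrete, the following minimal sketch illustrates one plausible reading of the classification task with globally selected negatives, scored by ROC-AUC. It is not the authors' implementation. It assumes that "globally selected" means drawing enzyme-reaction pairs with no recorded activity uniformly from all enzyme and reaction combinations; the function names, identifiers, and scores are hypothetical.

```python
import random


def sample_global_negatives(positives, enzymes, reactions, n, seed=0):
    """Draw n enzyme-reaction pairs absent from the positive set.

    Assumption: negatives are sampled uniformly from all unobserved pairs.
    Unobserved pairs may include true but unrecorded activities.
    """
    known = set(positives)
    candidates = [(e, r) for e in enzymes for r in reactions if (e, r) not in known]
    if n > len(candidates):
        raise ValueError("not enough unobserved pairs to sample from")
    return random.Random(seed).sample(candidates, n)


def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: two enzymes, two reactions, two known positive pairs.
enzymes = ["E1", "E2"]
reactions = ["R1", "R2"]
positives = [("E1", "R1"), ("E2", "R2")]
negatives = sample_global_negatives(positives, enzymes, reactions, n=2)

# Hypothetical model scores for the positives followed by the negatives.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)
```

The pairwise form of ROC-AUC used here is quadratic in the number of examples and is meant only for illustration; a rank-based computation would be preferable at data set scale.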