CROWN: Curated Repository Of Well-resolved Noncovalent interactions

2026 · 0 citations · Journal Article · Green Open Access

Authors

Robin Poelmans · KU Leuven

Wout Van Eynde · KU Leuven

Bence Bruncsics · Institute for Computer Science and Control

Balint Bruncsics · Institute for Computer Science and Control

Ádám Arany · KU Leuven

Yves Moreau · KU Leuven

Arnout RD Voet ·

Abstract

The development of machine learning models for protein–ligand interactions is fundamentally constrained by the quality and diversity of available structural data. Existing databases of protein–ligand complexes present researchers with an unsatisfying trade-off: carefully curated collections such as PDBBind and HiQBind offer high structural reliability but cover only a narrow slice of the Protein Data Bank (PDB), while large-scale resources like PLInder provide broad coverage at the expense of rigorous quality control. Here, we introduce CROWN (Curated Repository Of Well-resolved Noncovalent interactions), a machine learning–ready dataset that reconciles this tension by applying a comprehensive, fully automated preprocessing pipeline to the PLInder database. Starting from 649,915 protein–ligand interaction systems, CROWN applies a series of interleaved quality filters and processing stages addressing crystallographic resolution, ligand identity, pocket completeness, structural repair, interaction quality, and protonation at physiological pH. A distinguishing feature of the pipeline is a final constrained energy minimization step using custom flat-bottomed restraints, which balances crystallographic evidence with relaxation of intramolecular strain. This step, absent from existing protein–ligand datasets, produces structurally uniform complexes by reconciling the heterogeneous refinement practices of different crystallographers and structure determination protocols, without distorting the experimentally observed binding geometry. The resulting dataset of 153,005 complexes represents a roughly four-fold increase in protein and species diversity over PDBBind and HiQBind, while maintaining rigorous structural standards.
Importantly, CROWN adopts a geometry-centric design philosophy that treats the 3D arrangement of atoms at the binding interface as a self-consistent source of information, rather than relying on externally measured binding affinities that cover only a fraction of known structures and introduce well-documented biases. We anticipate that CROWN will serve as a broadly useful resource for training generative models of protein–ligand binding poses, developing scoring functions, and benchmarking interaction prediction methods.
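
The abstract does not give the functional form of the flat-bottomed restraints used in the minimization step. As a rough illustration of the general idea only, the sketch below uses a common textbook form: an atom incurs no penalty while it stays within a tolerance radius of its crystallographic position, and a harmonic penalty on the excess once it moves further. The function name, the `tolerance` of 0.5 Å, and the force constant `k` are hypothetical values chosen for illustration, not parameters from CROWN.

```python
import numpy as np


def flat_bottom_restraint_energy(coords, ref_coords, tolerance=0.5, k=10.0):
    """Per-atom flat-bottomed harmonic restraint energy.

    Assumed form (not taken from the paper):
        E = 0                          if d <= tolerance
        E = 0.5 * k * (d - tolerance)^2  otherwise
    where d is the distance of an atom from its reference position.
    `tolerance` and `k` are illustrative placeholders.
    """
    coords = np.asarray(coords, dtype=float)
    ref_coords = np.asarray(ref_coords, dtype=float)
    # Distance of each atom from its reference (crystallographic) position.
    dist = np.linalg.norm(coords - ref_coords, axis=1)
    # Only the displacement beyond the tolerance radius is penalized.
    excess = np.maximum(dist - tolerance, 0.0)
    return 0.5 * k * excess ** 2


# Three atoms displaced by 0.3, 0.5, and 1.5 (same units as tolerance).
ref = np.zeros((3, 3))
moved = np.array([[0.3, 0.0, 0.0], [0.5, 0.0, 0.0], [1.5, 0.0, 0.0]])
energies = flat_bottom_restraint_energy(moved, ref)
```

Under these assumptions the first two atoms sit inside the flat region and are free to relax, while only the third is pulled back toward the experimental position. This is the sense in which such a restraint can relieve local strain without overriding the observed binding geometry.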

Topics & Keywords

Protein Structure and Dynamics · Machine Learning in Materials Science · Enzyme Structure and Function

Publication Details

Published in: bioRxiv (Cold Spring Harbor Laboratory)

DOI: 10.64898/2026.03.30.714168

Field-Weighted Citation Impact: 0.00
