LigandExplorer: An Automated Tool for Ligand Extraction from PDB Structures

2026 · 0 citations · Journal Article

Authors

Y. Li · Hunan Normal University

Rongfeng Zou · Recombinant Antibody Technology (United Kingdom)

Maohua Yang · Recombinant Antibody Technology (United Kingdom)

Ying Wang · Hunan Normal University

Zheng Liu · Hunan Normal University

Hang Zheng · Recombinant Antibody Technology (United Kingdom)

Abstract

Structural information on protein-ligand complexes is crucial for small-molecule design and drug discovery. Yet primary resources often have heterogeneous annotations, lack machine-ready ligand categorization, and require substantial postprocessing before large-scale modeling. Here, we present LigandExplorer, an open-source, automated postprocessing pipeline that identifies and extracts covalent and noncovalent ligands from biomolecular complex structures and standardizes outputs for downstream use. Because it uses residue-level graphs built solely from atomic coordinates, LigandExplorer is robust to missing or inconsistent metadata. It integrates LightGBM models to classify ligands (peptides, nucleic acids, phospholipids, carbohydrates, organics, and ions) and to assess interaction relevance. Because the pipeline is rerunnable, it can be applied to each new database release to keep derived, categorized data sets current without altering source records. On the PDBbind v2020 refined set, LigandExplorer achieved 98.38% raw structural agreement under harmonized comparison criteria prior to any manual reconciliation; the remaining discrepancies were analyzed separately and were dominated by divergences between raw RCSB entries and curated PDBbind records. On PepBDB, LigandExplorer successfully processed 4881 of 5005 complexes, a 97.52% success rate. Most failures reflected upstream record errors, whereas complex cyclic peptides constituted the primary algorithmic boundary. LigandExplorer thus mitigates data-cleaning burdens and enables rapidly refreshed, standardized data sets for computational modeling and molecular design.
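The idea of a residue-level graph built only from atomic coordinates can be illustrated with a minimal sketch. This is not LigandExplorer's implementation: the function names, the input layout (residue id plus x, y, z), and the 1.9 Å distance cutoff are all assumptions chosen for illustration. Residues are linked when any pair of their atoms lies within the cutoff, and connected components then separate bonded chains from isolated entities without consulting any metadata.

```python
import math
from collections import defaultdict
from itertools import combinations


def build_residue_graph(atoms, cutoff=1.9):
    """Link two residues when any pair of their atoms is within `cutoff` angstroms.

    `atoms` is an iterable of (residue_id, x, y, z) tuples. The cutoff value is
    an illustrative assumption, not a parameter taken from the paper.
    """
    by_residue = defaultdict(list)
    for res_id, x, y, z in atoms:
        by_residue[res_id].append((x, y, z))

    graph = {res_id: set() for res_id in by_residue}
    for a, b in combinations(by_residue, 2):
        close = any(
            math.dist(p, q) <= cutoff
            for p in by_residue[a]
            for q in by_residue[b]
        )
        if close:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Group residues into sets that are linked directly or indirectly."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components
```

In this toy setup, each connected component would be a candidate molecule (a polymer chain, a covalently attached ligand, or a free ligand) that a downstream classifier could then label; the all-pairs distance check is quadratic and would need a spatial index for real structures.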

Topics & Keywords

Computational Drug Discovery Methods · Protein Structure and Dynamics · Machine Learning in Bioinformatics

UN Sustainable Development Goals

Industry, innovation and infrastructure

Publication Details

Published in: Journal of Chemical Information and Modeling

Volume 66, Issue 6, pp. 3026-3035

DOI: 10.1021/acs.jcim.5c02921

Field-Weighted Citation Impact: 0.00