Advancing FAIR Data Management through AI-Assisted Curation of Morphological Data Matrices

2025 · 0 citations · Preprint · Green Open Access

Authors

Shreya Jariwala · Phoenix Bioinformatics

Brooke Long-Fox · Phoenix Bioinformatics

Tanya Berardini · Phoenix Bioinformatics

Abstract

Curation of biological and paleontological datasets is a labor-intensive process that requires standardization and validation to ensure data integrity. Manual curation in particular is prone to human error, such as typos, inconsistent formatting, and incomplete metadata, which hinders reproducibility and compliance with the Findability, Accessibility, Interoperability, and Reusability (FAIR) principles. Artificial Intelligence (AI) can improve research efficiency by automating data validation, improving accuracy, and streamlining curation workflows. This study presents a proof-of-concept implementation of an AI-assisted curation tool developed for MorphoBank, an open access repository established to enhance the standardization and usability of morphological character datasets. The tool extracts, structures, and standardizes morphological character data from published literature into the NEXUS file format, which is widely used for phylogenetic analyses. It applies machine learning techniques, including Large Language Models (LLMs), to automate the extraction of character names and states from text in various formats, reducing manual data entry errors and improving data completeness. The system converts matrix-only files into complete, machine- and human-readable datasets that include key character metadata. By assisting with these tasks, the tool reduces the manual effort required for curation while improving consistency and standardization. This approach increases the FAIRness of morphological character data and provides a framework for extending AI-assisted curation to other types of biological data. These results illustrate the potential of AI-assisted workflows to support scalable data curation and reuse in paleontology, systematics, and evolutionary biology.
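To make the conversion from a matrix-only file to a complete dataset concrete, the following minimal Python sketch shows what the final structuring step could look like. It assumes the character names and states have already been extracted (by an LLM or otherwise) and simply inserts them as a NEXUS CHARSTATELABELS command ahead of the MATRIX command. The function names, the sample taxa, and the sample characters are hypothetical illustrations, not the authors' implementation or MorphoBank's API.

```python
import re


def quote(token: str) -> str:
    """Quote a NEXUS token, doubling any embedded single quotes."""
    return "'" + token.replace("'", "''") + "'"


def charstatelabels(characters):
    """Render a CHARSTATELABELS command from (name, [states]) pairs."""
    entries = []
    for number, (name, states) in enumerate(characters, start=1):
        state_text = " ".join(quote(state) for state in states)
        entries.append(f"    {number} {quote(name)} / {state_text}")
    return "  CHARSTATELABELS\n" + ",\n".join(entries) + "\n  ;"


def add_labels(matrix_only_nexus: str, characters) -> str:
    """Insert the labels immediately before the MATRIX command."""
    match = re.search(r"^[ \t]*MATRIX\b", matrix_only_nexus,
                      flags=re.IGNORECASE | re.MULTILINE)
    if match is None:
        raise ValueError("no MATRIX command found")
    block = charstatelabels(characters)
    start = match.start()
    return matrix_only_nexus[:start] + block + "\n" + matrix_only_nexus[start:]


# Hypothetical matrix-only input: scores are present, character metadata is not.
MATRIX_ONLY = (
    "#NEXUS\n"
    "BEGIN CHARACTERS;\n"
    "  DIMENSIONS NCHAR=2;\n"
    "  FORMAT DATATYPE=STANDARD MISSING=? SYMBOLS=\"01\";\n"
    "  MATRIX\n"
    "  Taxon_A 01\n"
    "  Taxon_B 1?\n"
    "  ;\n"
    "END;\n"
)

# Hypothetical characters, standing in for names and states extracted from a paper.
CHARACTERS = [
    ("Skull shape", ["round", "elongate"]),
    ("Teeth", ["absent", "present"]),
]

if __name__ == "__main__":
    print(add_labels(MATRIX_ONLY, CHARACTERS))
```

The sketch leaves the matrix untouched and only adds metadata, so the scored data is preserved as published. A real pipeline would also need to check that the number of extracted characters matches NCHAR.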

Topics & Keywords

Geological Modeling and Analysis · AI in cancer detection · Research Data Management Practices

Publication Details

Published in: bioRxiv (Cold Spring Harbor Laboratory)

DOI: 10.1101/2025.07.08.663621
