Biomedical data continues to grow rapidly, arriving from heterogeneous sources and being updated daily, which makes manual extraction not only time-consuming but ultimately unable to keep pace. In this context, biomedical relation extraction, which aims to automate the discovery of relationships between entities in free text, becomes an essential step for knowledge discovery. While fine-tuning Transformer models such as T5, PubMedBERT, BioBERT, ClinicalT5, and RoBERTa has shown satisfactory results, it requires task-specific datasets that are time-consuming and costly to create because they demand domain experts. Generative Artificial Intelligence (GenAI) offers an appealing alternative, since it can be applied directly to a problem without the need for dataset creation. In this paper, we evaluate whether generative large language models (LLMs) can be relied upon to process biomedical data. To do so, we study relation extraction across four major biomedical tasks: chemical-protein relation extraction, disease-protein relation extraction, drug-drug interaction, and protein-protein interaction. Our study compares the performance of fine-tuned Transformer models with generative approaches such as Mistral-7B, LLaMA2-7B, GLiNER, LLaMA3-8B, Gemma, RAG, and Me-LLaMA-13B, using the same datasets in both experiments, and shows that fine-tuned Transformer models achieve performance levels roughly twice those of generative LLMs. Generative models require further pretraining on domain-specific data, as demonstrated by Me-LLaMA (pretrained on MIMIC-III), which shows a significant improvement over its general-domain counterpart.
In terms of performance, Transformer models fine-tuned on domain-specific biomedical data achieved average scores ranging from <b>84.42</b> to <b>90.35</b>, while generative models obtained significantly lower scores, between <b>36.64</b> and <b>53.94</b>. Among the generative LLMs, LLaMA3-8B, RAG, and Me-LLaMA-13B achieved the top three scores, with Me-LLaMA, pretrained on MIMIC-III, reaching <b>45.76</b>, illustrating the benefit of domain-specific pretraining.
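To make the zero-shot setting evaluated here concrete, the sketch below frames drug-drug interaction extraction as a prompt-and-parse classification step. The prompt wording, helper names, and label set are assumptions (the labels are loosely modeled on the DDIExtraction 2013 scheme), not the paper's exact protocol.

```python
# Hypothetical zero-shot drug-drug interaction (DDI) extraction sketch.
# Labels and prompt template are illustrative, not taken from the paper.
DDI_LABELS = ["mechanism", "effect", "advise", "int", "no-interaction"]

def build_ddi_prompt(sentence: str, drug1: str, drug2: str) -> str:
    """Frame DDI extraction as a single-label classification question for an LLM."""
    return (
        "Classify the interaction between the two drugs in the sentence.\n"
        f"Possible labels: {', '.join(DDI_LABELS)}.\n"
        f"Sentence: {sentence}\n"
        f"Drug 1: {drug1}\nDrug 2: {drug2}\n"
        "Answer with a single label."
    )

def parse_label(model_output: str) -> str:
    """Map free-form model output back to a known label (fallback: no-interaction)."""
    text = model_output.lower()
    for label in DDI_LABELS:
        if label in text:
            return label
    return "no-interaction"

prompt = build_ddi_prompt(
    "Concurrent use of DRUG1 may potentiate the hypotensive effect of DRUG2.",
    "DRUG1", "DRUG2",
)
# The prompt string would be sent to the generative model under evaluation;
# its textual answer is then normalized back into the label set.
print(parse_label("The interaction type is: effect"))  # → effect
```

Fine-tuned Transformer baselines, by contrast, attach a classification head to an encoder and learn the label mapping directly from annotated data, which is a plausible source of the score gap reported above.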
Published in: Computational and Structural Biotechnology Journal
Volume 31, pp. 157-168