Biomedical data continues to grow rapidly, arriving from heterogeneous sources and being updated daily, which makes manual extraction not only time-consuming but ultimately unable to keep pace. In this context, biomedical relation extraction, which aims to automate the discovery of relationships between entities in free text, becomes an essential step for knowledge discovery. While fine-tuning Transformer models such as T5, PubMedBERT, BioBERT, ClinicalT5, and RoBERTa has shown satisfactory results, it requires task-specific datasets that are time-consuming and costly to create because they demand domain experts. Generative Artificial Intelligence (GenAI) offers an appealing alternative, since it can be applied directly to a problem without the need for dataset creation. In this paper, we evaluate whether generative large language models (LLMs) can be relied upon to process biomedical data. To do so, we study relation extraction across four major biomedical tasks: chemical-protein relation extraction, disease-protein relation extraction, drug-drug interaction, and protein-protein interaction. Our study compares the performance of fine-tuned Transformer models with generative approaches such as Mistral-7B, LLaMA2-7B, GLiNER, LLaMA3-8B, Gemma, RAG, and Me-LLaMA-13B, using the same datasets in both experiments, and shows that fine-tuned Transformer models achieve performance levels roughly twice those of generative LLMs. Generative models require further pretraining on domain-specific data, as demonstrated by Me-LLaMA (pretrained on MIMIC-III), which shows a significant improvement over its general-domain counterpart.
In terms of performance, Transformer models fine-tuned on domain-specific biomedical data achieved average scores ranging from <b>84.42</b> to <b>90.35</b>, while generative models obtained significantly lower scores, between <b>36.64</b> and <b>53.94</b>. Among the generative LLMs, LLaMA3-8B, RAG, and Me-LLaMA-13B achieved the top three scores, with Me-LLaMA, pretrained on MIMIC-III, reaching <b>45.76</b>, illustrating the benefit of domain-specific pretraining.
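To make the zero-shot setting evaluated here concrete, the sketch below frames drug-drug interaction extraction as a prompt-and-parse classification step. The prompt wording, helper names, and label set are assumptions (the labels are loosely modeled on the DDIExtraction 2013 scheme), not the paper's exact protocol.

```python
# Hypothetical zero-shot drug-drug interaction (DDI) extraction sketch.
# Labels and prompt template are illustrative, not taken from the paper.
DDI_LABELS = ["mechanism", "effect", "advise", "int", "no-interaction"]

def build_ddi_prompt(sentence: str, drug1: str, drug2: str) -> str:
    """Frame DDI extraction as a single-label classification question for an LLM."""
    return (
        "Classify the interaction between the two drugs in the sentence.\n"
        f"Possible labels: {', '.join(DDI_LABELS)}.\n"
        f"Sentence: {sentence}\n"
        f"Drug 1: {drug1}\nDrug 2: {drug2}\n"
        "Answer with a single label."
    )

def parse_label(model_output: str) -> str:
    """Map free-form model output back to a known label (fallback: no-interaction)."""
    text = model_output.lower()
    for label in DDI_LABELS:
        if label in text:
            return label
    return "no-interaction"

prompt = build_ddi_prompt(
    "Concurrent use of DRUG1 may potentiate the hypotensive effect of DRUG2.",
    "DRUG1", "DRUG2",
)
# The prompt string would be sent to the generative model under evaluation;
# its textual answer is then normalized back into the label set.
print(parse_label("The interaction type is: effect"))  # → effect
```

Fine-tuned Transformer baselines, by contrast, attach a classification head to an encoder and learn the label mapping directly from annotated data, which is a plausible source of the score gap reported above.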
Published in: Computational and Structural Biotechnology Journal
Volume 31, pp. 157-168