Ensuring that generated text is accurately attributed to its underlying sources is critical for the transparency, trustworthiness, and verifiability of large language model (LLM) outputs. In this work, we conduct a comparative study of post-hoc context-attribution methods, focusing on cross-encoders, both frozen and fine-tuned, as well as proprietary and open-source LLMs in low-annotation settings. We explore strategies for leveraging frozen LLMs for context attribution without fine-tuning, and we develop techniques to optimize cross-encoder performance for semantic alignment between generated text and source material. Our evaluation spans four datasets (ASQA, ELI5, TREC-RAG, and a proprietary legal corpus) and covers both answer-level and sentence-level attribution tasks. We also investigate training small cross-encoders on synthetic data to assess their scalability and deployment potential in resource-constrained environments. Our results show that cross-encoders are valid alternatives to LLMs for post-generation answer-level context attribution, and that, with proper hyperparameter tuning, the same model can match proprietary LLM performance on both sentence- and answer-level attribution. Finally, training small cross-encoders solely on synthetic data further improves their performance while offering a scalable and cost-effective solution.
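The post-hoc attribution setup described above can be illustrated with a minimal sketch: given a generated answer and a set of candidate source passages, score each (answer, passage) pair and attribute the answer to the highest-scoring source. The token-overlap scorer below is only an illustrative stand-in for a cross-encoder's relevance score; the function names and the scoring rule are assumptions for this sketch, not the paper's actual models or method.

```python
def overlap_score(answer: str, passage: str) -> float:
    # Jaccard overlap of lowercase token sets -- a toy stand-in for the
    # relevance score a cross-encoder would assign to (answer, passage).
    a, p = set(answer.lower().split()), set(passage.lower().split())
    return len(a & p) / len(a | p) if a | p else 0.0

def attribute(answer: str, passages: list[str]) -> int:
    # Answer-level attribution: return the index of the passage that
    # best supports the generated answer under the pairwise scorer.
    scores = [overlap_score(answer, p) for p in passages]
    return max(range(len(passages)), key=scores.__getitem__)

passages = [
    "The Eiffel Tower was completed in 1889 in Paris.",
    "Mount Everest is the tallest mountain above sea level.",
]
answer = "The Eiffel Tower, completed in 1889, stands in Paris."
print(attribute(answer, passages))  # index of the best-supporting passage
```

Sentence-level attribution follows the same pattern, applied to each sentence of the answer separately rather than to the answer as a whole.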