Ensuring that generated text is accurately attributed to its underlying sources is critical for the transparency, trustworthiness, and verifiability of large language model (LLM) outputs. In this work, we conduct a comparative study of post-hoc context-attribution methods, focusing on cross-encoders, both frozen and fine-tuned, as well as proprietary and open-source LLMs in low-annotation settings. We explore strategies for leveraging frozen LLMs for context attribution without fine-tuning, and we develop techniques to optimize cross-encoder performance for semantic alignment between generated text and source material. Our evaluation spans four datasets (ASQA, ELI5, TREC-RAG, and a proprietary legal corpus) and covers both answer-level and sentence-level attribution tasks. We also investigate training small cross-encoders on synthetic data to assess their scalability and deployment potential in resource-constrained environments. Our results show that cross-encoders are valid alternatives to LLMs for post-generation answer-level context attribution, and that, with proper hyperparameter tuning, the same model can match proprietary LLM performance on both sentence- and answer-level attribution. Finally, training small cross-encoders solely on synthetic data further improves their performance while offering a scalable and cost-effective solution.
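The post-hoc attribution setup described above can be illustrated with a minimal sketch: given a generated answer and a set of candidate source passages, score each (answer, passage) pair and attribute the answer to the highest-scoring source. The token-overlap scorer below is only an illustrative stand-in for a cross-encoder's relevance score; the function names and the scoring rule are assumptions for this sketch, not the paper's actual models or method.

```python
def overlap_score(answer: str, passage: str) -> float:
    # Jaccard overlap of lowercase token sets -- a toy stand-in for the
    # relevance score a cross-encoder would assign to (answer, passage).
    a, p = set(answer.lower().split()), set(passage.lower().split())
    return len(a & p) / len(a | p) if a | p else 0.0

def attribute(answer: str, passages: list[str]) -> int:
    # Answer-level attribution: return the index of the passage that
    # best supports the generated answer under the pairwise scorer.
    scores = [overlap_score(answer, p) for p in passages]
    return max(range(len(passages)), key=scores.__getitem__)

passages = [
    "The Eiffel Tower was completed in 1889 in Paris.",
    "Mount Everest is the tallest mountain above sea level.",
]
answer = "The Eiffel Tower, completed in 1889, stands in Paris."
print(attribute(answer, passages))  # index of the best-supporting passage
```

Sentence-level attribution follows the same pattern, applied to each sentence of the answer separately rather than to the answer as a whole.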