While Electronic Health Records (EHRs) promise comprehensive documentation of patient care, in practice they pose significant challenges for data reliability and utilization. EHRs contain vast amounts of unstructured clinical narratives that, despite holding critical medical information, remain difficult to systematically extract and verify. Recent advances in large language models (LLMs) offer steadily improving capabilities for extracting structured information from clinical notes, yet these approaches raise fundamental questions about output reliability and over-confident token predictions, and they provide no guarantees (statistical or otherwise) for downstream clinical applications. In this work, we present a conformal verification framework for unstructured EHR data extraction using generative AI. Although LLMs have increasingly impressive capabilities, they are notoriously miscalibrated and overconfident in their predictions, necessitating rigorous verification methods that remove the need to blindly trust model outputs. Our approach (i) employs LLMs to extract medical entities and concepts from clinical narratives with LLM-as-a-judge verification, (ii) implements probabilistic calibration to quantify extraction confidence, and (iii) applies conformal prediction to provide finite-sample guarantees on error rates for accepted extractions. We evaluate our framework on 10,000 clinical visits across 898 clinical practices that use three different EHR systems. Our conformal verification approach can provide assurances that the expected proportion of accepted but incorrect extractions remains below a pre-specified risk level, backed by rigorous statistical guarantees. It also maintains formal guarantees on clinical data quality and reveals miscalibration in state-of-the-art LLMs, underscoring the need for additional validation before automated extraction systems are safely deployed.
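The abstract's core idea of accepting an extraction only when a calibrated confidence score clears a threshold chosen on held-out data can be illustrated with a minimal split-conformal-style sketch. This is an assumption-laden toy, not the authors' actual method: the function `calibrate_threshold`, the add-one finite-sample correction, and the data layout (per-extraction confidence scores paired with correctness labels) are all hypothetical choices for illustration.

```python
import numpy as np

def calibrate_threshold(scores, correct, alpha=0.05):
    """Pick the most permissive confidence threshold t such that, among
    calibration extractions with score >= t, a finite-sample-adjusted
    estimate of the error rate stays at or below alpha.

    scores  : array of calibrated confidence scores, one per extraction
    correct : array of 0/1 flags (1 = extraction verified correct)
    alpha   : pre-specified risk level on accepted-but-incorrect extractions
    """
    order = np.argsort(scores)[::-1]            # most confident first
    s = np.asarray(scores, dtype=float)[order]
    c = np.asarray(correct)[order].astype(bool)
    errors = np.cumsum(~c)                      # incorrect among top-k accepted
    n_acc = np.arange(1, len(s) + 1)            # number accepted so far
    # conservative add-one correction on the empirical error rate
    risk = (errors + 1) / (n_acc + 1)
    ok = np.where(risk <= alpha)[0]
    if len(ok) == 0:
        return np.inf                           # accept nothing at this alpha
    return s[ok.max()]                          # accept everything scoring >= t

# toy calibration set: high-confidence extractions happen to be correct
scores = [0.95, 0.9, 0.85, 0.8, 0.2, 0.1]
correct = [1, 1, 1, 1, 0, 0]
t = calibrate_threshold(scores, correct, alpha=0.3)   # -> 0.8
```

At deployment time, only extractions scoring at or above `t` would be auto-accepted; the rest would be routed to human review. Tightening `alpha` shrinks the accepted set (here, `alpha=0.05` yields an infinite threshold, i.e. nothing is auto-accepted from this tiny calibration set).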
Published in: Proceedings of the AAAI Symposium Series
Volume 7, Issue 1, pp. 539-546