Decoupling Retrieval from Reasoning: A Scalable Conditional Memory Approach to Privacy-Preserving Clinical Coding

2026 · 0 citations · Preprint · Green Open Access

Authors

Abstract

Assigning diagnostic codes from free-text clinical documentation requires mapping unstructured clinical narratives to a structured ontology of over 70,000 ICD-10-CM codes. This task constitutes an extreme-scale information retrieval problem coupled with context-sensitive validation. Prior approaches frequently integrate retrieval and reasoning within generative language models, introducing limitations in scalability, determinism, latency, and deployability, particularly in regulated on-premise environments.

We introduce a hybrid retrieval–verification architecture that explicitly decouples static ontology access from dynamic clinical interpretation. The system is built on a hierarchical conditional-memory index enabling deterministic bounded-time lookup across the full code space, followed by multi-stage candidate filtering and a lightweight domain verifier that performs compliance and consistency checks rather than open-ended generation. This design emphasizes structural validity, auditability, and systems scalability over generative flexibility.

Evaluated as a feasibility study, the architecture achieves an F1@10 score of 4.7% on 500 real-world MIMIC-III clinical notes (ICD-9 setting), highlighting the challenges of operational documentation. On a curated synthetic ICD-10-CM dataset of 250 clinical scenarios, the system achieves 100% retrieval coverage and an F1@10 of 16.5%, representing a 96% relative improvement over lexical retrieval baselines. The fully on-premise implementation demonstrates a 24 ms retrieval latency and 703 ms end-to-end inference time without external data transmission.

These findings position hierarchical conditional memory as a viable systems primitive for extreme-scale, domain-specific information retrieval, supporting privacy-constrained clinical decision support workflows where deterministic retrieval and structural compliance are primary design requirements.
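The abstract reports results as F1@10 but does not spell out how the metric is computed. The sketch below shows one common per-note definition (precision and recall over the top-k ranked candidates, averaged across notes); it is an assumption about the evaluation protocol, not the authors' confirmed procedure. The function names `f1_at_k` and `mean_f1_at_k` and the example codes are hypothetical.

```python
def f1_at_k(ranked_codes, gold_codes, k=10):
    """F1 over the top-k ranked candidate codes for a single note.

    Assumed definition: precision = hits / number of returned candidates,
    recall = hits / number of gold codes. Other papers define F1@k differently.
    """
    # Deduplicate while preserving rank order, then truncate to k.
    top_k = list(dict.fromkeys(ranked_codes))[:k]
    gold = set(gold_codes)
    if not top_k or not gold:
        return 0.0
    hits = sum(1 for code in top_k if code in gold)
    if hits == 0:
        return 0.0
    precision = hits / len(top_k)
    recall = hits / len(gold)
    return 2 * precision * recall / (precision + recall)


def mean_f1_at_k(predictions, references, k=10):
    """Macro-average of per-note F1@k across a dataset."""
    scores = [f1_at_k(p, r, k) for p, r in zip(predictions, references)]
    return sum(scores) / len(scores) if scores else 0.0


# Illustrative inputs only; these are not results from the paper.
example_score = f1_at_k(["I10", "E11.9", "J45.909"], ["I10", "E11.9"], k=10)
```

Under this definition a note with few gold codes cannot reach a high F1@10 when all ten slots are filled, since precision is capped by the number of gold codes divided by ten. That is one reason full retrieval coverage and a modest F1@10 can coexist.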

Topics & Keywords

Machine Learning in Healthcare; Topic Modeling; Biomedical Text Mining and Ontologies

UN Sustainable Development Goals

Peace, Justice and Strong Institutions

Publication Details

Published in: SPIRE - Sciences Po Institutional REpository
