Introduction: Automated documentation tools are being rapidly adopted in healthcare and clinical workflows. Among these are AI-enabled ambient scribing products, which transcribe conversations between patients and healthcare providers, then produce clinical records using automatic speech recognition (ASR) and generative AI such as Large Language Models (LLMs). While research suggests these technologies can reduce clinical burden, safe and responsible deployment requires that these tools determine what captured information is appropriate to record and under which circumstances. This presents a contextual privacy challenge that is distinct from PII leakage or data memorization and remains largely untested.

Methods: We address this gap by operationalizing privacy leakage as the inappropriate inclusion of third-party personal information in LLM-generated clinical notes. We construct a benchmark of transcripts containing private information, paired with gold-standard clinical notes, by enriching patient metadata from the aci-bench corpus and injecting third-party personal information across six relationship types and seven information topics. We evaluate the open-weight LLaMA 3.1 8B and 70B and Mixtral 8×7B and 8×22B models, and the proprietary Claude 3.5 Haiku and Sonnet models, on note generation using prompts with varied privacy and structural requirements.

Results: All examined models leaked third-party information. Privacy instructions reduced leakage but proved neither complete nor robust as a solution. Models could generate privacy-infringing notes despite correctly identifying such information as inappropriate to share. Decomposing generation and privacy editing into separate steps could further reduce leakage, but only when privacy was defined with contextual specificity.

Discussion: No single mitigation eliminated leakage entirely, but combining approaches yielded the greatest reductions. These results emphasize the need to build privacy-by-design systems and to develop evaluation strategies that reflect emerging information synthesis and sharing practices.
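
To make the leakage definition in Methods concrete, the following is a minimal sketch of how a leakage rate over generated notes might be computed. It is not the authors' evaluation procedure, which the abstract does not describe: the `InjectedFact` schema, the keyword-matching rule, and the example relationship and topic labels are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class InjectedFact:
    """A third-party detail injected into a transcript (hypothetical schema)."""
    relationship: str           # illustrative label, e.g. "neighbor"
    topic: str                  # illustrative label, e.g. "health"
    keywords: tuple[str, ...]   # phrases assumed to reveal the detail


def note_leaks(note: str, fact: InjectedFact) -> bool:
    """Return True if any revealing phrase appears in the note (case-insensitive)."""
    text = note.lower()
    return any(keyword.lower() in text for keyword in fact.keywords)


def leakage_rate(notes: list[str], facts: list[InjectedFact]) -> float:
    """Fraction of notes that reveal the fact injected into their transcript."""
    if len(notes) != len(facts):
        raise ValueError("expected exactly one injected fact per note")
    if not notes:
        return 0.0
    leaked = sum(note_leaks(note, fact) for note, fact in zip(notes, facts))
    return leaked / len(notes)
```

Substring matching is only a crude proxy: it misses paraphrased leaks and cannot tell whether a mention is clinically appropriate in context, which is the harder judgment the benchmark targets.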