Background
Primary care physicians (PCPs) are a critical first contact for eye-health screening, risk stratification, and referral, yet ophthalmology training and point-of-care support in primary care remain insufficient. Recent advances in generative artificial intelligence (generative AI), particularly large language models (LLMs), may help address these gaps through conversational, scenario-based learning and structured feedback. However, the educational effectiveness, reproducibility, and safety boundaries of LLM-enabled tools in primary care ophthalmology remain unclear.

Methods
We conducted a systematic review of studies evaluating or applying LLMs in ophthalmic education, training, assessment, or primary care–relevant clinical support. PubMed, Web of Science Core Collection, and Scopus were searched from January 1, 2020 to December 31, 2025 using combined terms related to LLMs/generative AI, ophthalmology, and education or assessment. Citation chaining was also performed to reduce the risk of missed studies. Two reviewers independently screened records and extracted data.

Results
The evidence base is dominated by vignette-based benchmarks, comparative scoring studies, and evaluations conducted in limited-sample or controlled settings; prospective real-world validation using learner transfer, clinical behavior change, workflow impact, or patient outcomes remains scarce. Across studies, LLMs can serve as “cognitive apprenticeship” partners by externalizing clinical reasoning and enabling repeated practice in key-feature extraction, differential diagnosis, risk stratification, and referral-threshold decisions. Applications include triage and reasoning drills, virtual patient interviewing, and support for structured referrals, documentation, and patient education, often strengthened by retrieval-augmented generation. Most studies benchmarked outputs against expert consensus or guidelines, but scoring rubrics and reference standards varied widely, limiting cross-study comparability.
Some reports noted that adding clinical photographs could reduce accuracy, suggesting that current multimodal models are better suited to history-based reasoning than to fine-grained image interpretation. Limitations of the evidence include heterogeneity, rapid model iteration, reproducibility challenges, multimodal instability, and safety risks such as hallucination, bias, and automation bias. Commonly recommended safeguards include retrieval grounding, source attribution, red-flag checklists, and human-in-the-loop review.

Conclusion
With clearly defined task scopes and robust safeguards, LLMs may improve the accessibility and efficiency of primary care ophthalmic education, but they should augment rather than replace expert judgment. Future work should prioritize pragmatic multicenter trials, mixed-method implementation studies, and standardized cross-lingual evaluations to define safe and effective implementation pathways.