Background: In-service examinations are widely used to monitor knowledge acquisition during postgraduate medical training. However, longitudinal interpretation of examination scores is limited by heterogeneity in test difficulty and the lack of psychometric calibration across assessments. Without appropriate adjustment, observed score changes may reflect examination variability rather than true learning progression.

Objective: To evaluate longitudinal clinical knowledge performance among cardiology residents using a psychometrically informed analytical framework and to determine whether performance improvement persists after adjustment for examination heterogeneity.

Methods: This retrospective longitudinal study included seven routine cardiology in-service examinations administered between March 2023 and November 2025 within a single residency program. Examination difficulty and reliability were evaluated using Classical Test Theory indices, including mean item difficulty (p-value), KR-20 reliability, and standard error of measurement. Resident performance was expressed as the percentage of correct answers. Longitudinal changes in performance were analyzed using linear mixed-effects models with random intercepts for residents to account for repeated measurements and inter-individual variability. Time was defined as months since residency start. To minimize the impact of between-examination difficulty differences, a sensitivity analysis was performed using within-examination standardized performance scores (z-scores).

Results: A total of 138 resident–examination observations were analyzed. Examination difficulty varied substantially across tests (mean p-values 0.38–0.79), confirming marked heterogeneity. Reliability was moderate to high in most examinations (KR-20 range: 0.63–0.86). In the primary mixed-effects model, time in training was strongly associated with improved performance, with an average increase of 1.02 percentage points per month (95% CI 0.63–1.41, p < 0.001).
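The Classical Test Theory indices named in the Methods (item difficulty, KR-20 reliability, and the standard error of measurement) follow directly from a binary item-response matrix. A minimal sketch of their computation; the function name and the example response matrix are illustrative, not taken from the study:

```python
from statistics import pvariance, pstdev
from math import sqrt

def ctt_indices(responses):
    """CTT indices from a binary item-response matrix.

    responses: list of rows, one per examinee; each row is a list of
    1 (correct) / 0 (incorrect) entries, one per item.
    Returns (mean item difficulty, KR-20, SEM).
    """
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    # Item difficulty p_i = proportion of examinees answering item i correctly
    p = [sum(row[i] for row in responses) / len(responses)
         for i in range(n_items)]
    # KR-20 = k/(k-1) * (1 - sum(p_i * q_i) / var(total scores))
    var_total = pvariance(totals)
    sum_pq = sum(pi * (1 - pi) for pi in p)
    kr20 = (n_items / (n_items - 1)) * (1 - sum_pq / var_total)
    # SEM = SD(total) * sqrt(1 - reliability)
    sem = pstdev(totals) * sqrt(1 - kr20)
    return sum(p) / n_items, kr20, sem
```

Higher mean p-values indicate easier examinations, which is why the 0.38–0.79 spread reported below signals substantial between-test difficulty heterogeneity.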
Individual learning trajectories demonstrated considerable baseline heterogeneity but consistent upward trends across residents. In the sensitivity analysis using standardized within-examination z-scores, performance continued to increase significantly over time (+0.034 z-units per month, 95% CI 0.024–0.045, p < 0.001), confirming that the observed improvement was not attributable to examination difficulty differences alone.

Conclusions: Cardiology residents demonstrated a clear and sustained improvement in clinical knowledge performance throughout training. The persistence of longitudinal gains after psychometric adjustment indicates true learning progression rather than artifacts of test difficulty. Integrating psychometric evaluation with mixed-effects modeling provides a robust framework for interpreting routine assessment data and establishes an objective benchmark for longitudinal knowledge performance during training.
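The within-examination standardization used in the sensitivity analysis amounts to expressing each resident's score as a z-score relative to that examination's own mean and standard deviation, so that between-test difficulty differences cancel out. A minimal sketch, assuming a simple exam-to-scores mapping that is not described in the study:

```python
from statistics import mean, pstdev

def within_exam_zscores(scores_by_exam):
    """Standardize percent-correct scores within each examination.

    scores_by_exam: dict mapping exam id -> list of (resident id, score).
    Returns dict of (exam id, resident id) -> z-score.
    Hypothetical data layout for illustration only.
    """
    z = {}
    for exam, pairs in scores_by_exam.items():
        vals = [score for _, score in pairs]
        m, sd = mean(vals), pstdev(vals)
        for resident, score in pairs:
            # z = (score - exam mean) / exam SD
            z[(exam, resident)] = (score - m) / sd
    return z
```

Because each examination is centered and scaled separately, a positive slope of z-scores over training time reflects relative gains that cannot be explained by one test simply being easier than another.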