The Brier score is a widely used metric in epidemiological and clinical research for evaluating the accuracy of probabilistic predictions for binary outcomes, such as disease occurrence, treatment response, and screening performance. Despite its popularity, the Brier score is frequently misunderstood, leading to flawed interpretation of prediction models and potentially misguided public health and clinical decisions. This study aims to didactically clarify common misconceptions about realised Brier scores and to provide practical, statistically rigorous guidance for their correct interpretation in epidemiologic and public health prediction models. We analytically examined the statistical properties of the Brier score and conducted simulation studies across diverse scenarios, varying the distribution of true outcome probabilities, prediction accuracy, sample size, and event prevalence. Five prevalent misconceptions were identified, including the mistaken belief that a Brier score of zero indicates a perfect model. Analytic arguments and simulations demonstrated that even perfectly specified models yield non-zero Brier scores under realistic conditions. The Brier score was shown to reflect not only prediction accuracy but also the underlying distribution of true risks and random variation in outcomes. Comparisons across different populations or disease settings can therefore be misleading, and the Brier score does not directly measure calibration. We recommend restricting comparisons to the same population and complementing the Brier score with calibration metrics and measures of clinical or public health utility. Adopting these practices will improve the validity and interpretability of risk prediction in epidemiologic research and enhance decision-making in population health.
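The claim that a perfectly specified model still yields a non-zero Brier score can be illustrated with a minimal simulation sketch. It is not the study's own code: the Beta(2, 5) distribution of true risks, the sample size, the seed, and the function names `brier_score` and `simulate_perfect_model` are all assumptions chosen for illustration. The sketch uses the standard definition of the Brier score as the mean squared difference between predicted probabilities and binary outcomes, and compares the realised score of a model that predicts each individual's true risk with the analytic expectation, the mean of p(1 - p).

```python
import numpy as np


def brier_score(y, p):
    """Mean squared difference between predicted probabilities p and binary outcomes y."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    return float(np.mean((p - y) ** 2))


def simulate_perfect_model(n=100_000, a=2.0, b=5.0, seed=0):
    """Realised and expected Brier score when predictions equal the true risks.

    Assumption for illustration only: true risks follow a Beta(a, b) distribution.
    """
    rng = np.random.default_rng(seed)
    true_risk = rng.beta(a, b, size=n)        # each individual's true outcome probability
    outcomes = rng.binomial(1, true_risk)     # binary outcomes drawn from those risks
    realised = brier_score(outcomes, true_risk)
    # With p equal to the true risk, E[(p - Y)^2] = p * (1 - p) for each individual.
    expected = float(np.mean(true_risk * (1.0 - true_risk)))
    return realised, expected


if __name__ == "__main__":
    realised, expected = simulate_perfect_model()
    print(f"realised Brier score: {realised:.4f}")
    print(f"expected Brier score: {expected:.4f}")
```

Under these assumptions both values sit well above zero and close to each other. Changing `a` and `b` changes the score even though the predictions remain perfect, which is why comparing scores across populations with different risk distributions can mislead.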