Background: Hypotension prediction has attracted considerable attention in the medical community, leading to numerous publications on the topic. Several data-driven models have been proposed, but framing, data selection, and evaluation metrics differ widely across the literature. Methods: Using datasets from non-cardiac and cardiac surgery and a forward framing, we assess how data selection affects model performance. We compare models trained and tested with or without segments that contain ongoing hypotension at prediction time or interventions that could alter the classification of those hemodynamic segments. Model performance is evaluated using the area under the precision-recall curve (AUPRC), the area under the receiver operating characteristic curve (AUROC), and a dedicated metric that better reflects clinicians' questions. Results: The non-cardiac cohort contained 1,017 patients and the cardiac cohort 563. Across both datasets, model performance depended strongly on whether ongoing hypotension or classification-altering interventions were present in the evaluation data. For training, removing classification-altering interventions from the training data improved AUPRC (mean difference, 0.01; 95% CI, 0.007 to 0.012; bootstrap p<0.01), whereas excluding ongoing hypotension did not change AUPRC (mean difference, 0.000; 95% CI, -0.003 to 0.004). In the cardiac set, which was used only for evaluation, filtering out classification-altering interventions increased the AUPRC of the trained models by 15.5% on average (0.54 (95% CI, 0.53 to 0.55) vs. 0.47 (95% CI, 0.45 to 0.48)). Including ongoing hypotension in the evaluation data increased the AUPRC by 72.2% on average (0.80 (95% CI, 0.76 to 0.84) vs. 0.47 (95% CI, 0.45 to 0.48)). Conclusion: Data selection is critical when building and evaluating hypotension prediction models. 
For an evaluation that corresponds to the clinical requirement of a hypotension early warning, we recommend training models on datasets excluding classification-altering interventions, and testing on datasets excluding classification-altering interventions and ongoing hypotension.
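To illustrate how the evaluation subset can shift AUPRC, the following is a minimal sketch and not the authors' pipeline. The `Segment` fields, the `select` and `average_precision` helpers, and every value in the toy list are hypothetical and chosen only for illustration. AUPRC is approximated here by step-wise average precision, and tied scores are not handled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """Hypothetical record for one hemodynamic segment (names are assumptions)."""
    score: float               # model's predicted risk of upcoming hypotension
    label: int                 # 1 if hypotension follows within the horizon, else 0
    ongoing_hypotension: bool  # hypotension already present at prediction time
    intervention: bool         # classification-altering intervention in the segment


def select(segments: List[Segment], drop_ongoing: bool,
           drop_interventions: bool) -> List[Segment]:
    """Keep only the segments allowed by the chosen data-selection rules."""
    return [
        s for s in segments
        if not (drop_ongoing and s.ongoing_hypotension)
        and not (drop_interventions and s.intervention)
    ]


def average_precision(segments: List[Segment]) -> float:
    """Step-wise average precision, used here as an AUPRC estimate.

    Assumes distinct scores; ties are broken by input order.
    """
    ranked = sorted(segments, key=lambda s: s.score, reverse=True)
    positives = sum(s.label for s in ranked)
    if positives == 0:
        raise ValueError("no positive segments in the selection")
    hits, total = 0, 0.0
    for rank, s in enumerate(ranked, start=1):
        if s.label == 1:
            hits += 1
            total += hits / rank  # precision at each recalled positive
    return total / positives


# Toy data only: two "easy" positives with ongoing hypotension, and one
# segment labeled negative although an intervention occurred.
segments = [
    Segment(0.95, 1, True, False),
    Segment(0.90, 1, True, False),
    Segment(0.70, 0, False, False),
    Segment(0.60, 1, False, False),
    Segment(0.50, 0, False, True),
    Segment(0.30, 0, False, False),
    Segment(0.20, 1, False, False),
    Segment(0.10, 0, False, False),
]

ap_all = average_precision(select(segments, False, False))
ap_no_ongoing = average_precision(select(segments, True, False))
ap_recommended = average_precision(select(segments, True, True))
print(ap_all, ap_no_ongoing, ap_recommended)
```

On this toy list, keeping ongoing-hypotension segments in the evaluation set yields the highest score, and dropping the intervention segment raises the score among the remaining segments. This mirrors the direction of the effects reported above, not their magnitude.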