Machine learning models for mental health often rely on self-reported data, which can vary widely in quality. When a system's performance is measured against professionally assigned labels, noise in the self-reported labels forces errors that standard evaluation metrics do not account for. This study addresses the challenge of self-reporting label noise by evaluating depression prediction models under different noise scenarios using probabilistic performance bounds. We introduce two self-reporting noise models: one representing low-noise conditions derived from controlled test-retest settings, and another simulating high-noise conditions by incorporating lower test-retest reliability. This approach allows us to assess how different levels of label noise affect model evaluation.

The study uses multiple speech datasets labeled with PHQ-8 scores, collected through different methods (human-to-device and human-to-human interactions) across varied demographic groups. The models integrate natural language processing (NLP) and acoustic features to predict depression severity. Performance evaluation employs both regression and classification metrics, with probabilistic bounds estimating the best- and worst-case scenarios influenced by dataset characteristics and label noise. We explore how these bounds shift under different noise levels and across subgroups defined by age and gender, highlighting the need for fair and consistent performance in diverse populations. This work is accompanied by a Python library developed to facilitate this evaluation framework, offering tools for estimating performance bounds and visualizing results.

In the talk, I will discuss the implications of using noise models for model evaluation, the effect of label noise on predictive accuracy, and methods for assessing model fairness.
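The core idea of bounding a metric under label noise can be sketched roughly as follows. This is a minimal illustration, not the accompanying library's actual API: the mapping from test-retest reliability to a label-noise standard deviation, the function name `mae_bounds`, and all parameter values are assumptions made for this sketch.

```python
import numpy as np

def mae_bounds(preds, labels, reliability, sigma=4.0, n_draws=500, seed=0):
    """Estimate best/worst-case MAE over simulated label-noise draws.

    Illustrative assumption: test-retest reliability r is mapped to a
    label-noise standard deviation of sigma * sqrt(1 - r), and noisy
    labels are clipped to the PHQ-8 range [0, 24].
    """
    rng = np.random.default_rng(seed)
    noise_std = sigma * np.sqrt(1.0 - reliability)
    maes = []
    for _ in range(n_draws):
        noisy = np.clip(labels + rng.normal(0.0, noise_std, labels.shape), 0, 24)
        maes.append(float(np.mean(np.abs(preds - noisy))))
    return min(maes), max(maes)

# Hypothetical data: "true" PHQ-8 scores and imperfect model predictions.
rng = np.random.default_rng(0)
labels = rng.integers(0, 25, size=200).astype(float)
preds = np.clip(labels + rng.normal(0.0, 2.0, size=200), 0, 24)

low_best, low_worst = mae_bounds(preds, labels, reliability=0.9)    # low noise
high_best, high_worst = mae_bounds(preds, labels, reliability=0.6)  # high noise
print(f"low noise:  MAE in [{low_best:.2f}, {low_worst:.2f}]")
print(f"high noise: MAE in [{high_best:.2f}, {high_worst:.2f}]")
```

Under this toy setup, the higher-noise scenario yields a wider and higher band of achievable MAE values, which is the qualitative effect the probabilistic bounds are meant to expose.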
This work underscores the critical importance of accounting for label reliability in mental health applications and provides a path forward for improving the robustness and fairness of machine learning models in this field.