Mental disorders have emerged as a major contributor to global healthcare challenges. Deep learning methods based on fMRI and EEG have improved the efficiency and accuracy of detecting certain mental disorders, but they often entail substantial costs for equipment and trained staff. Furthermore, most models are designed for a specific mental disorder rather than serving as potential tools for widespread screening. This paper focuses on the emotional expression features of mental disorders and introduces a diagnosis model based on audio-visual signals. The proposed model incorporates a spatio-temporal (S-T) attention mechanism combined with Convolutional Neural Networks (CNNs) and employs Real-Time Gradient Modulation (RTGM). The model effectively captures audio-visual features while dynamically adjusting the contributions of the two modalities during training to optimize performance for two mental disorders. Additionally, we introduce dynamically varying Gaussian noise to prevent the potential degradation of generalization ability caused by modulation. The effectiveness and feasibility of the proposed model are validated through comparative analyses of various networks, fusion strategies, and modulation methods across three datasets focused on the diagnosis and analysis of two mental disorders: ADHD and depression. The proposed model demonstrates state-of-the-art performance, achieving over 90% accuracy for ADHD classification and improving depression score estimation on AVEC 2013 and AVEC 2014.

• Unified audio-visual deep learning for scalable, low-cost ADHD and depression pre-screening.
• Real-Time Gradient Modulation balances modalities and preserves interpretable cues.
• Co-developed with CNTW-NHS; validation shows robust results.
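The abstract does not give the exact formulation of RTGM, but the core idea, rebalancing per-modality gradients during training and injecting Gaussian noise to offset the generalization loss caused by down-scaling, can be sketched. Below is a minimal PyTorch sketch; the function `modulate_gradients`, the confidence scores `score_a`/`score_v`, and the tanh-based coefficient are illustrative assumptions, not the authors' published method.

```python
import math
import torch

def modulate_gradients(audio_params, visual_params, score_a, score_v,
                       alpha=1.0, noise_std=1e-4):
    """Illustrative modality-wise gradient modulation (assumed form).

    Rescales per-modality gradients between loss.backward() and
    optimizer.step(), then injects small Gaussian noise to compensate
    for the stochasticity lost by down-scaling.

    score_a, score_v: scalar confidence of each unimodal branch on the
    current batch (e.g. mean softmax probability of the true class).
    """
    ratio = score_a / (score_v + 1e-8)  # > 1 means audio dominates
    # Down-weight only the currently dominant modality; tanh keeps the
    # scaling coefficient inside (0, 1].
    coeff_a = 1.0 - math.tanh(alpha * (ratio - 1.0)) if ratio > 1.0 else 1.0
    coeff_v = 1.0 - math.tanh(alpha * (1.0 / ratio - 1.0)) if ratio < 1.0 else 1.0

    for params, coeff in ((audio_params, coeff_a), (visual_params, coeff_v)):
        for p in params:
            if p.grad is None:
                continue
            p.grad.mul_(coeff)
            # Zero-mean Gaussian noise scaled to the gradient magnitude.
            p.grad.add_(torch.randn_like(p.grad) * noise_std * p.grad.abs().mean())
```

In a training loop, this would be called after `loss.backward()` and before `optimizer.step()`, so the optimizer applies the rescaled, noise-perturbed gradients; the noise term plays the role the abstract attributes to dynamically varying Gaussian noise.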
Published in: Biomedical Signal Processing and Control
Volume 120, Article 110164