Major Depressive Disorder (MDD) is frequently under-identified in educational contexts, including arts and design programs, where limited clinical resources and privacy concerns restrict routine screening. Studio-based curricula may introduce critique- and portfolio-driven stressors, motivating low-burden decision support for triage rather than diagnosis. Face-based machine learning can provide a non-intrusive signal when designed to be privacy-preserving and compute-efficient; however, prior work often relies on multimodal inputs, under-reports calibration/robustness, and is not optimized for on-device deployment. We present OCFA (Optimized CNN-Based Facial Analysis), a lightweight face-only pipeline that integrates (1) a RobFaceNet-style backbone with Adapt-Coordinate Attention, (2) multi-objective evolutionary model selection under explicit efficiency and reliability constraints, and (3) post-hoc temperature scaling to improve probabilistic calibration with zero inference-time FLOPs. Because supervision is available at the interview/session level rather than per frame, OCFA aggregates frame-level evidence into a session score (pooling/MIL variants) while enforcing strict subject/session-wise separation to prevent identity leakage. Experiments use the DAIC-WOZ depression subset with PHQ-8-derived labels (binary screening cutoff PHQ-8 ≥ 10) and official train/dev/test partitions; cross-domain generalization is assessed on E-DAIC as an external benchmark with a controlled shift in interview dynamics (WoZ-controlled vs AI-controlled sessions). On the official DAIC-WOZ test split, OCFA achieves 82.98% accuracy, 82.61% F1, AUROC = 0.886, and post-scaling ECE ≈ 0.040 at 0.065 GMac (112×112) with 3.80 M parameters. Under the same frozen operating point (no external-set re-tuning), OCFA attains 81.10% accuracy, 80.20% F1, and AUROC = 0.874 on E-DAIC.
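The post-hoc temperature-scaling step described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the function names, the grid-search range for T, and the equal-width-bin ECE estimator are all assumptions. A single scalar T is fit on held-out (dev) logits by minimizing negative log-likelihood, then divides logits at inference, which adds no meaningful compute.

```python
import numpy as np

def scaled_prob(logit, T=1.0):
    """Sigmoid of a temperature-scaled logit (binary screening case)."""
    return 1.0 / (1.0 + np.exp(-np.asarray(logit, dtype=float) / T))

def nll(probs, labels, eps=1e-12):
    """Mean negative log-likelihood of binary labels under predicted probs."""
    p = np.clip(probs, eps, 1 - eps)
    return -np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p))

def fit_temperature(dev_logits, dev_labels, grid=np.linspace(0.25, 5.0, 200)):
    """Pick the scalar T that minimizes dev-set NLL (simple grid search;
    the grid bounds are illustrative, not from the paper)."""
    losses = [nll(scaled_prob(dev_logits, T), dev_labels) for T in grid]
    return float(grid[int(np.argmin(losses))])

def ece(probs, labels, n_bins=10):
    """Expected Calibration Error with equal-width confidence bins."""
    preds = (probs >= 0.5).astype(int)
    conf = np.where(preds == 1, probs, 1 - probs)   # confidence in the prediction
    acc = (preds == labels).astype(float)
    bins = np.minimum((conf * n_bins).astype(int), n_bins - 1)
    err, total = 0.0, len(probs)
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            err += mask.sum() / total * abs(acc[mask].mean() - conf[mask].mean())
    return err
```

Fitting T on the dev partition and then freezing it matches the "same frozen operating point" protocol the abstract describes for the E-DAIC transfer.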
For privacy-aligned interpretability, we report SHAP-based global feature importance without exposing identifiable face imagery. OCFA is designed as a calibrated, on-device compatible risk estimator for human-in-the-loop screening workflows (including art and design schools). Prospective validation under real educational capture conditions remains necessary before operational deployment.
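The frame-to-session aggregation mentioned in the abstract (pooling/MIL variants under session-level supervision) can be sketched as below. This is an illustrative reading, not the paper's exact aggregator: the mode names, the top-k value, and the 0.5 decision threshold are assumptions; the labels the threshold targets come from the PHQ-8 ≥ 10 screening cutoff.

```python
import numpy as np

def session_score(frame_probs, mode="mean", k=16):
    """Aggregate per-frame depression probabilities into one session score.

    'mean' is plain average pooling over frames; 'topk' is a simple
    MIL-style variant that averages the k most confident frames.
    Both mode names and the default k are illustrative assumptions.
    """
    p = np.asarray(frame_probs, dtype=float)
    if mode == "mean":
        return float(p.mean())
    if mode == "topk":
        k = min(k, p.size)
        return float(np.sort(p)[-k:].mean())
    raise ValueError(f"unknown mode: {mode}")

def screen(frame_probs, threshold=0.5, mode="mean"):
    """Session-level binary screening decision (label regime: PHQ-8 >= 10)."""
    return int(session_score(frame_probs, mode=mode) >= threshold)
```

Because supervision exists only per session, any split must group all frames (and sessions) of one subject on the same side of the train/test boundary; otherwise the model can match identities rather than depression cues, the identity-leakage failure the abstract guards against.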
Published in: Brain Research Bulletin
Volume 237, Article 111829