Abstract: Chronic psychological stress impairs cognitive performance, academic outcomes, and long-term well-being, yet most automated detection systems rely on a single sensing modality, limiting their robustness under real-world conditions. Unimodal approaches (whether vision-based, physiological, or acoustic) are individually vulnerable to noise, occlusion, and signal artifacts, motivating the need for integrated multimodal frameworks. This paper presents Neurolens, a real-time multimodal stress detection system that concurrently processes facial video through a fine-tuned You Only Look Once version 8 (YOLOv8) model trained on a publicly available facial emotion dataset; wearable physiological signals, including electrodermal activity (EDA), blood volume pulse (BVP), and skin temperature, through a hybrid convolutional neural network–long short-term memory (CNN-LSTM) architecture trained on the WESAD (Wearable Stress and Affect Detection) dataset; and speech audio through a Wav2Vec2 transformer-based speech encoder. A weighted late-fusion module integrates per-modality stress scores into a unified Stress Index rendered on an interactive real-time dashboard with adaptive push notifications and ambient brightness control. System demonstrations confirm correct identification of stress-indicative facial states such as anger and elevated physiological arousal from CSV-uploaded sensor data, alongside neutral baseline detection with appropriately reduced Stress Index values. These results establish Neurolens as a scalable, non-invasive, and reproducible framework for continuous passive stress monitoring in academic, clinical, and professional environments.

Keywords: multimodal fusion; Wav2Vec2; CNN-LSTM; facial emotion recognition; speech emotion recognition; wearable sensors.
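The weighted late-fusion step described in the abstract can be sketched as follows. This is a minimal illustration only: the modality names, weights, and score values are assumptions for demonstration, not parameters reported by the paper.

```python
# Minimal sketch of weighted late fusion over per-modality stress
# scores in [0, 1]. Weights and modality names are illustrative,
# not the values used by Neurolens.

def fuse_stress_scores(scores, weights):
    """Combine per-modality stress scores into a single Stress Index.

    scores:  dict mapping modality name -> stress score in [0, 1]
    weights: dict mapping modality name -> non-negative weight
    Returns a weighted average over the modalities present in `scores`,
    so a missing modality (e.g. an occluded camera) is simply skipped.
    """
    total_weight = sum(weights[m] for m in scores)
    if total_weight == 0:
        raise ValueError("no usable modalities")
    return sum(weights[m] * s for m, s in scores.items()) / total_weight

# Hypothetical facial, physiological, and speech scores with assumed weights.
weights = {"face": 0.4, "physio": 0.4, "speech": 0.2}
scores = {"face": 0.8, "physio": 0.6, "speech": 0.5}
index = fuse_stress_scores(scores, weights)
```

Normalizing by the sum of weights for the modalities actually present lets the fused index degrade gracefully when one sensing stream drops out, which matters for the noise and occlusion conditions the abstract highlights.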
Published in: INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT
Volume 10, Issue 03, pp. 1-9
DOI: 10.55041/ijsrem58725