Emotion recognition is a prominent and challenging problem in contemporary artificial intelligence research. Deep learning on physiological signals has advanced emotion recognition, yet the limitations of unimodal data, the neglect of channel importance, and the underuse of temporal cues all hinder feature extraction. In this work, we introduce a multimodal emotion-recognition framework that integrates several attention mechanisms to refine feature extraction from multimodal physiological data and thereby improve recognition accuracy. First, the framework exploits the distributed nature of multi-channel EEG signals by extracting differential-entropy (DE) emotion feature matrices from both EEG and peripheral physiological signals. A channel-attention mechanism then measures the similarity among electrode-channel samples of the physiological signals, yielding sample-importance weights that are probabilistically redistributed across the channels. With these reweighted signals, depthwise-separable convolutional neural networks and long short-term memory networks capture spatial and channel-attention information. Second, because latent emotional information also resides between the temporal slices of multimodal physiological signals, the features extracted from the different modalities are fused into a unified representation. A multi-head attention mechanism integrated into a recurrent network with ordered neurons (ON-LSTM) then learns the relative importance of temporal positions across physiological samples, yielding the final emotion prediction. Finally, the proposed approach is evaluated on two distinct datasets, and the experimental results demonstrate strong generalization capability.
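As a concrete illustration of the first stage, the following is a minimal PyTorch sketch of DE feature extraction and similarity-based channel attention. The Gaussian assumption behind the closed-form DE, the choice of cosine similarity, the softmax redistribution, and all shapes and names are our assumptions for illustration; the paper's exact formulation may differ.

```python
# Minimal sketch (PyTorch): DE feature extraction + channel attention.
# Shapes, similarity measure, and softmax redistribution are assumptions.
import math
import torch

def de_features(x: torch.Tensor) -> torch.Tensor:
    """Differential entropy of each (channel, band) slice, assuming each
    band-filtered slice is approximately Gaussian:
        DE = 0.5 * ln(2 * pi * e * sigma^2)
    x: (batch, channels, bands, time) band-filtered physiological signal.
    Returns: (batch, channels, bands) DE emotion feature matrix."""
    var = x.var(dim=-1, unbiased=False).clamp_min(1e-8)
    return 0.5 * torch.log(2 * math.pi * math.e * var)

def channel_attention(feats: torch.Tensor) -> torch.Tensor:
    """Similarity-based channel attention: cosine similarity among
    electrode-channel feature vectors gives per-channel importance,
    redistributed probabilistically here with a softmax (assumed).
    feats: (batch, channels, d) per-channel feature vectors."""
    normed = torch.nn.functional.normalize(feats, dim=-1)
    sim = normed @ normed.transpose(1, 2)        # (batch, C, C) similarities
    importance = sim.mean(dim=-1)                # aggregate similarity per channel
    weights = torch.softmax(importance, dim=-1)  # probabilistic redistribution
    return feats * weights.unsqueeze(-1)         # reweighted channel features

# Example: 32-channel EEG, 5 frequency bands, 128 time samples per slice
x = torch.randn(8, 32, 5, 128)
de = de_features(x)                  # (8, 32, 5)
reweighted = channel_attention(de)   # (8, 32, 5)
```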
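The depthwise-separable convolution used in the spatial stage factorizes a standard convolution into a per-channel (depthwise) filter followed by a 1×1 (pointwise) mix, cutting parameter count substantially. A minimal sketch, with assumed kernel sizes and input layout:

```python
# Minimal sketch of a depthwise-separable convolution block; kernel sizes
# and the input layout are assumptions for illustration.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3):
        super().__init__()
        # Depthwise: one spatial filter per input channel (groups=in_ch)
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2, groups=in_ch)
        # Pointwise: 1x1 convolution mixes information across channels
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pointwise(self.depthwise(x))

# Example: treat the (channels, bands) DE matrix as a 1-channel 2-D map
block = DepthwiseSeparableConv(1, 16)
out = block(torch.randn(8, 1, 32, 5))  # -> (8, 16, 32, 5)
```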
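For the temporal stage, the sketch below applies multi-head self-attention over the fused temporal slices, followed by a recurrent readout. Note that the standard nn.LSTM here is only a stand-in for the paper's ordered-neuron recurrent network (ON-LSTM), which has no off-the-shelf PyTorch module, and all dimensions and the classification head are illustrative.

```python
# Minimal sketch: multi-head attention over fused temporal slices.
# nn.LSTM is a stand-in for the paper's ON-LSTM; dimensions are assumed.
import torch
import torch.nn as nn

class TemporalAttentionClassifier(nn.Module):
    def __init__(self, d_model: int = 64, num_heads: int = 4, num_classes: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.rnn = nn.LSTM(d_model, d_model, batch_first=True)  # ON-LSTM stand-in
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, fused: torch.Tensor) -> torch.Tensor:
        # fused: (batch, time_slices, d_model) fused multimodal features
        attended, _ = self.attn(fused, fused, fused)  # self-attention weights slices
        out, _ = self.rnn(attended)                   # recurrent temporal modeling
        return self.head(out[:, -1])                  # classify from the last step

model = TemporalAttentionClassifier()
logits = model(torch.randn(8, 10, 64))  # 10 temporal slices -> (8, 4) emotion logits
```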