Fire detection is a critical task for early warning systems, particularly in environments where visual sensing is unreliable. While most existing approaches rely on image-based or smoke-based detection, acoustic signals provide complementary information that can capture early combustion events. This study investigates deep learning models for sound-based fire detection, focusing on convolutional and Transformer-based architectures. VGG16 and VGG19 convolutional neural networks are adapted to process time-frequency audio representations for binary classification into Fire and No-Fire classes. An Audio Spectrogram Transformer (AST) is further employed to model long-range temporal dependencies in acoustic data. Finally, a hybrid VGG19-AST architecture is proposed, in which convolutional layers extract local spectral–temporal features and Transformer-based self-attention performs global sequence modeling. The models are evaluated under multiple noise conditions on a curated dataset containing fire sounds and diverse environmental background noise. Experimental results demonstrate competitive performance across convolutional and Transformer-based models, while the proposed hybrid VGG19-AST architecture achieves the most consistent overall results. The findings suggest that integrating convolutional feature extraction with self-attention-based global modeling enhances robustness under complex acoustic variability. The proposed hybrid framework provides a scalable and cost-effective solution for sound-based fire detection, particularly in scenarios where visual monitoring may be obstructed or ineffective.
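To make the hybrid design concrete, the sketch below shows one way a VGG19 convolutional backbone can feed an AST-style Transformer encoder in PyTorch: the convolutional stack produces local spectral–temporal feature maps, which are flattened into a token sequence and modeled globally with self-attention before a Fire/No-Fire classification head. This is a minimal illustration under stated assumptions, not the paper's released implementation: the input shape, embedding dimension, 1x1 channel projection, [CLS] token, and head sizes are all hypothetical choices, and positional embeddings are omitted for brevity.

```python
# Minimal sketch of a hybrid VGG19 + Transformer (AST-style) classifier.
# Assumptions (not from the paper): PyTorch/torchvision, a log-mel
# spectrogram input of shape (batch, 1, 128, 256), and illustrative
# hyperparameters (embed_dim=256, 4 heads, 2 encoder layers).
import torch
import torch.nn as nn
from torchvision.models import vgg19

class HybridVGG19AST(nn.Module):
    def __init__(self, embed_dim=256, num_heads=4, num_layers=2, num_classes=2):
        super().__init__()
        # VGG19 convolutional stack extracts local spectral-temporal features.
        self.backbone = vgg19(weights=None).features   # (B, 512, H/32, W/32)
        self.proj = nn.Conv2d(512, embed_dim, kernel_size=1)  # channel projection
        # Learnable [CLS] token summarizes the sequence for classification.
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True)
        # Self-attention models long-range dependencies across the patch tokens
        # (positional embeddings omitted here for brevity).
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        self.head = nn.Linear(embed_dim, num_classes)  # Fire vs. No-Fire logits

    def forward(self, spec):                       # spec: (B, 1, mel, time)
        x = spec.repeat(1, 3, 1, 1)                # VGG19 expects 3 input channels
        x = self.proj(self.backbone(x))            # (B, D, h, w)
        tokens = x.flatten(2).transpose(1, 2)      # (B, h*w, D) patch tokens
        cls = self.cls_token.expand(tokens.size(0), -1, -1)
        z = self.encoder(torch.cat([cls, tokens], dim=1))
        return self.head(z[:, 0])                  # classify from the [CLS] token

# Example: a batch of two 128-mel x 256-frame spectrograms.
model = HybridVGG19AST()
logits = model(torch.randn(2, 1, 128, 256))
print(logits.shape)  # torch.Size([2, 2])
```

The division of labor mirrors the abstract's claim: the convolutional backbone captures local time-frequency structure cheaply, while the Transformer encoder relates distant patches, which is what gives the hybrid its robustness to long-range acoustic context.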