Hybrid Deep Learning Framework for Acoustic Scene Classification and Environmental Sound Analysis
DOI:
https://doi.org/10.17051/NJSAP/01.02.08
Keywords:
Acoustic Scene Classification, Environmental Sound Analysis, Hybrid Deep Learning, CNN, BiLSTM, Attention Mechanism.
Abstract
Automatically recognizing and classifying acoustic scenes, through Acoustic Scene Classification (ASC) and Environmental Sound Analysis (ESA), is a foundational building block of smart audio sensing systems and enables applications such as smart surveillance, context-aware computing, and autonomous environmental monitoring. Traditional machine learning approaches that rely on hand-designed spectral and temporal features have shown moderate success but fail to generalize to heterogeneous and noisy real-world conditions. This paper proposes a Hybrid Deep Learning Framework that combines Convolutional Neural Networks (CNNs) for spatial feature extraction with Bidirectional Long Short-Term Memory (BiLSTM) networks for temporal sequence modeling. The model processes log-mel spectrograms sequentially and uses an attention mechanism to prioritize important acoustic patterns, with the aim of improving discriminative power. Experiments on two benchmark datasets, TUT Urban Acoustic Scenes 2018 and ESC-50, show that the proposed method outperforms CNN and LSTM baseline architectures in classification accuracy, reaching 89.6% for ASC and 88.3% for ESA. Robustness tests across a range of signal-to-noise ratios (SNRs) indicate that the model is resilient to environmental noise, although performance degrades slightly at low SNR. These results support the framework's suitability for practical deployments that require high accuracy and noise resilience. The proposed solution is scalable and generalizable across acoustic signal understanding tasks, and may in future be integrated into multimodal sensing systems and edge AI applications.
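The robustness evaluation mentioned in the abstract depends on presenting the model with inputs at controlled signal-to-noise ratios. The abstract does not describe how the noisy inputs were produced, so the following is only a minimal sketch of one common approach: scaling a noise signal so that the mixture reaches a target SNR, using SNR_dB = 10 * log10(P_clean / P_noise). The function names, the 16 kHz sample rate, the 440 Hz test tone, the Gaussian noise source, and the SNR levels 20, 10, and 0 dB are all assumptions chosen for illustration, not values taken from the paper.

```python
import math
import random


def mean_power(samples):
    """Average power (mean of squared amplitudes) of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)


def mix_at_snr(clean, noise, snr_db):
    """Add noise to a clean signal at a requested SNR in dB.

    From SNR_dB = 10 * log10(P_clean / P_noise), the noise must be scaled by
    sqrt(P_clean / (P_noise * 10 ** (snr_db / 10))).
    Returns the noisy mixture and the scaled noise that was added.
    """
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    p_clean = mean_power(clean)
    p_noise = mean_power(noise)
    if p_clean == 0.0 or p_noise == 0.0:
        raise ValueError("clean and noise must both have non-zero power")
    scale = math.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    scaled_noise = [scale * n for n in noise]
    noisy = [c + n for c, n in zip(clean, scaled_noise)]
    return noisy, scaled_noise


def measured_snr_db(clean, scaled_noise):
    """SNR in dB actually achieved by a clean signal and its added noise."""
    return 10.0 * math.log10(mean_power(clean) / mean_power(scaled_noise))


# Hypothetical test material: one second of a 440 Hz tone plus Gaussian noise.
rng = random.Random(0)
sample_rate = 16000  # assumed for illustration only
clean = [math.sin(2.0 * math.pi * 440.0 * t / sample_rate)
         for t in range(sample_rate)]
noise = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]

# Assumed SNR levels; the abstract does not list the levels that were tested.
for snr_db in (20, 10, 0):
    noisy, scaled_noise = mix_at_snr(clean, noise, snr_db)
    print(snr_db, round(measured_snr_db(clean, scaled_noise), 6))
```

In a setup of this kind, each noisy mixture would be converted to a log-mel spectrogram and passed to the trained classifier, so that accuracy can be reported per SNR level.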