Graph Neural Network-Driven Acoustic Scene Analysis for Environmental and Urban Sound Monitoring

Authors

  • G.C. Kingdone, Robotics and Automation Laboratory, Universidad Privada Boliviana, Cochabamba, Bolivia
  • Libson Matharine, School of Information, Systems and Modelling, University of Technology Sydney, Ultimo, NSW 2007, Australia

DOI:

https://doi.org/10.17051/NJSAP/01.02.04

Keywords:

Graph Neural Network (GNN), Acoustic Scene Classification, Environmental Sound Recognition, Urban Noise Monitoring, Spectrogram Graph Representation, Deep Learning, Audio Signal Processing, Smart City, Environmental Acoustics, Urban Sound Dataset.

Abstract

Rapid growth in urban populations and industrial activity produces increasingly dynamic and complex acoustic environments, which pose a notable obstacle to effective environmental sound monitoring. Acoustic scene analysis plays an important role in characterizing such soundscapes and has been applied to detecting and categorizing noise pollution, supporting emergency response, and informing transportation and smart-city development. Conventional deep learning approaches, especially convolutional neural networks (CNNs) and recurrent architectures, have achieved significant success in acoustic scene classification (ASC), but they are intrinsically limited by their inability to model the non-local dependencies and relational structure contained in audio features. To overcome these constraints, this paper introduces a Graph Neural Network (GNN)-based acoustic scene analysis system that represents the time-frequency content of audio signals as graph-structured data, where each node corresponds to a spectral or temporal segment and edges encode similarity relationships between nodes. The proposed GNN architecture combines spectral graph convolution layers with attention-based pooling to efficiently capture both local and global contextual dependencies within urban sounds. We train extensively on the UrbanSound8K and SONYC-UST datasets and apply strong data augmentation, including SpecAugment, Mixup, and additive noise, to support generalization under low signal-to-noise ratio (SNR) conditions. Evaluation against strong baselines, comprising VGG-like CNNs, CRNNs, and spectrogram transformers, shows that the GNN-based approach attains an absolute accuracy gain of up to 4.8% together with consistent improvements in F1-score and AUC. The framework is robust to environmental noise, scales to large data volumes, and is adaptable to real-time deployment in IoT-enabled applications, making it a strong candidate for next-generation urban acoustic monitoring. Potential future extensions include multi-modal sensor fusion and self-supervised pretraining to improve performance in low-resource settings.
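The sketch below illustrates the pipeline the abstract describes: spectrogram segments become graph nodes, similarity defines the edges, and a spectral graph convolution stack with attention pooling produces a scene label. It is a minimal illustration, not the authors' implementation: the per-frame node definition, the k-nearest-neighbour cosine-similarity edge rule, the two-layer depth, and all sizes (128 mel bands, hidden width 64, 10 classes matching UrbanSound8K) are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of a GNN-based acoustic scene
# classifier: spectrogram frames -> similarity graph -> spectral graph
# convolutions -> attention pooling -> class logits.
import torch
import torch.nn as nn
import torch.nn.functional as F

def spectrogram_to_graph(spec: torch.Tensor, k: int = 8):
    """Turn a (n_mels, frames) log-mel spectrogram into node features and a
    normalized adjacency. Each time frame is one node (an assumption; nodes
    could equally be time-frequency patches)."""
    nodes = spec.t()                                  # (frames, n_mels)
    sim = F.cosine_similarity(nodes.unsqueeze(1), nodes.unsqueeze(0), dim=-1)
    topk = sim.topk(k + 1, dim=-1).indices            # k neighbours + self-loop
    adj = torch.zeros_like(sim).scatter_(1, topk, 1.0)
    adj = torch.maximum(adj, adj.t())                 # symmetrize the graph
    d_inv_sqrt = adj.sum(-1).clamp(min=1e-8).rsqrt()
    adj_norm = d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]  # D^-1/2 A D^-1/2
    return nodes, adj_norm

class SpectralGCNLayer(nn.Module):
    """One first-order spectral graph convolution: H' = relu(A_norm H W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)
    def forward(self, x, adj_norm):
        return F.relu(adj_norm @ self.lin(x))

class AttentionPooling(nn.Module):
    """Soft attention over nodes: learned scores weight each node before
    summing into a single graph-level embedding."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)
    def forward(self, x):
        w = torch.softmax(self.score(x), dim=0)       # (num_nodes, 1)
        return (w * x).sum(dim=0)                     # (dim,)

class SceneGNN(nn.Module):
    def __init__(self, n_mels=128, hidden=64, num_classes=10):
        super().__init__()
        self.gc1 = SpectralGCNLayer(n_mels, hidden)
        self.gc2 = SpectralGCNLayer(hidden, hidden)
        self.pool = AttentionPooling(hidden)
        self.head = nn.Linear(hidden, num_classes)    # 10 classes = UrbanSound8K
    def forward(self, spec):
        x, adj = spectrogram_to_graph(spec)
        x = self.gc2(self.gc1(x, adj), adj)
        return self.head(self.pool(x))

# Usage: classify one clip's log-mel spectrogram (placeholder input).
spec = torch.randn(128, 173)                          # (n_mels, frames)
logits = SceneGNN()(spec)
print(logits.shape)                                   # torch.Size([10])
```

The k-NN edge rule is one common choice for turning feature similarity into a sparse graph; a thresholded radius graph or learned adjacency would slot into `spectrogram_to_graph` (a hypothetical helper here) without changing the rest of the model.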

Published

2025-02-10

Section

Articles

How to Cite

[1]
G.C. Kingdone and Libson Matharine, “Graph Neural Network-Driven Acoustic Scene Analysis for Environmental and Urban Sound Monitoring”, National Journal of Speech and Audio Processing, pp. 27–33, Feb. 2025, doi: 10.17051/NJSAP/01.02.04.