Robust Audio Signal Enhancement Using Hybrid Spectral-Temporal Deep Learning Models in Noisy Environments

Authors

  • S. Poornimadarshini, Jr Researcher, National Institute of STEM Research, India

Keywords:

Audio signal enhancement, deep learning, CNN, Bi-GRU, spectral-temporal modeling, speech quality, noisy environments

Abstract

Audio signal enhancement is essential in many applications, from telecommunications to assistive hearing devices operating in difficult noisy settings. In this paper, we propose a hybrid spectral-temporal deep learning model that combines convolutional and recurrent neural networks for robust audio signal enhancement. The model operates on spectral representations of the audio (log-magnitude spectrograms) and captures temporal dependencies with bidirectional gated recurrent units (Bi-GRU). It uses a multi-stage architecture in which the CNN extracts spatial features from the spectrogram and the Bi-GRU models temporal continuity. Evaluated on datasets including VoiceBank-DEMAND and TIMIT, with noise added artificially at different SNR levels (0 dB, 5 dB, 10 dB), the proposed model substantially improves signal quality, reaching PESQ values of up to 3.21 and STOI improvements of up to 0.26 over classical and modern deep models. These results indicate that hybrid deep learning is effective in real-world noisy conditions.
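
The abstract does not describe how the noisy evaluation mixtures were produced. As an illustration only, the sketch below shows one common way to corrupt a clean signal at a target SNR such as 0 dB, 5 dB, or 10 dB: scale the noise so that the ratio of clean power to noise power matches the target, then add it. The function names (`power`, `mix_at_snr`, `measure_snr_db`) and the synthetic sine and Gaussian signals are assumptions made for this example and are not taken from the paper.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(clean, noise, target_snr_db):
    """Scale `noise` so clean-to-noise power ratio equals `target_snr_db`, then add it.

    Returns the noisy mixture and the scaled noise (hypothetical helper, not the paper's code).
    """
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    noise_power = power(noise)
    if noise_power == 0:
        raise ValueError("noise must have non-zero power")
    # SNR_dB = 10 * log10(P_clean / P_noise)  =>  P_noise = P_clean / 10^(SNR_dB / 10)
    target_noise_power = power(clean) / (10 ** (target_snr_db / 10))
    gain = math.sqrt(target_noise_power / noise_power)
    scaled_noise = [gain * n for n in noise]
    noisy = [c + n for c, n in zip(clean, scaled_noise)]
    return noisy, scaled_noise


def measure_snr_db(clean, noise):
    """SNR in dB between a clean signal and the noise that was added to it."""
    return 10 * math.log10(power(clean) / power(noise))


if __name__ == "__main__":
    rng = random.Random(0)
    sample_rate = 16000  # assumed sampling rate for the synthetic example
    clean = [math.sin(2 * math.pi * 220 * t / sample_rate) for t in range(sample_rate)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]
    for target in (0, 5, 10):
        noisy, scaled = mix_at_snr(clean, noise, target)
        print(target, round(measure_snr_db(clean, scaled), 2))
```

Lower SNR values correspond to harder conditions: at 0 dB the noise carries as much power as the clean signal, while at 10 dB it carries one tenth of it.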

Published

2025-03-29

Section

Articles