Lightweight Deep Neural Networks for Real-Time Speech Enhancement on Edge Devices

Authors

  • Mohamad Bin Abdul Hamid, Universiti Kebangsaan Malaysia, Malaysia
  • Freddy Soria, Robotics and Automation Laboratory, Universidad Privada Boliviana, Cochabamba, Bolivia

DOI:

https://doi.org/10.17051/NJSAP/01.03.01

Keywords:

Speech enhancement, edge computing, lightweight DNN, real-time processing, neural quantization, embedded AI.

Abstract

This paper presents a lightweight deep neural network (DNN) architecture for real-time speech enhancement on resource-constrained edge devices. As speech-driven systems such as smart assistants, wearable hearing aids, and voice-enabled interfaces become increasingly common, there is a growing need for high-quality noise reduction with minimal computational cost. Traditional signal processing techniques perform poorly under non-stationary noise, and although deep learning-based methods handle such conditions better, their computational demands are often incompatible with low-power edge hardware. To address this gap, we propose a hybrid architecture that combines depthwise separable convolutions for efficient feature extraction, a compact attention-augmented bidirectional GRU module for temporal modelling, and post-training quantization for memory and inference-time compression. The model is trained on widely used benchmark datasets, VoiceBank-DEMAND and the DNS Challenge corpus, which cover a broad range of noise types and signal conditions. We evaluate the proposed model with standard objective measures, including Perceptual Evaluation of Speech Quality (PESQ), Short-Time Objective Intelligibility (STOI), and Signal-to-Distortion Ratio improvement (SDRi), alongside practical deployment metrics such as model size, inference latency, and power consumption. The proposed low-power DNN achieves a PESQ of 3.01, an SDRi of 9.4 dB, and a real-time factor below 1.0 (RTF < 1.0) on both an ARM Cortex-M7 microcontroller using CMSIS-NN and an NVIDIA Jetson Nano using INT8 quantization via TensorRT. The model occupies less than 2 MB, and its power consumption remains within the budget of battery-powered devices. Objective tests also showed improvements in intelligibility and clarity under real-world adverse noise conditions. These results indicate that the proposed method bridges the gap between high-performing speech enhancement and embedded hardware, offering a scalable and deployable solution for edge AI audio applications in noisy acoustic environments.
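To make the described design concrete, the sketch below illustrates, in PyTorch, how depthwise separable convolutions, an attention-augmented bidirectional GRU, and post-training quantization can be combined in a mask-based enhancer. This is not the authors' implementation: the layer widths, the 257-bin magnitude-spectrogram input, and the use of PyTorch dynamic quantization (rather than the paper's CMSIS-NN and TensorRT INT8 pipelines) are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): depthwise separable convolutions for
# feature extraction, an attention-augmented bidirectional GRU for temporal
# modelling, and post-training dynamic quantization. Layer sizes and the
# 257-bin magnitude-spectrogram input are illustrative assumptions.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise conv over the time axis followed by a 1x1 pointwise conv."""
    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        self.depthwise = nn.Conv1d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2, groups=in_ch)
        self.pointwise = nn.Conv1d(in_ch, out_ch, kernel_size=1)
        self.act = nn.ReLU()

    def forward(self, x):                    # x: (batch, channels, frames)
        return self.act(self.pointwise(self.depthwise(x)))

class LightweightEnhancer(nn.Module):
    """Predicts a per-frame spectral mask applied to a noisy magnitude spectrogram."""
    def __init__(self, n_freq=257, hidden=128, heads=4):
        super().__init__()
        self.encoder = nn.Sequential(
            DepthwiseSeparableConv(n_freq, hidden),
            DepthwiseSeparableConv(hidden, hidden),
        )
        self.gru = nn.GRU(hidden, hidden // 2, batch_first=True, bidirectional=True)
        self.attn = nn.MultiheadAttention(hidden, num_heads=heads, batch_first=True)
        self.mask = nn.Sequential(nn.Linear(hidden, n_freq), nn.Sigmoid())

    def forward(self, noisy_mag):             # noisy_mag: (batch, frames, n_freq)
        feats = self.encoder(noisy_mag.transpose(1, 2)).transpose(1, 2)
        temporal, _ = self.gru(feats)          # (batch, frames, hidden)
        attended, _ = self.attn(temporal, temporal, temporal)
        return noisy_mag * self.mask(attended) # masked (enhanced) magnitudes

model = LightweightEnhancer().eval()
enhanced = model(torch.rand(1, 100, 257))      # 100 frames of a noisy spectrogram

# Post-training dynamic quantization of the GRU/linear weights to INT8,
# standing in for the deployment-specific INT8 conversion described above.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear, nn.GRU}, dtype=torch.qint8)
```

In a deployment matching the abstract, the quantized weights would instead be exported to CMSIS-NN kernels (Cortex-M7) or a TensorRT INT8 engine (Jetson Nano); the mask-based structure itself is unchanged.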

Additional Files

Published

2025-05-15

Issue

Section

Articles

How to Cite

[1]
Mohamad Bin Abdul Hamid and Freddy Soria, “Lightweight Deep Neural Networks for Real-Time Speech Enhancement on Edge Devices”, National Journal of Speech and Audio Processing, pp. 1–8, May 2025, doi: 10.17051/NJSAP/01.03.01.