Non-Intrusive Deep Learning-Based Speech Quality Monitoring for Real-Time Telemedicine Systems

Authors

  • K.N. Kantor, Departamento de Engenharia Elétrica, Universidade Federal de Pernambuco (UFPE), Recife, Brazil
  • K.P. Sikalu, Electrical and Electronic Engineering Department, University of Ibadan, Ibadan, Nigeria

DOI:

https://doi.org/10.17051/NJSAP/01.02.10

Keywords:

Telemedicine, Speech Quality Monitoring, Non-Intrusive Assessment, Deep Learning, Mean Opinion Score (MOS), Real-Time Audio Processing, Transformer Models, Audio Quality Evaluation, WebRTC, Healthcare Communication

Abstract

Speech quality is critical in telemedicine consultations: poor audio can lead to misdiagnosis and patient dissatisfaction, and it reduces the efficiency of the consultation. Conventional intrusive assessment techniques such as PESQ and POLQA require a clean reference signal and are therefore unsuitable for real-time telemedicine applications. This paper proposes a non-intrusive, deep learning-based framework for real-time speech quality monitoring in telemedicine systems. The proposed model combines time-frequency features with transformer-based embeddings to predict Mean Opinion Scores (MOS) without access to a clean reference. A low-latency streaming pipeline is integrated into a WebRTC-based telemedicine platform, giving healthcare providers continuous feedback on audio quality. Experiments on a public speech corpus and on domain-specific telemedicine recordings show that the model achieves a Pearson correlation of 0.91 with subjective MOS scores while keeping inference latency below 50 ms on edge infrastructure. Comparative evaluations indicate improvements in both accuracy and responsiveness over existing non-intrusive measures, making the approach suitable for real-time healthcare communication systems.
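
The Pearson correlation reported above measures how closely predicted MOS values track subjective listener ratings. The following is a minimal sketch of that metric only, not the authors' evaluation code; the function name `pearson_correlation` and the sample scores are illustrative assumptions and are not taken from the paper.

```python
import math


def pearson_correlation(predicted, subjective):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(predicted) != len(subjective) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_s = sum(subjective) / n
    # Covariance and variances (unnormalized; the 1/n factors cancel in the ratio).
    cov = sum((p - mean_p) * (s - mean_s) for p, s in zip(predicted, subjective))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_s = sum((s - mean_s) ** 2 for s in subjective)
    if var_p == 0 or var_s == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_s)


# Made-up scores on the 1-5 MOS scale, for illustration only (not paper data).
predicted_mos = [1.8, 2.9, 3.4, 4.1, 4.6]
subjective_mos = [2.0, 2.7, 3.5, 4.0, 4.8]
r = pearson_correlation(predicted_mos, subjective_mos)
print(f"Pearson r = {r:.3f}")
```

A value near 1 means the predictions rise and fall with the listener ratings; it does not by itself show that the predicted scores are on the correct absolute scale.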

Published

2024-12-10

Section

Articles

How to Cite

[1]
K.N. Kantor and K.P. Sikalu, “Non-Intrusive Deep Learning-Based Speech Quality Monitoring for Real-Time Telemedicine Systems”, National Journal of Speech and Audio Processing, pp. 74–81, Dec. 2024, doi: 10.17051/NJSAP/01.02.10.