Non-Intrusive Deep Learning-Based Speech Quality Monitoring for Real-Time Telemedicine Systems
DOI:
https://doi.org/10.17051/NJSAP/01.02.10
Keywords:
Telemedicine, Speech Quality Monitoring, Non-Intrusive Assessment, Deep Learning, Mean Opinion Score (MOS), Real-Time Audio Processing, Transformer Models, Audio Quality Evaluation, WebRTC, Healthcare Communication
Abstract
Speech quality is critical in telemedicine consultations: poor audio can lead to misdiagnosis and patient dissatisfaction, and it reduces the efficiency of the consultation. Conventional intrusive assessment techniques such as PESQ and POLQA require a clean reference signal and are therefore unsuitable for real-time telemedicine applications. This paper presents a non-intrusive, deep learning-based framework for real-time speech quality monitoring in telemedicine systems. The proposed model combines time-frequency features with transformer-based embeddings to predict Mean Opinion Scores (MOS) without a clean reference. A low-latency streaming pipeline is integrated into a WebRTC-based telemedicine platform, giving healthcare providers continuous feedback on audio quality. Experiments on a public speech corpus and on domain-specific telemedicine recordings show that the model achieves a Pearson correlation of 0.91 with subjective MOS scores while keeping inference latency below 50 ms on edge infrastructure. Comparative evaluations indicate that it improves on existing non-intrusive measures in both accuracy and responsiveness, making it suitable for real-time healthcare communication systems.
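The headline accuracy figure is a Pearson correlation between predicted and subjective MOS. As a minimal sketch of how such a figure is typically computed, the Python snippet below implements the standard Pearson formula. The function name `pearson_correlation` and the sample scores are hypothetical and chosen only for illustration; they are not the paper's data or evaluation code.

```python
import math
from typing import Sequence


def pearson_correlation(predicted: Sequence[float], subjective: Sequence[float]) -> float:
    """Return the Pearson correlation coefficient between two score lists."""
    if len(predicted) != len(subjective) or len(predicted) < 2:
        raise ValueError("need two equally sized lists with at least two scores")

    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_s = sum(subjective) / n

    # Sum of co-deviations and sums of squared deviations from each mean.
    cov = sum((p - mean_p) * (s - mean_s) for p, s in zip(predicted, subjective))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_s = sum((s - mean_s) ** 2 for s in subjective)

    if var_p == 0 or var_s == 0:
        raise ValueError("correlation is undefined when one list is constant")

    return cov / math.sqrt(var_p * var_s)


# Hypothetical scores on the 1-5 MOS scale (illustrative only, not from the paper).
predicted_mos = [1.8, 2.6, 3.1, 3.9, 4.4]
subjective_mos = [1.5, 2.9, 3.0, 4.1, 4.6]

r = pearson_correlation(predicted_mos, subjective_mos)
print(f"Pearson r = {r:.3f}")
```

A value near 1 means the predicted scores rise and fall with listener ratings. Pearson correlation measures linear agreement only, so it does not by itself show that the predictions are close in absolute terms.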