Explainable Artificial Intelligence in Speech and Audio Processing: Enhancing Interpretability, Fostering Trust, and Addressing Deployment Challenges
DOI:
https://doi.org/10.17051/NJSAP/01.03.06

Keywords:
Explainable AI, Speech Recognition, Audio Processing, Interpretability, Trust, Model Explainability, Deployment Challenges

Abstract
The rapid adoption of deep learning in speech and audio processing has delivered significant performance improvements across applications such as automatic speech recognition (ASR), speaker verification, emotion recognition, and audio event detection. Nevertheless, the opaque, black-box nature of state-of-the-art models poses serious problems for transparency, interpretability, and user trust, especially in safety-critical and privacy-sensitive domains. Explainable Artificial Intelligence (XAI) offers a way forward by providing insight into the inner workings of these systems. This paper reviews existing XAI techniques applied to speech and audio processing, classifies them into model-specific and model-agnostic approaches, and discusses widely accepted metrics for evaluating interpretability. We investigate how explainability can increase trustworthiness, support regulatory compliance, assist system debugging, and reduce bias. Deployment challenges, including real-time interpretability under computational constraints, cross-lingual robustness, and human-machine communication, are critically assessed. We also identify open research gaps, including low-latency explanation generation, multimodal explainability, and privacy-preserving explanation mechanisms. Lastly, we lay out a roadmap for integrating XAI into next-generation speech and audio systems and for promoting the deployment of responsible, transparent, and trusted AI in both commercial and mission-critical scenarios. Three case studies demonstrate the effectiveness of XAI in detecting bias, validating feature explanations, and diagnosing errors in ASR, emotion recognition, and audio event classification.