Transformer-Based Architectures for Robust Speech Recognition and Natural Language Understanding in Noisy and Multilingual Environments
DOI: https://doi.org/10.17051/NJSAP/01.04.05

Keywords:
Transformer, Speech Recognition, Natural Language Understanding, Multilingual, Noise Robustness, Conformer, Self-Supervised Learning.

Abstract
Transformer architectures have substantially advanced automatic speech recognition (ASR) and natural language understanding (NLU), achieving state-of-the-art performance across diverse languages and difficult acoustic conditions. This paper examines how transformer variants, including Conformer models and self-supervised models such as wav2vec 2.0, have been adapted for robust speech processing in noisy and multilingual settings. Our configuration combines data augmentation, domain adaptation, and cross-lingual learning to improve generalization and noise robustness. Experiments on benchmark multilingual speech corpora and real-world noisy datasets show that transformer-based models significantly outperform conventional recurrent and convolutional neural networks, yielding lower word error rates (WER) and higher semantic accuracy. The results demonstrate the effectiveness of self-attention mechanisms and convolutional augmentations in capturing both long-range and local dependencies in the speech signal. Finally, the paper presents key open challenges and directions for future research, including low-latency inference techniques, model compression for edge deployment, and ethical concerns in multilingual speech and language applications. This thorough study can help advance efficient, high-quality, and scalable transformer-based speech and language systems that adapt well to real-world contexts.
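To make the interplay of self-attention (long-range context) and convolution (local patterns) concrete, the sketch below shows a minimal Conformer-style block in PyTorch. The module name, the hyperparameters (dim=256, heads=4, kernel=15), and the simplified two-sub-block layout (full Conformer blocks also include half-step feed-forward modules and relative positional encoding) are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class ConformerBlockSketch(nn.Module):
    """Minimal sketch of a Conformer-style block: self-attention captures
    long-range dependencies; a depthwise convolution captures local ones.
    Hyperparameters are illustrative, not taken from the paper."""
    def __init__(self, dim=256, heads=4, kernel=15):
        super().__init__()
        self.attn_norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.conv_norm = nn.LayerNorm(dim)
        # Pointwise expansion + GLU gate, then a depthwise conv over time.
        self.pointwise_in = nn.Conv1d(dim, 2 * dim, kernel_size=1)
        self.depthwise = nn.Conv1d(dim, dim, kernel_size=kernel,
                                   padding=kernel // 2, groups=dim)
        self.pointwise_out = nn.Conv1d(dim, dim, kernel_size=1)

    def forward(self, x):                      # x: (batch, time, dim)
        # Self-attention sub-block with residual connection.
        h = self.attn_norm(x)
        h, _ = self.attn(h, h, h)
        x = x + h
        # Convolution sub-block with residual connection.
        h = self.conv_norm(x).transpose(1, 2)  # (batch, dim, time) for Conv1d
        h = nn.functional.glu(self.pointwise_in(h), dim=1)
        h = self.pointwise_out(self.depthwise(h)).transpose(1, 2)
        return x + h

# Usage: a batch of 2 utterances, 100 frames of 256-dim acoustic features.
x = torch.randn(2, 100, 256)
print(ConformerBlockSketch()(x).shape)         # torch.Size([2, 100, 256])
```

The design choice this illustrates is the one the abstract credits for the gains: attention alone models distant context well but is comparatively weak on fine-grained local acoustic structure, which the depthwise convolution supplies at little extra cost.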