Augmented Reality Audio: Real-Time 3D Sound Localization with Microphone Arrays

Authors

  • Ashok Punjaji Salave, University of Pune, Pune 411007, Maharashtra, India.
  • Moti Ranjan Tandi, Assistant Professor, Department of CS & IT, Kalinga University, Raipur, India.

DOI:

https://doi.org/10.17051/NJSAP/01.03.09

Keywords:

Augmented reality audio, 3D sound localization, microphone arrays, DOA estimation, beamforming, SRP-PHAT, CRNN, real-time spatial audio.

Abstract

Augmented Reality (AR) is increasingly applied to immersive multisensory experiences, where spatial audio can, and should, be used to heighten user presence, realism, and fidelity of interaction. Localizing three-dimensional (3D) sound sources with minimal latency and high accuracy is required to harmonize visual and auditory information, track dynamically moving sound sources, and render them coherently within complex AR scenes. This paper introduces a real-time 3D sound localization solution that integrates compact microphone arrays with a hybrid signal-processing and deep-learning pipeline designed for wearable augmented reality devices. The proposed algorithm uses Steered Response Power with Phase Transform (SRP-PHAT) for robust initial direction-of-arrival (DOA) estimation in reverberant and noisy environments, followed by a lightweight Convolutional Recurrent Neural Network (CRNN) that refines azimuth and elevation estimates to sub-degree accuracy. The system incorporates a low-latency beamforming architecture for spatial attention and compensates for head motion using inertial measurement unit (IMU) data, preserving spatial coherence as the user moves. The approach is notable for balancing accuracy, robustness, and real-time responsiveness simultaneously, whereas traditional localization systems either suffer from high computational complexity or degrade in dynamic acoustic scenarios, and are generally not deployable on mobile AR devices with limited processing capabilities. It was experimentally verified on a 32-channel Eigenmike spherical microphone array and on an 8-element wearable AR headset array, covering both high-resolution benchmarking and a real-world deployment scenario.
The results show median localization errors of 2.9° in azimuth and 3.8° in elevation, with end-to-end processing latency below 25 ms, meeting the perceptual thresholds for interactive AR audio rendering. These results indicate that advanced sound localization can be embedded directly into wearable AR devices, enabling more interactive and perceptually accurate spatial audio environments. Beyond entertainment, potential application areas include training, telepresence, assistive hearing devices, and situational awareness in augmented reality scenarios.
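To make the SRP-PHAT stage of the abstract concrete, the following is a minimal far-field, two-dimensional sketch in NumPy: GCC-PHAT cross-spectra are computed for every microphone pair, steered over a grid of candidate azimuths, and the direction maximizing the summed steered response power is returned. The function names (`gcc_phat_spectrum`, `srp_phat_doa`), the brute-force azimuth grid, and all parameters are illustrative assumptions, not the paper's implementation, which additionally refines azimuth and elevation with a CRNN.

```python
import numpy as np

def gcc_phat_spectrum(x1, x2, nfft):
    """Cross-power spectrum with PHAT weighting (phase only, unit magnitude)."""
    X1 = np.fft.rfft(x1, nfft)
    X2 = np.fft.rfft(x2, nfft)
    cps = X1 * np.conj(X2)
    return cps / (np.abs(cps) + 1e-12)

def srp_phat_doa(frames, mic_pos, fs, az_grid, c=343.0):
    """Brute-force SRP-PHAT over an azimuth grid (far-field, 2-D sketch).

    frames  : (n_mics, n_samples) time-domain snapshot
    mic_pos : (n_mics, 2) microphone coordinates in metres
    az_grid : candidate azimuths in radians
    """
    n_mics, n = frames.shape
    nfft = 2 * n  # zero-pad to approximate linear correlation
    freqs = np.fft.rfftfreq(nfft, 1.0 / fs)
    # Precompute the PHAT-weighted cross-spectrum for every mic pair.
    pairs, spectra = [], []
    for i in range(n_mics):
        for j in range(i + 1, n_mics):
            pairs.append((i, j))
            spectra.append(gcc_phat_spectrum(frames[i], frames[j], nfft))
    power = np.zeros(len(az_grid))
    for gi, az in enumerate(az_grid):
        u = np.array([np.cos(az), np.sin(az)])  # candidate plane-wave direction
        for (i, j), cps in zip(pairs, spectra):
            # Expected TDOA (seconds) for this pair and direction.
            tau = (mic_pos[i] - mic_pos[j]) @ u / c
            # Steer the cross-spectrum to that TDOA and accumulate power.
            power[gi] += np.real(np.sum(cps * np.exp(-2j * np.pi * freqs * tau)))
    return az_grid[int(np.argmax(power))]
```

In a real-time system the grid search would be coarse and followed by a refinement stage (here, the CRNN), since the steered-response scan dominates the compute budget.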
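The head-tracking compensation described in the abstract amounts to rotating a DOA estimated in the moving headset frame into a fixed world frame so the rendered source stays spatially anchored. A minimal sketch of that coordinate transform is below; the function names and the assumption that the IMU supplies a head-to-world rotation matrix are illustrative, not taken from the paper.

```python
import numpy as np

def sph_to_vec(az, el):
    """Azimuth/elevation (radians) to a unit direction vector."""
    return np.array([np.cos(el) * np.cos(az),
                     np.cos(el) * np.sin(az),
                     np.sin(el)])

def vec_to_sph(v):
    """Unit direction vector back to (azimuth, elevation) in radians."""
    az = np.arctan2(v[1], v[0])
    el = np.arcsin(np.clip(v[2], -1.0, 1.0))
    return az, el

def compensate_head_rotation(az_est, el_est, R_head):
    """Map a DOA estimated in the headset frame into the world frame.

    R_head is the head-to-world rotation matrix (e.g. derived from the IMU
    orientation), so the source direction stays fixed as the head moves.
    """
    v_world = R_head @ sph_to_vec(az_est, el_est)
    return vec_to_sph(v_world)
```

A pure yaw rotation simply offsets azimuth and leaves elevation unchanged; the matrix form generalizes the same idea to arbitrary head pitch and roll.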

Published

2025-07-14

Section

Articles

How to Cite

[1]
A. P. Salave and M. R. Tandi, "Augmented Reality Audio: Real-Time 3D Sound Localization with Microphone Arrays", National Journal of Speech and Audio Processing, pp. 71–78, Jul. 2025, doi: 10.17051/NJSAP/01.03.09.