Data-Efficient Audio Event Detection with Few-Shot Learning Paradigms
DOI:
https://doi.org/10.17051/NJSAP/01.03.07
Keywords:
Audio Event Detection (AED); Few-Shot Learning (FSL); Prototypical Networks; Data-Efficient Learning; Log-Mel Spectrogram; Metric-Based Learning; Self-Supervised Audio; Environmental Sound Classification; Low-Resource Audio Recognition; Meta-Learning.
Abstract
Audio Event Detection (AED) is a critical component of intelligent systems across applications such as public surveillance, smart home automation, and assistive technologies. Conventional AED systems depend on supervised deep learning models that require large amounts of labeled training data, which is often infeasible because annotating audio is labour-intensive and time-consuming. To address this problem, this paper presents a new data-efficient AED framework based on Few-Shot Learning (FSL) paradigms that enables effective detection with a minimal amount of annotated data. The proposed system adopts a metric-based methodology built on convolutional prototypical networks trained via episodic learning, allowing it to learn a generalised embedding space from only a few examples per class. Data augmentation (pitch shifting, noise injection, and time stretching) increases training variability and robustness, while transfer learning initialises the model with semantic prior knowledge from pre-trained audio feature extractor networks. To ensure flexibility and scalability, optimization-based FSL approaches are also investigated, calibrating the model to new classes with minimal gradient updates. The model is evaluated on two popular benchmark datasets, ESC-50 and UrbanSound8K, under both 1-shot and 5-shot classification settings. Experiments demonstrate that the proposed technique substantially outperforms established AED baselines and recent meta-learning models, achieving over 74% accuracy in the 5-shot scenario, a significant improvement over state-of-the-art baselines with fewer than 20 examples per class. t-SNE visualizations show clear class-wise separation in the embedding space, confirming that the model discriminates between a wide variety of audio events. This paper demonstrates how FSL can reduce data dependence in AED tasks and thereby enable reliable, adaptive audio recognition in real-world low-resource settings. The proposed methodology offers a framework for scalable AED solutions capable of recognising rare, novel, and underrepresented audio events given limited labeled information.
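To make the metric-based core of the abstract concrete, the following minimal sketch computes class prototypes as the mean of support-set embeddings and scores queries by negative squared Euclidean distance to each prototype, trained episodically. The `ConvEncoder` architecture, tensor shapes, and the 5-way 1-shot configuration are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvEncoder(nn.Module):
    """Small CNN mapping a log-mel spectrogram (1 x mels x frames) to an embedding."""
    def __init__(self, embed_dim: int = 64):
        super().__init__()
        def block(c_in, c_out):
            return nn.Sequential(
                nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
                nn.BatchNorm2d(c_out),
                nn.ReLU(),
                nn.MaxPool2d(2),
            )
        self.features = nn.Sequential(block(1, 64), block(64, 64), block(64, embed_dim))
        self.pool = nn.AdaptiveAvgPool2d(1)  # global pooling gives a fixed-size embedding

    def forward(self, x):
        return self.pool(self.features(x)).flatten(1)

def episode_loss(encoder, support, query, n_way, k_shot):
    """One episodic step for an N-way K-shot task.

    support: (n_way * k_shot, 1, mels, frames), samples grouped by class
    query:   (n_way * n_query, 1, mels, frames), grouped the same way
    """
    z_s = encoder(support)                             # (N*K, D)
    z_q = encoder(query)                               # (N*Q, D)
    protos = z_s.view(n_way, k_shot, -1).mean(dim=1)   # (N, D): per-class prototypes
    logits = -torch.cdist(z_q, protos).pow(2)          # nearer prototype -> larger logit
    labels = torch.arange(n_way).repeat_interleave(z_q.size(0) // n_way)
    return F.cross_entropy(logits, labels)

# Toy 5-way 1-shot episode on random stand-ins for log-mel inputs.
encoder = ConvEncoder()
support = torch.randn(5 * 1, 1, 64, 128)
query = torch.randn(5 * 4, 1, 64, 128)
loss = episode_loss(encoder, support, query, n_way=5, k_shot=1)
loss.backward()  # in training, an optimizer step follows each sampled episode
```

In a full pipeline, each training iteration would sample a fresh N-way episode from the base classes so the embedding space generalises to unseen event categories.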
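The waveform-level augmentations named in the abstract (pitch shifting, noise injection, time stretching) can be sketched with librosa, followed by the log-mel front end. The parameter ranges below (±2 semitones, roughly 20 dB SNR noise, 0.9x to 1.1x stretch) and the 64-band mel setting are assumptions for illustration, not values reported by the paper.

```python
import numpy as np
import librosa

def augment(y: np.ndarray, sr: int, rng: np.random.Generator) -> np.ndarray:
    """Apply one randomly chosen augmentation to a mono waveform."""
    choice = rng.integers(3)
    if choice == 0:
        # Pitch shift by up to +/- 2 semitones.
        y = librosa.effects.pitch_shift(y, sr=sr, n_steps=rng.uniform(-2, 2))
    elif choice == 1:
        # Additive Gaussian noise at roughly 20 dB SNR.
        noise = rng.standard_normal(len(y))
        y = y + noise * (np.std(y) / (10 ** (20 / 20)))
    else:
        # Time-stretch between 0.9x and 1.1x speed.
        y = librosa.effects.time_stretch(y, rate=rng.uniform(0.9, 1.1))
    return y

def log_mel(y: np.ndarray, sr: int, n_mels: int = 64) -> np.ndarray:
    """Log-mel spectrogram, the input representation used throughout."""
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)

rng = np.random.default_rng(0)
y, sr = librosa.load(librosa.ex("trumpet"), sr=22050)  # librosa's bundled example clip
feat = log_mel(augment(y, sr, rng), sr)
```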
Published
2025-05-18
Section
Articles
How to Cite
[1]
Charpe Prasanjeet Prabhakar and Gaurav Tamrakar, “Data-Efficient Audio Event Detection with Few-Shot Learning Paradigms”, National Journal of Speech and Audio Processing, pp. 54–61, May 2025, doi: 10.17051/NJSAP/01.03.07.