Chen, R.; Ghobakhlou, A.; Narayanan, A. Hierarchical Residual Attention Network for Musical Instrument Recognition Using Scaled Multi-Spectrogram. Preprints 2024, 2024091632. https://doi.org/10.20944/preprints202409.1632.v1
APA Style
Chen, R., Ghobakhlou, A., & Narayanan, A. (2024). Hierarchical Residual Attention Network for Musical Instrument Recognition Using Scaled Multi-Spectrogram. Preprints. https://doi.org/10.20944/preprints202409.1632.v1
Chicago/Turabian Style
Chen, R., Akbar Ghobakhlou, and Ajit Narayanan. 2024. "Hierarchical Residual Attention Network for Musical Instrument Recognition Using Scaled Multi-Spectrogram." Preprints. https://doi.org/10.20944/preprints202409.1632.v1
Abstract
Musical instrument recognition remains a relatively unexplored area of machine learning because it requires analyzing complex spatio-temporal audio features. Traditional methods that rely on a single spectrogram type, such as STFT, Log-Mel, or MFCC, often fail to capture the full range of features. We propose a hierarchical residual attention network that takes a scaled combination of multiple spectrograms, including STFT, Log-Mel, MFCC, and CST features (chroma, spectral contrast, and Tonnetz), to create a comprehensive sound representation. Attention mechanisms help the model focus on the most relevant parts of the spectrograms. Experimental results on the OpenMIC-2018 dataset show a significant improvement in classification accuracy, especially with the "Magnified 1/4 Size" configuration. Future work will optimize CST feature scaling, explore advanced attention mechanisms, and apply the model to other audio tasks to assess its generalizability.
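As a rough illustration of what a "scaled combination of multiple spectrograms" could look like, the sketch below min-max scales several feature maps, resizes each to a common number of rows by nearest-neighbour indexing, and stacks them as channels. This is not the authors' pipeline: the function names, the toy feature sizes, the choice of min-max scaling, and the nearest-neighbour resize are all assumptions made for illustration, and plain Python lists stand in for real spectrograms so the example runs without dependencies.

```python
# Hypothetical sketch: scale and stack several time-frequency feature maps.
# Each feature map is a list of rows (feature bins), each row a list of frames.


def toy_feature(rows, frames):
    """Build a deterministic placeholder matrix (stands in for a real spectrogram)."""
    return [[float(r * frames + t) for t in range(frames)] for r in range(rows)]


def min_max_scale(matrix):
    """Rescale all values of one feature map into [0, 1]."""
    flat = [v for row in matrix for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero for constant maps
    return [[(v - lo) / span for v in row] for row in matrix]


def resize_rows(matrix, target_rows):
    """Nearest-neighbour resize along the feature (row) axis only."""
    src_rows = len(matrix)
    return [list(matrix[(i * src_rows) // target_rows]) for i in range(target_rows)]


def combine(features, target_rows):
    """Return one channel per feature map, all with shape target_rows x frames."""
    frame_counts = {len(m[0]) for m in features.values()}
    if len(frame_counts) != 1:
        raise ValueError("all feature maps must share the same number of frames")
    return [resize_rows(min_max_scale(m), target_rows) for m in features.values()]


# Toy sizes are assumptions; real STFT, Log-Mel, MFCC and CST maps are much larger.
features = {
    "stft": toy_feature(8, 4),
    "log_mel": toy_feature(4, 4),
    "mfcc": toy_feature(2, 4),
    "cst": toy_feature(3, 4),  # chroma, spectral contrast and Tonnetz, concatenated
}
stacked = combine(features, target_rows=4)
print(len(stacked), len(stacked[0]), len(stacked[0][0]))
```

In this sketch the smaller maps (here `mfcc` and `cst`) are magnified by repeating rows, while the larger `stft` map is reduced by skipping rows. A learned or interpolated resize could be substituted without changing the overall structure.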
Keywords
Spectrograms; Musical Instrument Classification; Audio classification; Audio feature extraction; Music information retrieval; Spectrogram transformation; Residual attention networks; Attention mechanisms; Deep learning for audio
Subject
Computer Science and Mathematics, Artificial Intelligence and Machine Learning
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.