Version 1
Received: 22 September 2024 / Approved: 23 September 2024 / Online: 24 September 2024 (11:58:48 CEST)
How to cite:
Ayub Hajam, U.; Tanzeel Rabani, S.; Khanday, A. M. U. D.; Neshat, M. Spoken Kashmiri Recognition with Dual Feature Extraction and Spectrogram Augmentation Using a CNN-gMLP Hybrid Model. Preprints 2024, 2024091876. https://doi.org/10.20944/preprints202409.1876.v1
APA Style
Ayub Hajam, U., Tanzeel Rabani, S., Khanday, A. M. U. D., & Neshat, M. (2024). Spoken Kashmiri Recognition with Dual Feature Extraction and Spectrogram Augmentation Using a CNN-gMLP Hybrid Model. Preprints. https://doi.org/10.20944/preprints202409.1876.v1
Chicago/Turabian Style
Ayub Hajam, U., S. Tanzeel Rabani, Akib Mohi Ud Din Khanday, and Mehdi Neshat. 2024. "Spoken Kashmiri Recognition with Dual Feature Extraction and Spectrogram Augmentation Using a CNN-gMLP Hybrid Model." Preprints. https://doi.org/10.20944/preprints202409.1876.v1
Abstract
Automatic speech recognition of native languages plays a crucial role in fostering inclusivity and preserving linguistic diversity. Kashmiri, an underrepresented Indo-Aryan language spoken primarily in the Kashmir Valley, poses substantial challenges for existing AI models because of its phonetic diversity and scarce linguistic resources. This study addresses these challenges by developing a robust spoken Kashmiri recognition system that combines dual feature extraction, spectrogram augmentation, and a hybrid Convolutional Neural Network (CNN) and Gated Multi-Layer Perceptron (gMLP) model. Central to this work is the creation of a high-fidelity dataset that captures phonetic variation across Kashmiri dialects, focusing on twelve specific words. With dual feature extraction, spectrogram augmentation, and the hybrid model combined, the system attains 96% accuracy on the test dataset when classifying these twelve spoken words. This research improves the generalization and resilience of spoken Kashmiri recognition systems and represents a step towards technology that helps safeguard this underrepresented language.
Computer Science and Mathematics, Artificial Intelligence and Machine Learning
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.