Preprint Article, Version 1. This version is not peer-reviewed.

Spoken Kashmiri Recognition with Dual Feature Extraction and Spectrogram Augmentation Using a CNN-gMLP Hybrid Model

Version 1: Received: 22 September 2024 / Approved: 23 September 2024 / Online: 24 September 2024 (11:58:48 CEST)

How to cite: Ayub Hajam, U.; Tanzeel Rabani, S.; Khanday, A. M. U. D.; Neshat, M. Spoken Kashmiri Recognition with Dual Feature Extraction and Spectrogram Augmentation Using a CNN-gMLP Hybrid Model. Preprints 2024, 2024091876. https://doi.org/10.20944/preprints202409.1876.v1

Abstract

Automatic speech recognition for native languages plays a crucial role in fostering inclusivity and preserving linguistic diversity. Kashmiri, an underrepresented Indo-Aryan language spoken primarily in the Kashmir Valley, poses substantial challenges for existing AI models because of its phonetic diversity and scarce linguistic resources. This study addresses these challenges by developing a robust spoken Kashmiri recognition system that combines dual feature extraction, spectrogram augmentation, and a hybrid Convolutional Neural Network (CNN) and Gated Multi-Layer Perceptron (gMLP) model. Central to this work is a new high-fidelity dataset that captures phonetic variation across Kashmiri dialects, focusing on twelve specific words. With this combination of dual feature extraction, spectrogram augmentation, and hybrid modelling, the system attains 96% accuracy on the test dataset when classifying these twelve spoken words. The research improves the generalization and resilience of spoken Kashmiri recognition and represents a step towards speech technology for, and preservation of, this underrepresented language.
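
The abstract does not say which form of spectrogram augmentation is used. As an illustration only, the sketch below assumes a common masking-style augmentation in which one random band of frequency bins and one random span of time frames are zeroed out. The function name, parameters, and default widths are hypothetical and are not taken from the paper.

```python
import random


def mask_spectrogram(spec, max_freq_width=8, max_time_width=16, rng=None):
    """Return a copy of `spec` with one frequency band and one time span zeroed.

    `spec` is a list of rows (frequency bins), each a list of time frames.
    All names and default widths here are illustrative assumptions.
    """
    if rng is None:
        rng = random.Random()
    n_freq, n_time = len(spec), len(spec[0])
    out = [row[:] for row in spec]  # copy so the input stays untouched

    # Frequency mask: zero a random band of adjacent frequency bins.
    f_width = rng.randint(0, min(max_freq_width, n_freq))
    f_start = rng.randint(0, n_freq - f_width)
    for f in range(f_start, f_start + f_width):
        out[f] = [0.0] * n_time

    # Time mask: zero a random span of adjacent time frames in every bin.
    t_width = rng.randint(0, min(max_time_width, n_time))
    t_start = rng.randint(0, n_time - t_width)
    for row in out:
        for t in range(t_start, t_start + t_width):
            row[t] = 0.0
    return out


# Usage on a synthetic 40-bin x 100-frame "spectrogram" (values are made up).
rng = random.Random(0)
spec = [[1.0 + rng.random() for _ in range(100)] for _ in range(40)]
aug = mask_spectrogram(spec, rng=rng)
```

Applying a fresh random mask each time a training example is drawn gives the classifier differently occluded views of the same utterance, which is one way such augmentation can help when recorded data is scarce.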

Keywords

Automatic speech recognition; Hybrid convolutional neural networks; Kashmiri language; Spectrogram; Classification

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning
