This version is not peer-reviewed. Submitted: 22 September 2024; posted: 24 September 2024.
Feature Set | Accuracy | Precision | Recall |
---|---|---|---|
MFCC + Partial Mel-Spectrogram (Random Forest) | 0.86 | 0.87 | 0.87 |
MFCC + Partial Mel-Spectrogram (Ridge Classifier) | 0.80 | 0.83 | 0.82 |
MFCC + Partial Mel-Spectrogram (K-Nearest Neighbors) | 0.61 | 0.61 | 0.62 |
MFCC + Partial Mel-Spectrogram (Decision Tree) | 0.77 | 0.77 | 0.78 |
MFCC only (Random Forest) | 0.79 | 0.80 | 0.80 |
Mel-Spectrogram only (Random Forest) | 0.75 | 0.77 | 0.77 |
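The table above compares classifiers trained on combined MFCC and partial mel-spectrogram features. A minimal NumPy sketch of how such features can be derived from a raw waveform follows; the sample rate, FFT length, hop size, and filterbank sizes here are illustrative defaults, not the paper's exact settings:

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale: mel = 2595 * log10(1 + f / 700)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):
            fb[i, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i, k] = (right - k) / max(right - center, 1)
    return fb

def mfcc_features(signal, sr=16000, n_fft=512, hop=160, n_mels=26, n_mfcc=13):
    # Frame the signal, window each frame, and take its magnitude spectrum.
    frames = [signal[s:s + n_fft] * np.hanning(n_fft)
              for s in range(0, len(signal) - n_fft + 1, hop)]
    spec = np.abs(np.fft.rfft(np.array(frames), axis=1))      # (frames, bins)
    mel_spec = spec @ mel_filterbank(n_mels, n_fft, sr).T     # mel-spectrogram
    log_mel = np.log(mel_spec + 1e-10)
    # DCT-II of the log-mel energies yields the MFCCs.
    n = np.arange(n_mels)
    dct = np.cos(np.pi / n_mels * (n[None, :] + 0.5) * np.arange(n_mfcc)[:, None])
    mfcc = log_mel @ dct.T
    return mfcc, log_mel

# One second of a 440 Hz tone as a stand-in for a speech sample.
sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
mfcc, log_mel = mfcc_features(sig)
```

Per-frame MFCC vectors and (partial) log-mel frames like these can then be pooled or flattened into fixed-length feature vectors for the classical classifiers in the table.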
Parameter | Value |
---|---|
Learning rate | 0.000235 |
Learning rate schedule | Step decay |
Number of gMLP layers | 4 |
Optimizer | Adam |
Dropout rate | 0.378036 |
Epochs | 50 |
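The tuned configuration above (4 gMLP layers, Adam, dropout ≈ 0.378) can be made concrete with a sketch of a single gMLP block's forward pass. This NumPy illustration shows only the spatial gating mechanism; the random weights stand in for learned parameters, the dimensions are illustrative, and training-time dropout is omitted for clarity:

```python
import numpy as np

rng = np.random.default_rng(0)

def gelu(x):
    # Tanh approximation of the GELU activation used in gMLP.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def gmlp_block(x, d_ffn=64):
    """Forward pass of one gMLP block: channel projection, spatial gating,
    projection back, and a residual connection. x: (seq_len, d_model)."""
    seq_len, d_model = x.shape
    # Channel projections (random weights stand in for learned parameters).
    W_in = rng.normal(0.0, 0.02, (d_model, d_ffn))
    W_out = rng.normal(0.0, 0.02, (d_ffn // 2, d_model))
    # Spatial gating unit: one half of the hidden units gates the other half
    # via a learned projection across the sequence (token) dimension.
    W_spatial = rng.normal(0.0, 0.02, (seq_len, seq_len))
    h = gelu(x @ W_in)
    u, v = h[:, : d_ffn // 2], h[:, d_ffn // 2:]
    v = W_spatial @ v          # mix information across sequence positions
    gated = u * v              # element-wise gating
    return x + gated @ W_out   # residual connection

x = rng.normal(size=(10, 32))
y = x
for _ in range(4):             # the tuned configuration stacks 4 gMLP layers
    y = gmlp_block(y)
```

The spatial projection `W_spatial` is what lets gMLP capture global, cross-frame structure without attention, complementing the local patterns the CNN front end extracts.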
Kashmiri (Koshur / کٲشُر) | Pronunciation | English | Voice Samples |
---|---|---|---|
آ | āh | Yes | 70 |
اَڈسا. | aḍsā | OK | 70 |
بَند | band | Closed | 70 |
بۄہ | bē | Me | 70 |
خَبَر | khabar | News | 70 |
کیاہ | k’ah | What | 70 |
نَہ | na | No | 70 |
نٔو | nov (m.) | New | 70 |
ٹھیک | theek | Well/Okay | 70
وارَے | va:ray | Well | 70 |
وچھ | vuch | See | 70 |
یلہٕ | ye:le | Open | 70 |
Metric | Min Value | Max Value |
---|---|---|
Average Duration (s) | 0.574 | 1.536 |
Zero-Crossing Rate | 0.0515 | 0.3160 |
RMS Energy | 0.0156 | 0.1411 |
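The acoustic ranges above follow from the standard definitions of zero-crossing rate and RMS energy. A small NumPy sketch, using an illustrative pure-tone signal rather than recordings from the dataset:

```python
import numpy as np

def zero_crossing_rate(signal):
    # Fraction of consecutive sample pairs whose signs differ.
    signs = np.sign(signal)
    signs[signs == 0] = 1  # treat exact zeros as positive
    return np.mean(signs[:-1] != signs[1:])

def rms_energy(signal):
    # Root of the mean squared amplitude.
    return np.sqrt(np.mean(signal ** 2))

# One second of a 100 Hz sine at 16 kHz, peak amplitude 0.1.
sr = 16000
t = np.arange(sr) / sr
sig = 0.1 * np.sin(2 * np.pi * 100 * t)
```

For this tone, the RMS energy is 0.1 / sqrt(2) ≈ 0.071 and the ZCR is roughly 200 crossings over 15999 sample pairs ≈ 0.0125, both of which sit inside the ranges reported for the dataset.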
Metric | Precision | Recall | F1-Score |
---|---|---|---|
adsa | 1.00 | 1.00 | 1.00 |
āh | 0.82 | 1.00 | 0.90 |
nov (m.) | 1.00 | 1.00 | 1.00 |
khabar | 0.93 | 1.00 | 0.97 |
be | 1.00 | 0.86 | 0.92 |
vuch | 1.00 | 0.86 | 0.92 |
na | 1.00 | 1.00 | 1.00 |
band | 0.87 | 0.93 | 0.90 |
va:ray | 1.00 | 1.00 | 1.00 |
k’ah | 1.00 | 0.93 | 0.96 |
ye:le | 1.00 | 1.00 | 1.00 |
theek | 1.00 | 1.00 | 1.00 |
Metric | Precision | Recall | F1-Score |
---|---|---|---|
Macro Avg | 0.97 | 0.96 | 0.96 |
Weighted Avg | 0.97 | 0.96 | 0.96 |
Accuracy | N/A | N/A | 0.96 |
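The summary rows above aggregate the per-word scores in two ways. A short NumPy sketch of the difference between macro and weighted averaging, using the per-class F1 scores from the table; the per-class support counts are hypothetical, and with equal supports (a balanced test split) the two averages coincide, consistent with the identical macro and weighted rows reported:

```python
import numpy as np

def macro_avg(scores):
    # Unweighted mean: every class counts equally.
    return np.mean(scores)

def weighted_avg(scores, support):
    # Support-weighted mean: larger classes count proportionally more.
    support = np.asarray(support, dtype=float)
    return np.sum(np.asarray(scores) * support) / support.sum()

# Per-class F1 scores from the table above, in row order.
f1 = [1.00, 0.90, 1.00, 0.97, 0.92, 0.92, 1.00, 0.90, 1.00, 0.96, 1.00, 1.00]
support = [14] * 12  # hypothetical balanced split: the two averages agree
```

Here `macro_avg(f1)` ≈ 0.964, which rounds to the 0.96 macro F1 in the table.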
Model | Architecture | Dataset Used | Accuracy | Precision | Recall | F1-Score | Unique Features |
---|---|---|---|---|---|---|---|
Hybrid CNN-gMLP | CNN + gMLP (4 layers) | Kashmiri Speech Dataset | 96% | 0.97 | 0.96 | 0.96 | Dual feature extraction (MFCC + Mel-Spectrogram), local and global feature capture, robust for phonetic diversity of Kashmiri language. |
CNN + LSTM Hybrid | CNN + LSTM | TIMIT | 91.3% | 0.90 | 0.89 | 0.89 | Local feature extraction via CNN combined with sequential modeling by LSTM, effective in handling phonetic variations in speech data [46]. |
Deep Speech (RNN) | RNN-based end-to-end | Various speech datasets | N/A | N/A | N/A | WER: 10.55% | End-to-end speech recognition, large-scale training with parallelization for scalability, robust handling of continuous speech [9]. |
Wav2Vec (CNN) | CNN for pre-training | LibriSpeech | 95.5% | N/A | N/A | WER: 8.5% | Unsupervised pre-training with contrastive loss to improve robustness on noisy and limited data [58]. |
Listen, Attend, and Spell | Attention-based neural net | Large Vocabulary Dataset | N/A | N/A | N/A | WER: 13.1% | Attention mechanisms for large vocabulary speech recognition, efficient handling of conversational speech [59]. |
Hybrid CNN-RNN Emotion Recognition | CNN + RNN (LSTM/ GRU) | Speech Emotion Dataset | 89.2% | 0.87 | 0.88 | 0.87 | Captures local features and temporal dependencies, well-suited for emotional and contextual speech analysis [48]. |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 MDPI (Basel, Switzerland) unless otherwise stated