4.1. Performance of Classification Algorithms
Table 1 outlines the performance achieved by the ML classifiers (SVM, NN, RF, and KNN) applied to the CNN models (VGG19, VGG16, and SqueezeNet) under three distinct optimization techniques: Grid Search, Random Search, and Default Parameters. A closer examination of the performance across these optimization strategies reveals noteworthy fluctuations in accuracy across models and classifiers, as demonstrated in Figure 9.
The comparison of accuracy among ML classifiers yields several insights. Notably, Grid Search yields an accuracy of 0.888 for the NN classifier when applied to the VGG16 model, closely followed by SVM with an accuracy of 0.855. However, the NN accuracy decreases to 0.843 when default parameters are employed.
For the NN classifier, all optimization techniques yielded strong results, consistently achieving accuracies above 0.847 in VGG19. Both the SVM and NN classifiers performed well when applied to the VGG16 model. Specifically, the NN classifier attained the highest accuracy under Grid Search (0.888), followed by SVM at 0.855. The NN classifier achieved accuracy values of 0.888, 0.847, and 0.843 under Grid Search, Random Search, and Default Parameters, respectively. In the SqueezeNet model, the NN classifier achieved accuracies of 0.813 using default parameters, 0.823 with Random Search, and 0.825 with Grid Search. The comparison of performance across optimization methods thus provides insight into how effectively each approach tunes the models for improved accuracy. Overall, the NN classifier showed strong results across all optimization techniques, reaching accuracies of up to 0.888. The SVM classifier also performed consistently well on the CNN models, particularly VGG19 and VGG16. Across every model and optimization technique for the AHAWP dataset, the NN classifier outperformed the other classifiers, while the RF and KNN classifiers generally demonstrated lower accuracy than the SVM and NN classifiers.
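The difference between the two search strategies compared above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the parameter names and values in `PARAM_GRID` are hypothetical, and `toy_score` is a stand-in for a cross-validated accuracy estimate, not the evaluation used in this study. Grid Search evaluates every combination, whereas Random Search evaluates only a fixed number of sampled combinations.

```python
import itertools
import random

# Hypothetical hyperparameter grid for an NN classifier; the names and values
# are illustrative assumptions, not the settings used in the study.
PARAM_GRID = {
    "hidden_units": [64, 128, 256],
    "learning_rate": [0.001, 0.01, 0.1],
    "alpha": [0.0001, 0.001],
}


def toy_score(params):
    """Stand-in for cross-validated accuracy (assumed, not measured)."""
    penalty = (
        abs(params["hidden_units"] - 128) / 1280
        + abs(params["learning_rate"] - 0.01)
        + abs(params["alpha"] - 0.001) * 10
    )
    return 0.9 - penalty


def grid_search(grid, score_fn):
    """Evaluate every combination in the grid and return the best one."""
    names = list(grid)
    candidates = [
        dict(zip(names, values))
        for values in itertools.product(*(grid[name] for name in names))
    ]
    best = max(candidates, key=score_fn)
    return best, score_fn(best)


def random_search(grid, score_fn, n_iter, seed=0):
    """Evaluate n_iter randomly sampled combinations and return the best one."""
    rng = random.Random(seed)
    candidates = [
        {name: rng.choice(values) for name, values in grid.items()}
        for _ in range(n_iter)
    ]
    best = max(candidates, key=score_fn)
    return best, score_fn(best)


grid_best, grid_score = grid_search(PARAM_GRID, toy_score)
rand_best, rand_score = random_search(PARAM_GRID, toy_score, n_iter=5)
print("Grid Search:  ", grid_best, grid_score)
print("Random Search:", rand_best, rand_score)
```

Because Random Search here samples from the same grid, its best score can never exceed the Grid Search score; it trades that guarantee for fewer evaluations (5 instead of 18 in this sketch).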
4.2. Comparison of Mean Accuracy Scores Between Models
For Arabic handwriting recognition on the AHAWP dataset, the NN classifier consistently demonstrated the highest accuracy rates, reaffirming its potential in this domain. Results from the SVM classifier were also promising, particularly when used with the VGG16 model. However, the RF and KNN classifiers exhibited lower accuracy, and future research should explore ways to improve them. The optimization approach also affected accuracy, with Grid Search and Random Search consistently outperforming Default Parameters. The t-tests conducted in this investigation compared the mean accuracy scores of several pairs of models to assess whether the differences are statistically significant. Each p-value gives the probability of observing a difference in mean accuracy at least as large as the one measured if the null hypothesis of equal means were true, so a lower p-value indicates stronger evidence against the null hypothesis. In comparing the SVM classifier with the other classifiers (NN, RF, and KNN) using the VGG16 CNN model, the p-value of 0.861 for the comparison between SVM and NN exceeded the conventional significance threshold of 0.05, indicating no statistically significant difference between the two.
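As an illustration of how such a p-value can be obtained, the following is a minimal standard-library sketch of a two-sample t-test. It assumes Welch's unequal-variance variant, which the text does not specify. The NN accuracies are the three values quoted in Section 4.1; the second group (`other_acc`) contains hypothetical placeholder values, not results from this study. In practice a library routine such as `scipy.stats.ttest_ind` would normally be used instead of the hand-written integration.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_two_sided_p(t_stat, df, steps=2000):
    """Two-sided p-value, integrating the density on [0, |t|] with Simpson's rule."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = 2 * total * h / 3  # P(-|t| < T < |t|)
    return max(0.0, 1.0 - central_mass)


def welch_t_test(a, b):
    """Return (t statistic, degrees of freedom, two-sided p-value)."""
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t_stat = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t_stat, df, t_two_sided_p(t_stat, df)


# NN accuracies under Grid Search, Random Search, and Default Parameters (from the text).
nn_acc = [0.888, 0.847, 0.843]
# Hypothetical accuracies for a weaker classifier; placeholders for illustration only.
other_acc = [0.76, 0.74, 0.75]

t_stat, df, p_value = welch_t_test(nn_acc, other_acc)
print(f"t = {t_stat:.3f}, df = {df:.2f}, p = {p_value:.4f}")
print("significant at 0.05" if p_value < 0.05 else "not significant at 0.05")
```

A p-value below 0.05 leads to rejecting the null hypothesis of equal mean accuracy; a value such as 0.861 does not.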
As shown in Table 2, the p-values for the comparisons between SVM and RF and between SVM and KNN are less than 0.05, which shows that the accuracy achieved by the SVM classifier differs significantly from that of the RF and KNN classifiers. The p-values from the t-tests thus give us important information about the statistical significance of the variations in mean accuracy between the models.
Together with Figure 10, these results allow us to conclude that the SVM classifier performs similarly to the NN classifier but, when utilizing the VGG16 CNN model, greatly surpasses the RF and KNN classifiers in terms of accuracy. The results of Tukey's Honestly Significant Difference (HSD) test yield further insights regarding the pairwise differences in mean accuracy among the models being examined, as shown in
Table 3. Comparing the KNN model to the NN model, we observe a mean difference of 0.1173. The associated p-value is 0.0006, which falls below the significance threshold of 0.05. Consequently, we can conclude that there is a statistically significant difference in mean accuracy between the KNN and NN models. Specifically, the NN model exhibits a higher mean accuracy than the KNN model, consistent with the classifier results in Section 4.1.
However, when comparing the KNN and RF models, we find a mean difference of 0.009 with a p-value of 0.9519. As the p-value is above 0.05, we lack sufficient evidence of a significant difference in mean accuracy between the KNN and RF models.
Additionally, comparing the KNN and SVM models gives a mean difference of 0.115 with a p-value of 0.0007. Given that the p-value is below the significance threshold of 0.05, there is a statistically significant difference in mean accuracy between the KNN and SVM models. Specifically, the SVM model shows a notably higher mean accuracy than the KNN model.
Turning to the comparison between the NN and RF models, a mean difference of -0.1083 emerges. Here the negative sign indicates that the RF model has a lower mean accuracy than the NN model. The corresponding p-value of 0.0011 confirms a significant difference in mean accuracy between these two models, with the NN model demonstrating superior performance.
Conversely, the mean difference between the NN and SVM models is -0.0023, a value close to zero. The associated p-value of 0.999 implies that the mean accuracies of the NN and SVM models are statistically indistinguishable. As such, the observed difference in accuracy between these models is more likely attributable to chance than to a substantive difference in performance.
Shifting our focus to the comparison between the RF and SVM models, a mean difference of 0.106 is identified, accompanied by a p-value of 0.0013. This p-value confirms a significant difference in mean accuracy between the two models. Overall, the outcomes of Tukey's HSD test clarify which pairs of models differ in mean accuracy.
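Because each value is a difference between two group means, the reported pairwise differences must be mutually consistent. The short arithmetic check below illustrates this for two triples of models. It assumes only that every pair is reported with the same sign convention; the dictionary layout and variable names are illustrative.

```python
import math

# Pairwise mean differences as reported in the text for Table 3.
reported = {
    ("KNN", "NN"): 0.1173,
    ("KNN", "SVM"): 0.115,
    ("NN", "RF"): -0.1083,
    ("NN", "SVM"): -0.0023,
    ("RF", "SVM"): 0.106,
}

# Shared second model: d(KNN, NN) = d(KNN, SVM) - d(NN, SVM)
implied_knn_nn = reported[("KNN", "SVM")] - reported[("NN", "SVM")]

# Shared first model: d(RF, SVM) = d(NN, SVM) - d(NN, RF)
implied_rf_svm = reported[("NN", "SVM")] - reported[("NN", "RF")]

for pair, implied in [(("KNN", "NN"), implied_knn_nn), (("RF", "SVM"), implied_rf_svm)]:
    matches = math.isclose(implied, reported[pair], abs_tol=1e-9)
    print(pair, "reported:", reported[pair], "implied:", round(implied, 4), "consistent:", matches)
```

Both identities hold whichever of the two possible sign conventions is used, provided it is applied to every pair.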
Figure 10 graphically illustrates the superiority of the NN model in mean accuracy over the other models, although no statistically significant difference in mean accuracy is found between the NN and SVM models.
Compared with the alternative models, the NN model, the top performer, classifies air-written letters more accurately, as shown in Figure 11. This suggests that the model has a greater capacity to detect the letters and assign them the appropriate labels.
Figure 12 illustrates the confusion matrix for the NN model, the best-performing model. The confusion matrix provides insight into the model's predictions, allowing us to gauge how precisely air-written letters are classified. Through the confusion matrix, we can assess the model's classification accuracy and identify errors or areas of uncertainty. In the matrix, predicted letter labels correspond to the columns, while actual letter labels correspond to the rows.
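The row and column convention described above can be made concrete with a small sketch. The letter names and the actual and predicted label lists are hypothetical examples, not data from this study, and the helper functions are illustrative rather than the implementation used to produce Figure 12.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Rows correspond to actual labels, columns to predicted labels."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def most_confused(matrix, labels):
    """Return (actual, predicted, count) for the largest off-diagonal cell."""
    cells = [
        (labels[i], labels[j], matrix[i][j])
        for i in range(len(labels))
        for j in range(len(labels))
        if i != j
    ]
    return max(cells, key=lambda cell: cell[2])


# Hypothetical letters and predictions, for illustration only.
labels = ["alif", "ba", "ta"]
actual = ["alif", "alif", "ba", "ba", "ba", "ba", "ta", "ta", "ta"]
predicted = ["alif", "alif", "ba", "ta", "ta", "ba", "ta", "ba", "ta"]

matrix = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, matrix):
    print(f"{label:>5}: {row}")
print("accuracy:", accuracy(matrix))
print("most confused pair:", most_confused(matrix, labels))
```

Diagonal cells count correct classifications, and each off-diagonal cell shows how often one letter was mistaken for another, which is how zones of uncertainty are identified.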
4.5. Comparison of the Proposed Model with Previous Work
Table 5 offers a thorough comparison of the approaches and findings from earlier studies in the field of air writing recognition. In our study, we have proposed a model that combines machine learning (ML) and optical character recognition (OCR) methods to recognize Arabic air writing specifically. Even if our model's accuracy is not as high as that of some of the earlier research included in the table, it is crucial to consider the particulars of the Arabic script and its difficulties. Our study stands out as the first to concentrate specifically on the recognition of Arabic air writing, which underscores its contribution to the field. Although an 88% accuracy rate may seem lower in comparison to research focusing on English or other languages, the accuracy levels presented in the table should be interpreted with caution, as direct comparisons may not be appropriate due to variations in datasets, techniques, and language-specific characteristics. Each study examines a different language and writing system, necessitating unique methodologies and considerations. Despite these differences, our model's performance demonstrates its effectiveness in identifying Arabic air-written letters and its value within the context of Arabic script.
Our research addresses a significant gap in the field of air writing recognition by focusing on Arabic script, which paves the way for future advancements and practical applications. This highlights the importance of devising tailored strategies that address the specific challenges posed by various writing systems and languages. While previous studies have primarily focused on English or other languages, we recognized the importance of understanding the unique complexities and characters of Arabic script, and by exploring this specific context our study enriches the knowledge of air writing recognition techniques for Arabic. Our model combines ML and OCR methods to recognize Arabic air-written letters, offering a solution that leverages the strengths of both approaches. As the first investigation into Arabic air writing recognition, our research lays a foundation for future work in this area and opens new avenues for researchers to explore additional methods and techniques that can further enhance recognition accuracy. We expect these findings to encourage further advancements in air writing recognition for Arabic and other languages.