3.2. Evaluation Metrics
The effectiveness of the GRU model was evaluated using several widely recognized techniques: the confusion matrix, together with the Accuracy, Recall, Precision, and F-score metrics [42]. The confusion matrix, also known as an error matrix, is a tool for statistical classification that visually represents the model's performance, as shown in Figure 8. The figure illustrates a binary classification scenario with two categories, a positive (P) class and a negative (N) class, and the matrix highlights four key outcomes: True Positive (TP), a positive instance correctly predicted as positive; False Positive (FP), a negative instance incorrectly predicted as positive; True Negative (TN), a negative instance correctly predicted as negative; and False Negative (FN), a positive instance incorrectly predicted as negative [43]. Accuracy is the ratio of correctly predicted samples to the total number of input samples, as given in Equation 10.
Recall is the number of correct positive results divided by the number of all relevant samples (all samples that should have been identified as positive), as given in Equation 11.
Precision is the number of correct positive results divided by the number of positive results predicted by the classifier, as given in Equation 12.
Finally, the F-score, given by Equation 9, balances the concerns of recall and precision in a single value. It is the harmonic mean of the two metrics: the product of precision and recall, divided by their sum and multiplied by 2, so that the score reaches 1 only when both precision and recall equal 1.
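The four metrics above can be computed directly from the confusion-matrix counts; a minimal sketch in Python (the counts passed in below are illustrative placeholders, not values from this study):

```python
# Sketch: computing the Section 3.2 metrics from confusion-matrix counts.
def classification_metrics(tp, fp, tn, fn):
    accuracy = (tp + tn) / (tp + fp + tn + fn)               # Equation 10
    recall = tp / (tp + fn)                                  # Equation 11
    precision = tp / (tp + fp)                               # Equation 12
    f_score = 2 * precision * recall / (precision + recall)  # F-score
    return accuracy, recall, precision, f_score

# Illustrative example: 90 TP, 10 FP, 95 TN, 5 FN.
acc, rec, prec, f1 = classification_metrics(tp=90, fp=10, tn=95, fn=5)
print(round(acc, 3), round(rec, 3), round(prec, 3), round(f1, 3))
```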
Figure 8.
General Structure of the Confusion Matrix.
3.3. Results and the Proposed Model Hyperparameters
The GRU classifier model was developed using an educational dataset for both its training and testing phases. The model used a sequential architecture featuring a max-pooling layer and a dense layer, with the Adam optimizer for training. The chosen loss function was binary_crossentropy, suitable for binary classification tasks. For validation, K-fold cross-validation was employed with a single fold (k = 1), effectively a straightforward train/test split.

The model's architecture included a fully connected neural network (FCNN) layer with 100 neurons and nine input variables, using the ReLU activation function for non-linear processing. The design incorporated two hidden layers: a GRU layer with 256 units and a recurrent dropout of 0.23 to mitigate overfitting, followed by a one-dimensional global max-pooling layer for feature down-sampling. The output layer, activated by a sigmoid function, reflects the binary nature of the dataset's classification task.

The implementation was carried out in Keras and Python, using the Adam optimizer with a learning rate of 0.01 and a momentum of 0.0. Training was configured with a batch size of 90 and planned for 300 epochs, although early stopping with a patience of 2 epochs halted training after just 7 epochs to prevent overfitting. The model comprised 275,658 parameters, reflecting its complexity and capacity for learning. Each training epoch required approximately 16 seconds, and the training data was randomly shuffled at every epoch to ensure varied exposure.
The overarching goal in training this model was to minimize validation loss as measured by binary_crossentropy, indicating a focused effort on enhancing predictive accuracy for student assessments.
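Assuming TensorFlow/Keras defaults, the architecture and training setup described above can be sketched as follows. The layer names mirror Table 2 but are otherwise illustrative, and the training call is left commented out because the dataset itself is not reproduced here (Adam has no classical momentum parameter, so the reported momentum of 0.0 corresponds to the optimizer's defaults):

```python
# Sketch of the described GRU classifier, assuming Keras defaults and an
# input of 10 time steps with 1 feature each.
from tensorflow.keras import layers, models, optimizers
from tensorflow.keras.callbacks import EarlyStopping

inputs = layers.Input(shape=(10, 1))
x = layers.Dense(100, activation="relu", name="word_dense")(inputs)    # FCNN layer, 100 neurons
x = layers.GRU(256, recurrent_dropout=0.23, return_sequences=True)(x)  # GRU hidden layer
x = layers.GlobalMaxPooling1D()(x)                                     # 1-D global max-pooling
outputs = layers.Dense(2, activation="sigmoid")(x)                     # sigmoid output layer

model = models.Model(inputs, outputs)
model.compile(
    optimizer=optimizers.Adam(learning_rate=0.01),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)

# Early stopping on validation loss with a patience of 2 epochs.
early_stop = EarlyStopping(monitor="val_loss", patience=2)
# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           batch_size=90, epochs=300, shuffle=True, callbacks=[early_stop])
```

Built this way, the model's total parameter count matches the 275,658 reported above.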
Extensive testing and experimentation were conducted to fine-tune the proposed model, involving various configurations and hyperparameter adjustments to achieve optimal performance in predicting student assessments within educational settings. The effectiveness of the model, detailed in Figure 9, is evidenced by its high accuracy on the prediction task. The results in Figure 9 highlight the model's ability to forecast student assessments accurately, in particular the significant impact of combining the GRU layer with a fully connected neural network: the model attained an accuracy of 99.70%. The inclusion of a global max-pooling layer played a crucial role in bolstering the model's predictive accuracy for student evaluations. Compared to existing models in the literature, the proposed model demonstrated superior performance. For example, it outperformed an RNN model, which recorded an accuracy of 95.34%, a gap attributed to the RNN's difficulty with vanishing gradients, as indicated in Table 3. The model also outperformed the ARD V.2 and AdaBoost models, which achieved accuracies of 93.18% and 94.57%, respectively. The combination of GRU and max-pooling layers over the neural network layer underscored the model's effectiveness in autonomously predicting student assessments. Figure 9 shows the model's experimental evaluation for predicting student performance, while Table 2 summarizes the proposed GRU model.
Table 2.
The Summary of the Proposed GRU model.
| Layer (Type) | Output Shape | Parameters No. |
|---|---|---|
| Input_1 (InputLayer) | (None, 10, 1) | 0 |
| Word_dense (Dense) | (None, 10, 100) | 200 |
| Gru (GRU) | (None, 10, 256) | 274,944 |
| Global_max_pooling (GlobalMaxPooling1D) | (None, 256) | 0 |
| Dense | (None, 2) | 514 |

Total Parameters: 275,658
Trainable Parameters: 275,658
Non-Trainable Parameters: 0
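The per-layer parameter counts in Table 2 can be reproduced by hand, assuming Keras conventions (a GRU with the default reset_after=True carries two bias vectors per gate):

```python
# Sketch: reproducing the Table 2 parameter counts, assuming Keras conventions.
def dense_params(in_dim, units):
    # weight matrix plus one bias per unit
    return in_dim * units + units

def gru_params(in_dim, units):
    # 3 gates, each with input weights, recurrent weights, and 2 bias vectors
    return 3 * (units * in_dim + units * units + 2 * units)

word_dense = dense_params(1, 100)   # Dense on 1 feature -> 100 units
gru = gru_params(100, 256)          # GRU on 100-dim input -> 256 units
out_dense = dense_params(256, 2)    # output Dense on pooled 256-dim vector
total = word_dense + gru + out_dense
print(word_dense, gru, out_dense, total)  # 200 274944 514 275658
```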
Figure 10 illustrates the prediction model's error rates throughout the simulation, showing a consistent decrease in error for both the training and validation datasets as learning progressed. This simultaneous reduction in error indicates that the GRU model avoids overfitting, generalizing well to new, unseen data while improving its accuracy on the training data over time.
Figure 10.
Relation between model loss (error) and epoch during training and testing of the model on the educational dataset.
Additionally, the accuracy of student performance predictions is graphically depicted in Figure 11. This demonstrates that the proposed GRU model has been effectively trained. There is a noticeable increase in accuracy for both the training and testing phases within educational datasets, starting from epoch number 1 and continuing up to epoch number 4. This upward trend in accuracy highlights the GRU model's ability to perform and classify with precision.
Figure 11.
Accuracy Analysis for the proposed GRU model.
Moreover, an additional tool was employed to evaluate the performance of the proposed GRU model: the confusion matrix depicted in Figure 12. The matrix reports the number of true positives (samples correctly classified as the positive class) alongside true negatives (samples correctly identified as belonging to the other class) in the context of student performance classification. According to Figure 12, the model identified 2885 samples as true positives and 110 samples as true negatives. The matrix also reveals the incorrect predictions: 38 samples were classified as false positives, while no false negatives were recorded. These results underscore the model's high accuracy in predicting student assessments, with a minimal error margin.
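Taking the counts reported in Figure 12 at face value (TP = 2885, TN = 110, FP = 38, FN = 0), the Section 3.2 metrics follow directly; note that scores computed from these raw counts need not coincide with the averaged values reported in Table 3:

```python
# Metrics derived from the Figure 12 confusion-matrix counts.
tp, tn, fp, fn = 2885, 110, 38, 0

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)  # FN = 0, so every positive sample was recovered
f_score = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f-score={f_score:.4f}")
```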
Figure 12.
Confusion Matrix for the GRU Model.
Furthermore, the GRU model offers deeper analysis and insights into the educational dataset. For example, Figure 13 illustrates the relationship between two critical variables: the internal evaluation grades from the BA/BSc 5th Semester Examination (IN_Sem5) and those from the BA/BSc 6th Semester Examination (IN_Sem6), demonstrating the model's effectiveness in identifying significant correlations within the educational data. Additionally, Table 3 presents a comparative analysis of the ARD V.2, RNN, and AdaBoost techniques in classifying student performance. The comparison makes it evident that the GRU model outperforms the other methodologies in both accuracy and effectiveness at predicting student outcomes.
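One common way to quantify the kind of IN_Sem5/IN_Sem6 relationship shown in Figure 13 is the Pearson correlation coefficient; the grade vectors below are illustrative placeholders, not values from the dataset:

```python
# Sketch: Pearson correlation between two semesters' internal grades.
# The grade values are invented for illustration only.
import numpy as np

in_sem5 = np.array([55, 62, 70, 48, 81, 66, 74, 59])
in_sem6 = np.array([58, 60, 73, 50, 79, 68, 77, 61])

r = np.corrcoef(in_sem5, in_sem6)[0, 1]  # coefficient in [-1, 1]
print(round(r, 3))
```

A coefficient close to 1 would indicate the strong positive relationship between consecutive-semester grades that the figure depicts.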
Figure 13.
The correlation between the internal evaluation grades obtained in the BA/BSc 5th Semester Examination (IN_Sem5) and in the BA/BSc 6th Semester Examination (IN_Sem6).
Table 3.
Comparison Between Diverse Classification Approaches.
| The Classifier | Precision | Recall | F-Score | Accuracy (%) |
|---|---|---|---|---|
| RNN Model | 0.96 | 0.99 | 0.98 | 95.34 |
| ARD V.2 | 0.926 | 0.932 | 0.939 | 93.18 |
| AdaBoost | 0.934 | 0.946 | 0.939 | 94.57 |
| The Proposed Model | 0.986 | 0.963 | 0.974 | 99.70 |