Section 4 covers the execution of the experiments and the evaluation of their results. First, the performance measures for the classification algorithm and the experimental results of the network designed in this paper on the spectrum dataset are presented. Next, we discuss the experimental results of the CO2 feature-vector classifier methods and of feeding the mid-infrared spectrum directly into the SCNN. Finally, ablation experiments compare the effectiveness of the peak features, the SCNN model, the optimizer selection, the learning rate selection, and the running time.
4.1. Performance Measures and Experiment Results
Based on the evaluation criteria for data classification tasks, the performance measures of PF-SCNN for the aero-engine hot jet include Accuracy, Precision, Recall, F1-score and the Confusion matrix. An instance that belongs to the positive class and is predicted as positive is a True Positive (TP); an instance that belongs to the positive class but is predicted as negative is a False Negative (FN); conversely, an instance that belongs to the negative class but is predicted as positive is a False Positive (FP), while a negative instance correctly predicted as negative is a True Negative (TN). Based on these definitions, the evaluation criteria of accuracy, precision, recall, F1-score and the confusion matrix are defined as follows:
Accuracy: The accuracy is the ratio of correctly classified samples to the total number of samples.
Precision: The precision is the ratio of correctly predicted positive samples to all predicted positive samples.
Recall: The recall is the ratio of the number of samples correctly predicted to be positive to the number of samples that are actually positive.
F1-score: The F1-score is a composite measure of precision and recall, which considers both aspects to evaluate the overall performance of the model.
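In terms of the TP, FN, FP and TN counts defined above, these measures take the standard forms (written out here for reference):

```latex
\mathrm{Accuracy}  = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall}    = \frac{TP}{TP + FN}, \qquad
F_{1} = \frac{2 \, P \, R}{P + R}
```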
where P stands for Precision, while R stands for Recall.
Confusion Matrix: The confusion matrix presents the classification results of various categories by the classifier, encompassing TP, FP, TN and FN. It visually demonstrates the disparity between actual and predicted values, with the diagonal elements indicating the number of accurate predictions for each category made by the classifier.
Table 4 offers a detailed breakdown of each component in the confusion matrix.
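As a minimal illustration, not the paper's own code, these measures and the confusion matrix can be computed with scikit-learn as follows, assuming `y_true` and `y_pred` hold the actual and predicted engine-type labels:

```python
# Minimal sketch: evaluation measures and confusion matrix with scikit-learn.
# y_true / y_pred below are illustrative placeholders, not the paper's data.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

y_true = [0, 1, 2, 3, 4, 5, 2, 1]   # actual labels for the six engine types
y_pred = [0, 1, 2, 3, 4, 5, 1, 1]   # predicted labels

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred, average="macro"))
print("Recall   :", recall_score(y_true, y_pred, average="macro"))
print("F1-score :", f1_score(y_true, y_pred, average="macro"))
print(confusion_matrix(y_true, y_pred))  # diagonal = correct predictions per class
```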
To validate the algorithm’s effectiveness, training and prediction experiments were conducted on a dataset comprising six types of aero-engine spectra. The experiments took place on a Windows 10 workstation equipped with 32 GB of RAM, an Intel Core i7-8750H processor, and a GeForce RTX 2070 graphics card.
Table 5 provides detailed parameters for PF-SCNN:
According to the parameters in Table 5, we conducted training and label prediction on the dataset and obtained the experimental results shown in Table 6:
Figure 5. PF-SCNN spectrum matching classification experiment results: the left panel shows the loss function variation curve and the right panel the accuracy variation curve; the blue lines denote the training data and the orange lines the validation data.
The PF-SCNN designed in this paper effectively classifies the six-type aero-engine hot jet spectrum dataset with 99.46% accuracy. The model demonstrates high precision (99.77%) and recall (99.56%), accurately identifying predicted positives and capturing actual positive samples. The confusion matrix provides insight into the prediction performance for each class. The F1-score (99.66%) shows a strong balance between precision and recall, while the loss and accuracy converge rapidly within 500 training iterations; training took 2757.8 s and label prediction of the data took 71 s.
Despite encountering special cases such as aero-engine failure during the experiments, the PF-SCNN remains robust: erroneous data in the spectrum dataset have minimal impact on the overall classification accuracy.
4.2. Comparison Experiments with Traditional Classification Methods
The hot jet comprises mixed gases such as oxygen (O2), nitrogen (N2), carbon dioxide (CO2), water vapor (H2O), and carbon monoxide (CO), among others. To facilitate comparison with classical classifier methods, the main components of the aero-engine hot jet were analyzed and a CO2 feature vector was designed. In the experiment, four characteristic peaks in the mid-wave infrared region of the BTS of the aero-engine hot jet were selected to construct the spectrum feature vector. These peaks correspond to the wavenumbers 2350 cm^-1, 2390 cm^-1, 720 cm^-1, and 667 cm^-1, respectively; their positions are illustrated in Figure 6:
The peak differences between 2390 cm^-1 and 2350 cm^-1, as well as between 720 cm^-1 and 667 cm^-1, form a single spectrum feature vector.
Due to environmental influences, the positions of the selected characteristic peaks may shift. Therefore, the maximum and minimum peak values of the experimentally measured infrared spectrum data near 2350 cm^-1, 2390 cm^-1, 720 cm^-1 and 667 cm^-1 are extracted within specified regions to form the spectrum feature vector. The specific threshold ranges are detailed in Table 7:
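As a sketch of how such a feature vector could be computed per spectrum, the snippet below takes the strongest value inside a window around each characteristic peak and forms the two band differences; the window bounds are illustrative placeholders standing in for the thresholds of Table 7, and the function and variable names are ours, not the paper's:

```python
import numpy as np

# Wavenumber windows (cm^-1) around each characteristic CO2 peak; the bounds
# here are placeholders, the actual threshold ranges are given in Table 7.
BANDS = {
    "2350": (2320.0, 2380.0),
    "2390": (2380.0, 2410.0),
    "720":  (700.0, 740.0),
    "667":  (650.0, 690.0),
}

def co2_feature_vector(wavenumbers, intensities):
    """Build a CO2 feature vector from one measured spectrum.
    wavenumbers, intensities: 1-D arrays of equal length."""
    peaks = {}
    for name, (lo, hi) in BANDS.items():
        mask = (wavenumbers >= lo) & (wavenumbers <= hi)
        peaks[name] = intensities[mask].max()            # extremum within the window
    return np.array([peaks["2390"] - peaks["2350"],      # first peak difference
                     peaks["720"] - peaks["667"]])       # second peak difference
```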
The feature vector needs to be combined with a classifier to test the classification performance. We select commonly used classifiers, namely SVM, XGBoost, CatBoost, AdaBoost, Random Forest, LightGBM and a Neural Network algorithm, and combine them with the CO2 feature vector for the aero-engine hot jet spectrum classification task.
Table 8 provides the parameter settings of the classifier algorithms:
To compare with the deep learning method, we merge the training set with the validation set, split the data into training and prediction sets at a ratio of 9:1, and obtain the experimental results in Table 9:
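A minimal sketch of this comparison setup is given below, assuming scikit-learn and xgboost; the randomly generated arrays merely stand in for the measured CO2 feature vectors and labels:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC
from xgboost import XGBClassifier

# Placeholder data: one 2-dimensional CO2 feature vector per spectrum,
# labels 0..5 for the six engine types.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 2))
y = rng.integers(0, 6, size=600)

# 9:1 split between training and prediction sets, as in the experiment.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, stratify=y, random_state=0)

for name, clf in [("SVM", SVC()), ("XGBoost", XGBClassifier())]:
    clf.fit(X_train, y_train)
    print(name, "accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```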
Analysis of the experimental results for the CO2 feature-vector classifier methods shows that the overall classification performance of the SVM algorithm is suboptimal. AdaBoost exhibits poor prediction performance with consistently low indices, and the Neural Network also underperforms. Conversely, XGBoost, CatBoost, Random Forest and LightGBM demonstrate strong classification capability, with excellent predictive accuracy, good capture of positive instances, and balanced precision and recall. However, they still fall short of our high-precision recognition. This gap reflects the limited feature representation offered by a single feature vector: in complex experimental scenarios, CO2 alone is not sufficient to characterize the spectrum, and more features should be explored to describe the spectrum data.
4.3. Ablation Experiment Analysis
(1) Peak Feature Effectiveness: Integrating the peak features with traditional classifiers validates their contribution to classification performance. We identified the peaks in our dataset and obtained the experimental results depicted in Figure 7:
In Figure 7a, the red points mark the peak positions identified by the peak-finding algorithm; in Figure 7b, the blue histogram illustrates the distribution of these peak positions across the dataset; and in Figure 7c, the red line marks the positions with higher occurrence frequency in the spectrum dataset. After a comprehensive statistical analysis of these peak positions, a total of 56 peaks were identified. The data at these 56 peak positions were then extracted from each spectrum, and a classification experiment was conducted with the SVM and XGBoost classifier methods used in our previous study, yielding the detailed classification results presented in Table 10:
Based on the experimental results, the extracted peak data demonstrate significant efficacy: compared with the CO2 feature-vector classifiers, all indices exhibit notable improvements. The findings indicate that feature extraction with the peak-finding algorithm is well suited to the classification task.
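The sketch below outlines this kind of peak-based feature extraction: peaks are located in every spectrum, the positions that recur most often across the dataset are kept (56 in our case), and each spectrum is then sampled at those shared positions. The prominence threshold and the function name are assumptions rather than values from the paper:

```python
import numpy as np
from collections import Counter
from scipy.signal import find_peaks

def shared_peak_positions(spectra, n_keep=56, prominence=0.01):
    """spectra: array of shape (n_samples, n_channels).
    Returns the indices of the n_keep most frequent peak positions."""
    counts = Counter()
    for s in spectra:
        idx, _ = find_peaks(s, prominence=prominence)  # per-spectrum peak search
        counts.update(idx.tolist())
    return np.array(sorted(i for i, _ in counts.most_common(n_keep)))

# Peak features fed to the classifiers: each spectrum sampled at the shared positions.
# peak_features = spectra[:, shared_peak_positions(spectra)]
```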
(2) SCNN model:
We use the same parameters as the PF-SCNN model, feed the data in the mid-infrared region directly into the SCNN for training and prediction, and obtain the results in Table 11:
Figure 8. SCNN spectrum matching classification experiment results: the left panel shows the loss function variation curve and the right panel the accuracy variation curve; the blue lines denote the training data and the orange lines the validation data.
When the entire mid-infrared spectrum was used as input, the training time amounted to 19,103.60 seconds. The SCNN model can evidently achieve commendable accuracy in both training and prediction; however, owing to the volume of data involved, it demands substantial computing resources and time.
(3) Optimizer selection:
In deep learning network training, we compare commonly used optimizers, namely RMSProp, Adam, Nadam, SGD, Adagrad, and Adadelta, on the base network model with the spectrum data used in this paper, in order to determine the most suitable optimizer. The optimizer parameters are shown in Table 12:
We trained the base network for 200 iterations on the peak data of the spectrum dataset with each optimizer, obtaining the results in Table 13:
Figure 9. Training and validation loss function and accuracy variation curves of the different optimizers on the spectrum dataset: RMSProp in blue, Adam in orange, Nadam in green, SGD in red, Adagrad in purple, and Adadelta in brown.
As depicted in the graph, both RMSProp and Adam exhibit rapid convergence of the loss function and accuracy curves over 200 iterations of model training, leading to high prediction accuracy.
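For illustration, the optimizer comparison can be organized along the following lines; the small Keras network and the random arrays below are placeholders for the paper's base network, peak data and Table 12 parameters, none of which are reproduced here:

```python
import numpy as np
import tensorflow as tf

# Placeholder base network: 56 peak features in, 6 engine classes out.
def build_base_network(n_features=56, n_classes=6):
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_features,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(500, 56)), rng.integers(0, 6, 500)
X_val, y_val = rng.normal(size=(100, 56)), rng.integers(0, 6, 100)

optimizers = {
    "RMSProp":  tf.keras.optimizers.RMSprop(learning_rate=1e-4),
    "Adam":     tf.keras.optimizers.Adam(learning_rate=1e-4),
    "Nadam":    tf.keras.optimizers.Nadam(learning_rate=1e-4),
    "SGD":      tf.keras.optimizers.SGD(learning_rate=1e-4),
    "Adagrad":  tf.keras.optimizers.Adagrad(learning_rate=1e-4),
    "Adadelta": tf.keras.optimizers.Adadelta(learning_rate=1e-4),
}

for name, opt in optimizers.items():
    model = build_base_network()               # fresh weights for every optimizer
    model.compile(optimizer=opt, loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    hist = model.fit(X_train, y_train, epochs=200,            # 200 training iterations
                     validation_data=(X_val, y_val), verbose=0)
    print(name, "final validation accuracy:", hist.history["val_accuracy"][-1])
```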
(4) Learning rate selection:
The choice of learning rate is a crucial step in training a neural network. In our experiments, we tried different learning rate values and observed their impact on the model's performance. Starting from 0.001, we decreased the learning rate by successive factors of 10. The resulting series of experiments is summarized in Table 14 for detailed comparison and analysis; this helps us find the learning rate that best fits the dataset and model architecture, thereby improving training efficiency and model performance:
Figure 10. Training and validation loss function and accuracy variation curves of RMSProp with different learning rates on the spectrum dataset: blue denotes a learning rate of 0.001, orange 0.0001, green 0.00001, and red 0.000001.
Based on the variations of the loss function and accuracy curves under different learning rates, together with the final accuracy values, RMSProp performs best on the spectrum dataset with a learning rate of 0.0001, which is consistent with the learning rate adopted in this paper.
(5) Running time:
Deep learning methods require longer model training times because they typically use large datasets and complex network structures. Compared with traditional classifier methods, deep learning algorithms need more iterations of parameter adjustment and model optimization to achieve higher accuracy and generalization capability. In practical applications, we compared the prediction times of the different methods on the dataset in detail and compiled the results into the test time comparison in Table 15. These data clearly illustrate the differences in prediction time required by the various algorithms when processing the same dataset, providing a valuable reference for further analysis and evaluation:
The data in Table 15 indicate that the CO2 feature-vector classifiers run quickly. The PF-SCNN, however, offers no significant advantage in running time, because it must match the prediction data against the trained data during the network's prediction stage, consuming substantial computing time and memory. Although introducing the peak features markedly speeds up the algorithm compared with using only the SCNN model for prediction, the one-to-one matching still demands considerable computing time. Based on these findings, our future research will focus on extracting stable and distinctive features from the hot jet spectrum of each type of aero-engine, so as to reduce the volume of data to be matched and the prediction time.
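As a rough sketch of why the one-to-one matching stage is expensive, the snippet below compares every test sample against every stored training sample and assigns the label of the closest match; `embed` stands in for the trained Siamese branch and is a placeholder, not the paper's network:

```python
import numpy as np

def predict_by_matching(train_feats, train_labels, test_feats, embed):
    """Assign each test sample the label of its nearest training sample.
    All inputs are NumPy arrays; embed maps raw features to an embedding space."""
    train_emb = embed(train_feats)                       # (N_train, d)
    test_emb = embed(test_feats)                         # (N_test, d)
    # Pairwise Euclidean distances: every test sample vs. every training sample,
    # so the cost grows linearly with the number of stored training spectra.
    dists = np.linalg.norm(test_emb[:, None, :] - train_emb[None, :, :], axis=-1)
    return train_labels[dists.argmin(axis=1)]

# Illustrative use with an identity "embedding":
# labels = predict_by_matching(X_train, y_train, X_test, embed=lambda x: x)
```

Because every prediction touches all stored training samples, shrinking the amount of data to be matched, as outlined in the future work above, directly shortens the prediction time.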