5.2. Statistical Feature Engineering
Statistical feature engineering is a critical step in the preprocessing phase of machine learning, especially in tasks involving time-series or sequential data, such as vibration analysis in predictive maintenance. By transforming raw data into meaningful statistical features, we can capture essential characteristics that help in distinguishing between normal and abnormal behavior in systems like centrifugal pumps.
Table 5 shows the time domain statistical features extracted from the vibration dataset.
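The Table 5 features can be computed from a raw vibration window as in the sketch below. This is a minimal illustration assuming common textbook definitions for the indicator features (waveform factor as RMS over mean amplitude, pulse indicator as peak over mean amplitude, peak index as peak over RMS, margin indicator as peak over square root amplitude); the paper's exact formulas may differ, and all function and key names are illustrative:

```python
import numpy as np

def time_domain_features(x: np.ndarray) -> dict:
    """Compute twelve time-domain statistical features for one window."""
    abs_x = np.abs(x)
    rms = np.sqrt(np.mean(x ** 2))
    mean_amp = np.mean(abs_x)
    sra = np.mean(np.sqrt(abs_x)) ** 2  # square root amplitude
    peak = np.max(abs_x)
    return {
        "max": np.max(x),
        "mean": np.mean(x),
        "min": np.min(x),
        "std": np.std(x),
        "peak_to_peak": np.max(x) - np.min(x),
        "mean_amplitude": mean_amp,
        "rms": rms,
        "waveform_factor": rms / mean_amp,    # shape factor
        "pulse_indicator": peak / mean_amp,   # impulse factor
        "peak_index": peak / rms,             # crest factor
        "sqrt_amplitude": sra,
        "margin_indicator": peak / sra,       # clearance factor
    }
```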
Before data augmentation, the correlation plot of the extracted statistical features in Figure 6(a) provides insights into the relationships among the twelve features: maximum value, mean value, minimum value, standard deviation, peak-to-peak, mean amplitude, RMS, waveform factor, pulse indicator, peak index, square root amplitude, and margin indicator. This analysis helps in understanding how these features are interrelated and highlights potential multicollinearity issues. High correlations, particularly those above 0.9, indicate that several features may convey similar information, which can lead to redundancy and decreased model efficiency. Applying a Pearson correlation threshold of 0.9 before augmentation led to the selection of a more streamlined feature set: maximum value, minimum value, waveform indicator, and peak index, as shown in Figure 6(b). These features were chosen for their lower inter-correlation, ensuring that each provides distinct and valuable information for the model. This selection process reduces model complexity, making the model more interpretable and potentially more accurate. Data augmentation (DA), through techniques such as Gaussian noise addition and signal stretching, plays a significant role in enhancing the dataset. Augmentation increases the diversity of the data, helping to mitigate overfitting by providing the model with more varied examples. After augmentation, the feature set expands to include additional features such as standard deviation and peak-to-peak, which were previously excluded. This change indicates that augmentation can reveal additional meaningful relationships in the data, contributing to a more robust and generalizable model. The significance of data augmentation is thus evident in its ability to enrich the dataset, enabling the selection of a broader and more informative set of features for model training.
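The correlation-thresholding step described above can be sketched as follows, assuming the twelve features are held in a pandas DataFrame. The greedy pair-dropping strategy and all function and column names are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np
import pandas as pd

def drop_correlated(df: pd.DataFrame, threshold: float = 0.9) -> list:
    """Greedily drop one feature from every pair whose absolute Pearson
    correlation exceeds the threshold; return the retained columns."""
    corr = df.corr().abs()
    # keep only the upper triangle so each pair is inspected once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [c for c in upper.columns if (upper[c] > threshold).any()]
    return [c for c in df.columns if c not in to_drop]
```

Because the procedure drops the later column of each highly correlated pair, the retained set depends on column order; other selection rules (e.g. keeping the feature most correlated with the target) are equally valid.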
After augmentation, the correlation plot of the extracted statistical features in Figure 7(a) reveals the relationships among the twelve features: maximum value, mean value, minimum value, standard deviation, peak-to-peak, mean amplitude, RMS, waveform factor, pulse indicator, peak index, square root amplitude, and margin indicator. This plot is essential for identifying multicollinearity, where highly correlated features can introduce redundancy and reduce model performance. High correlations, particularly those above 0.9, indicate that some features provide overlapping information, which can affect the robustness and generalization of the model.
To address this, a Pearson correlation threshold of 0.9 was applied, as shown in Figure 7(b). Applying this threshold retained seven features: maximum value, mean value, minimum value, standard deviation, peak-to-peak, waveform indicator, and peak factor. These features exhibit lower inter-correlation, ensuring that each is more independent and contributes uniquely to the model. This selection process not only simplifies the feature set but also enhances the model's ability to distinguish between different conditions without being influenced by redundant information. The reduced feature set contributes to a more efficient and interpretable model, improving its predictive performance and reliability.
5.3. Gaussian Noise and Signal Stretching
Gaussian noise (GN) and signal stretching (SS) are data augmentation techniques aimed at improving the robustness and generalization of machine learning models by increasing the diversity of the training dataset. The process involved applying these augmentations to the raw data, calculating the weighted average, and extracting statistical features.
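The two augmentation operations can be sketched as below, assuming an SNR-based noise level and interpolation-based stretching; the paper does not state its exact noise variance or stretch factor, so the parameters and names here are illustrative:

```python
import numpy as np

def add_gaussian_noise(x: np.ndarray, snr_db: float = 20.0, seed=None) -> np.ndarray:
    """Add zero-mean Gaussian noise at a target signal-to-noise ratio (dB)."""
    rng = np.random.default_rng(seed)
    noise_power = np.mean(x ** 2) / (10 ** (snr_db / 10))
    return x + rng.normal(0.0, np.sqrt(noise_power), size=x.shape)

def stretch_signal(x: np.ndarray, factor: float = 1.1) -> np.ndarray:
    """Stretch or compress the signal in time by linear interpolation,
    then resample back to the original window length."""
    n = len(x)
    m = max(2, int(round(n * factor)))
    stretched = np.interp(np.linspace(0, n - 1, m), np.arange(n), x)
    return np.interp(np.linspace(0, m - 1, n), np.arange(m), stretched)
```

Each augmented window then passes through the same feature-extraction and selection pipeline as the original data, so the classifier never sees raw and augmented samples treated differently.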
Figure 8 shows the normalized augmented dataset for the three class labels. These features were then selected based on a set threshold before being fed into three machine learning classifiers: Support Vector Classifier (SVC), Random Forest (RF), and Gradient Boosting (GB). These classifiers were chosen for their different approaches to handling data and their potential to demonstrate the impact of data augmentation on model performance. Below, we discuss the results before and after augmentation, focusing on the impact on the actual label predictions across the three models.
Figure 9 shows the confusion matrix plot for the classifier model before and after augmentation.
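The training-and-evaluation loop behind such confusion matrices can be sketched as follows; the split ratio, hyperparameters (library defaults), and all names are assumptions, not the paper's settings:

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def evaluate_classifiers(X, y, labels=("normal", "wear", "crack")):
    """Train the three classifiers and return one confusion matrix each."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=42)
    models = {
        "SVC": SVC(),
        "RF": RandomForestClassifier(random_state=42),
        "GB": GradientBoostingClassifier(random_state=42),
    }
    matrices = {}
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        matrices[name] = confusion_matrix(
            y_te, model.predict(X_te), labels=list(labels))
    return matrices
```

Stratifying the split preserves the class proportions in the test set, which matters here because the wear class dominates the dataset.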
Before data augmentation, the classifiers demonstrated commendable performance, as reflected in their high accuracy, precision, recall, and F1 scores. For instance, the SVC model showed strong performance, particularly in precision and recall for the majority class (wear), although it faced challenges with the minority classes, normal and crack. SVC results before augmentation:
Normal: Out of 196 samples, 124 were correctly predicted as normal, 71 were misclassified as crack, and one was misclassified as wear.
Wear: Among 1737 samples, the model performed exceptionally well, correctly predicting 1736 as wear, with just one misclassified as crack.
Crack: Out of 159 crack samples, 74 were correctly identified as crack, but a significant number (85) were misclassified as normal.
These results underscore the critical issue of class imbalance, where the model is biased towards the majority class (wear), resulting in a higher number of misclassifications for the minority classes (normal and crack). Addressing this imbalance is crucial for a more balanced and accurate model performance.
After applying GN and SS, the number of data samples increased significantly, introducing more variability into the training set and helping mitigate some of the class imbalance. This increase in data variability, while leading to a decrease in performance metrics such as accuracy, precision, recall, and F1 score, also led to an increase in true label predictions for the augmented data, particularly for the normal and crack classes. This indicates a promising improvement in the model’s ability to recognize these previously underrepresented classes. SVC results after augmentation:
Normal: The normal class saw a substantial increase in sample size to 549, with 280 correctly predicted as normal, though 248 were misclassified as wear and 21 as crack.
Wear: Out of 1762 wear samples, 1727 were correctly identified, with a slight increase in misclassifications into the normal (23) and crack (12) classes.
Crack: The crack class also benefited from augmentation, increasing to 469 samples. Here, 120 were misclassified as normal, 287 as wear, and 62 correctly identified as crack.
These results indicate that while augmentation led to a slight decrease in overall performance metrics, the increase in true label predictions for normal and crack classes is significant. The improvement in the model’s ability to detect these classes suggests that data augmentation helped address the data imbalance issue, providing a more diverse training set that allowed the classifiers to generalize better to previously underrepresented classes. The augmentation process demonstrates that while traditional metrics like accuracy, precision, and recall might decrease, the true positive rate for minority classes can improve, leading to a more balanced model performance across different classes. This is particularly evident in the confusion matrix results, where the post-augmentation predictions for normal and crack samples increased significantly across all three models. This improvement can be attributed to the augmentation techniques creating more diverse and representative samples, which reduce the model’s bias towards the majority class.
Random forest (RF) results before augmentation:
Normal: Out of 196 normal samples, 165 were correctly classified, but 31 were misclassified as crack.
Wear: The RF model performed excellently in the wear class, correctly classifying 1,736 out of 1,737 samples and misclassifying only one as crack.
Crack: For crack samples, 133 out of 159 were correctly identified, but 26 were incorrectly labeled as normal.
After Augmentation:
Normal: Post-augmentation, the number of normal samples increased to 549, with 447 correctly predicted. This represents a significant improvement in the true positive rate for normal samples, a key benefit of data augmentation. However, the model now misclassified 59 samples as wear and 43 as crack, introducing more variability in misclassification.
Wear: Among the 1,762 wear samples, 1,662 were correctly identified, showing a slight decline from the pre-augmentation performance; 55 were misclassified as normal and 45 as crack.
Crack: For crack samples, the model correctly classified 287 out of 469 samples. However, the increase in misclassifications, particularly into the wear category (132 samples), indicates that while the model’s ability to detect cracks improved, it also became more prone to confusion between similar classes.
The RF model’s performance metrics slightly declined after augmentation, with a noticeable increase in misclassifications across all classes. However, the model showed a marked improvement in identifying normal samples, which were previously underrepresented. The increase in true positives for the normal class suggests that the augmented data provided more diverse examples for the model to learn from, reducing bias towards the majority class (wear). The trade-off is an increase in misclassified samples, particularly for the crack class, which may indicate that the augmented data introduced new complexities that the RF model struggled to generalize from.
Gradient boosting (GB) results before augmentation:
Normal: Out of 196 normal samples, 165 were correctly classified, with 31 misclassified as crack, similar to RF.
Wear: The GB model performed almost flawlessly for the wear class, correctly classifying 1,736 out of 1,737 samples, with only one misclassification as crack.
Crack: Among the crack samples, 134 out of 159 were correctly classified, with 25 misclassified as normal.
After Augmentation:
Normal: The sample size for normal increased significantly, with 448 out of 549 samples correctly identified. The misclassification rates were 58 as wear and 43 as crack, showing an improvement in identifying normal samples but with similar misclassification patterns as RF.
Wear: The GB model correctly identified 1,668 out of 1,762 wear samples, showing a slight decline in accuracy compared to the pre-augmentation results. This decline underscores the trade-offs involved in improving class representation through data augmentation.
Crack: The model correctly classified 278 out of 469 crack samples. However, misclassifications increased, with 56 labeled as normal and 135 as wear, indicating a similar challenge in distinguishing cracks from other classes.
The GB model, like RF, showed decreased overall performance metrics after augmentation but an improved true positive rate for the normal class. The augmentation led to a better balance in class representation, particularly for the normal and crack samples, which were previously underrepresented. However, the model’s ability to accurately distinguish between similar fault types, especially wear and crack, was somewhat compromised. This suggests that while the augmented data helped address the class imbalance, it also introduced additional complexity that the model had difficulty managing, leading to increased misclassifications.
Table 6 shows the performance metrics, comprising accuracy, precision, recall, and F1 score. Data augmentation through Gaussian noise and signal stretching significantly impacted the performance of both the RF and GB models. While traditional performance metrics such as accuracy, precision, and recall decreased, the true positive rates for the minority classes (normal and crack) improved. This indicates that the models became better at detecting these underrepresented classes, albeit at the cost of increased misclassification among the more similar fault types (wear and crack). The augmentation process addressed the class imbalance, providing the models with a more diverse set of training examples. However, the introduction of more complex variations in the data likely contributed to the increase in misclassifications. This underscores the need to carefully balance augmentation techniques to ensure that, while class representation is improved, the classes remain distinguishable by the models.
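The Table 6 metrics can be computed as in the sketch below; the paper does not state its averaging scheme for multi-class precision, recall, and F1, so macro averaging (which weights each class equally, a common choice for imbalanced fault data) is an assumption:

```python
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)

def summarize(y_true, y_pred) -> dict:
    """Accuracy plus macro-averaged precision, recall, and F1."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred,
                                     average="macro", zero_division=0),
        "recall": recall_score(y_true, y_pred,
                               average="macro", zero_division=0),
        "f1": f1_score(y_true, y_pred, average="macro", zero_division=0),
    }
```

Because macro averaging counts every class equally, it drops sharply when a minority class such as crack is poorly recalled, even if overall accuracy stays high; this is why the per-class confusion matrices and the aggregate metrics can move in opposite directions after augmentation.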