Algorithms incorporating artificial intelligence, commonly known as intelligent algorithms, offer highly adaptable and robust tools that reduce the need for extensive fundamental knowledge and experience in condition monitoring, making them attractive to many operators [8]. According to Liu et al. [8], fault diagnostics of rotating machinery primarily involves pattern recognition, a task for which AI is particularly well suited. The intelligent algorithms discussed in this section for the condition monitoring of rotating machinery are categorised as either machine learning classifiers or metaheuristic optimisation techniques.
5.1. Machine Learning Classifiers
AI has found extensive application in the classification of REB faults. Following the extraction of features using suitable signal processing techniques, a classifier is employed to identify the machine condition automatically, removing the need for an experienced technician. Machine learning, a key component of AI, is the approach of training algorithms to learn patterns and make predictions or decisions from data without being explicitly programmed for each task. This gives AI systems the ability to learn and improve from experience without human intervention [92].
There are three main types of machine learning algorithms: supervised, unsupervised and reinforcement learning. Supervised learning uses collected data together with their correct class labels to train the algorithm to distinguish between classes in new data [93]. Unsupervised learning clusters data based on discovered patterns and is typically used to uncover previously unknown information without explicit guidance [94]. Reinforcement learning involves learning the behaviour necessary to perform optimally in a dynamic environment [95]. In condition monitoring, identifying different conditions in data primarily involves supervised learning methods.
ANNs, or artificial neural networks, are predominantly utilised in supervised learning applications. ANNs consist of multiple interconnected nodes arranged into three types of layer: input, hidden, and output, as depicted in Figure 4. The nodes in the input layer relay information from sensors to the hidden layers and do not perform computation [96]. Nodes that deal with neither the input nor the output of data belong to the hidden layers, where computation occurs; an ANN can have more than one hidden layer. The output layer comprises nodes that convey the computational results of the ANN as output. The input value of each node is multiplied by the weight of its connection, which adjusts that input’s impact on the result.
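The layer structure described above can be sketched as a forward pass through a small network. This is a minimal illustration only: the sigmoid activation, the bias terms, the layer sizes and all weight values are assumptions made for the example and are not taken from the cited works.

```python
import math

def sigmoid(z):
    """Logistic activation (an assumed choice), mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def dense_layer(inputs, weights, biases):
    """One fully connected layer: each node sums its weighted inputs,
    adds a bias and applies the activation."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + b)
        for node_weights, b in zip(weights, biases)
    ]

def forward(features, hidden_w, hidden_b, output_w, output_b):
    # The input layer only relays the feature values; computation
    # starts in the hidden layer.
    hidden = dense_layer(features, hidden_w, hidden_b)
    return dense_layer(hidden, output_w, output_b)

# Hypothetical, hand-picked values (a trained network would learn the weights).
features = [0.8, 0.1, 0.3]                       # three normalised input features
hidden_w = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]  # 2 hidden nodes x 3 inputs
hidden_b = [0.0, 0.1]
output_w = [[1.2, -0.7], [-1.0, 0.9]]            # 2 output nodes x 2 hidden nodes
output_b = [0.0, 0.0]

scores = forward(features, hidden_w, hidden_b, output_w, output_b)
predicted_class = max(range(len(scores)), key=scores.__getitem__)
print(scores, predicted_class)
```

With these hand-picked weights the first output node receives the higher score, so the sample would be assigned to class 0. Training an ANN amounts to adjusting the connection weights so that the scores match the known labels.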
Various ANN models have been prominently employed in the fault diagnosis of REBs. Jia et al. [98] proposed a method for the automated design of feature extraction algorithms for bearing fault diagnosis using a four-layer local connection network. Vibration signals from the input layer were analysed by a normalised sparse autoencoder to learn useful features in the local layer. From these, shift-invariant features were identified in the feature layer, allowing the algorithm to differentiate between various conditions in the output layer. Chen et al. [99] recognised that feature extraction can be time-consuming and require a deep understanding of signal processing. They presented a method in which bearing fault diagnosis was achieved with a multi-scale CNN and long short-term memory (LSTM) model. Two CNNs were used for the automatic extraction of features from raw vibration signals, and a stacked LSTM network was then used to classify bearing health conditions. Ali et al. [2] used a four-layer ANN for the classification of bearing defects. Features were extracted from the time domain, and the EMD method was also employed. Effective IMFs for bearing fault diagnosis were selected using a statistical criterion. The selected features were then used to train the ANN for fault classification.
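As an illustration of what time-domain feature extraction can look like before a classifier is trained, the sketch below computes three statistics that are commonly used in vibration analysis. The choice of RMS, kurtosis and crest factor is an assumption for the example; it is not the feature set reported in [2], and the two signals are synthetic.

```python
import math

def time_domain_features(signal):
    """Return RMS, kurtosis and crest factor of a non-constant signal."""
    n = len(signal)
    mean = sum(signal) / n
    rms = math.sqrt(sum(x * x for x in signal) / n)
    m2 = sum((x - mean) ** 2 for x in signal) / n   # second central moment
    m4 = sum((x - mean) ** 4 for x in signal) / n   # fourth central moment
    peak = max(abs(x) for x in signal)
    return {
        "rms": rms,
        "kurtosis": m4 / (m2 * m2),    # grows when the signal is impulsive
        "crest_factor": peak / rms,    # peak amplitude relative to RMS
    }

# Synthetic signals: a smooth alternating signal and one with two sharp impulses.
smooth = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
impulsive = [0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, -4.0]

smooth_features = time_domain_features(smooth)
impulsive_features = time_domain_features(impulsive)
print(smooth_features)
print(impulsive_features)
```

The impulsive signal yields a higher kurtosis and crest factor than the smooth one, which is the kind of contrast a classifier can exploit once such values are assembled into a feature vector.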
SVM is another supervised learning technique commonly used in classification problems. In SVM, data are separated in a multidimensional space by a hyperplane to classify different machine conditions [100]. The SVM aims to maximise the distance between the hyperplane and the support vectors of each class to find the best possible separation. These support vectors are the data points nearest to the hyperplane, and they determine its orientation and position. Not all data can be linearly separated, however. In such cases, the data are mapped to a higher-dimensional space in which linear separation of the classes is possible.
Figure 5 shows an example of two classes, circles and crosses, being separated with an optimal hyperplane. The support vectors that define the maximum margin of the two groups are indicated with squares.
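The idea of mapping data to a higher dimension can be shown with a toy case. In the sketch below, one-dimensional samples cannot be split by a single threshold, but they become separable after the assumed mapping phi(x) = (x, x^2). The data, the mapping and the class names are hypothetical, and no SVM solver is used: because the toy data are symmetric about zero, the widest separating band in the mapped space is horizontal, so the hyperplane can be placed by inspection.

```python
def feature_map(x):
    """Map a 1-D sample to 2-D: phi(x) = (x, x^2)."""
    return (x, x * x)

def separable_by_threshold(a, b):
    """True if a single threshold splits the two groups of scalar values."""
    return max(a) < min(b) or max(b) < min(a)

# Hypothetical 1-D feature values for two machine conditions.
healthy = [-1.0, 0.0, 1.0]
faulty = [-3.0, -2.5, 2.5, 3.0]

linearly_separable_1d = separable_by_threshold(healthy, faulty)

# After mapping, the second coordinate (x^2) separates the classes.
healthy_sq = [feature_map(x)[1] for x in healthy]
faulty_sq = [feature_map(x)[1] for x in faulty]
linearly_separable_2d = separable_by_threshold(healthy_sq, faulty_sq)

# Horizontal hyperplane x^2 = threshold, midway between the closest points.
threshold = (max(healthy_sq) + min(faulty_sq)) / 2
margin = (min(faulty_sq) - max(healthy_sq)) / 2

# Support vectors are the samples lying exactly on the margin.
support_vectors = [
    x for x in healthy + faulty
    if abs(abs(feature_map(x)[1] - threshold) - margin) < 1e-12
]

def classify(x):
    return "faulty" if feature_map(x)[1] > threshold else "healthy"

print(linearly_separable_1d, linearly_separable_2d)
print(threshold, margin, support_vectors)
```

Only the four samples nearest the boundary act as support vectors here; moving any of the other samples slightly would leave the hyperplane unchanged.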
Although SVM was initially developed for binary classification problems [101], it has since been adapted to handle multi-class classification tasks using approaches such as one-versus-one or one-versus-rest [102]. This adaptability makes SVM suitable for fault diagnosis in rotating machinery, where multiple health conditions are common. For instance, Yang et al. [103] applied SVM to diagnose bearing faults using vibration signals. They used both fractal dimensions and statistical features extracted from the data for SVM training, which achieved better classification performance than using either fractal dimensions or statistical features alone. Wang et al. [104] also used SVM for bearing fault diagnosis. They first extracted features from accelerometer signals using generalised refined composite multiscale sample entropy. Dimensionality reduction was then performed on the feature set using the supervised isometric mapping algorithm. Finally, the reduced feature set was used with an optimised SVM for bearing health classification.
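The one-versus-rest strategy mentioned above can be sketched independently of the binary classifier it wraps. In the example below a simple centroid-based linear scorer stands in for the binary SVM, purely to keep the sketch self-contained; the feature values and class names are hypothetical.

```python
def centroid(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_binary(positives, negatives):
    """Stand-in for a binary SVM: returns a scoring function that is
    positive when a sample lies closer to the positive-class centroid."""
    c_pos, c_neg = centroid(positives), centroid(negatives)
    return lambda x: sq_dist(x, c_neg) - sq_dist(x, c_pos)

def train_one_vs_rest(samples, labels):
    """Train one binary scorer per class: that class against all others."""
    scorers = {}
    for cls in sorted(set(labels)):
        pos = [s for s, l in zip(samples, labels) if l == cls]
        neg = [s for s, l in zip(samples, labels) if l != cls]
        scorers[cls] = train_binary(pos, neg)
    return scorers

def predict(scorers, x):
    """Assign the class whose scorer is most confident."""
    return max(scorers, key=lambda cls: scorers[cls](x))

# Hypothetical two-dimensional feature vectors for three bearing conditions.
samples = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.2), (1.0, 0.1), (0.2, 0.9), (0.1, 1.0)]
labels = ["normal", "normal", "inner race", "inner race", "outer race", "outer race"]

scorers = train_one_vs_rest(samples, labels)
print(predict(scorers, (0.95, 0.15)))
```

One-versus-one would instead train a scorer for every pair of classes and let the pairwise winners vote, at the cost of more classifiers when the number of conditions grows.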
SVM has also been adapted for anomaly detection by training the model on a single class that is considered normal. This approach is termed the one-class support vector machine (OCSVM) and works by maximising the margin between the single data class and the origin in a higher-dimensional feature space [105]. It has been used in bearing fault diagnosis by researchers such as Fernández-Francos et al. [106]. Vibration signals from bearings operating under normal conditions were used to train an OCSVM for the identification of faulty bearings. Subsequently, the fault type was identified using envelope spectrum analysis. Kannan et al. [107] presented a novel information fusion approach to efficiently utilise homogeneous and heterogeneous sensor signals in bearing condition monitoring. OCSVM was employed to extract features corresponding to signal integrity issues, so that an integrity score could be dynamically assigned to data depending on their perceived signal quality. Decision-level fusion was accomplished through a majority voting system using the derived integrity scores and the separate classification results. This approach was shown to give a more reliable classification prediction.
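One way in which integrity scores could enter a voting scheme is as vote weights. The sketch below is an assumption made for illustration and is not the fusion rule reported in [107]; the channel outputs, labels and scores are hypothetical.

```python
from collections import Counter, defaultdict

def plain_majority(predictions):
    """Unweighted vote: every channel counts equally."""
    return Counter(label for label, _ in predictions).most_common(1)[0][0]

def integrity_weighted_vote(predictions):
    """Each channel's vote is weighted by its integrity score, so a
    channel with poor signal quality has less influence on the result."""
    totals = defaultdict(float)
    for label, integrity in predictions:
        totals[label] += integrity
    return max(totals, key=totals.get)

# (predicted label, integrity score in [0, 1]) for five hypothetical channels.
channel_outputs = [
    ("outer race fault", 0.9),
    ("outer race fault", 0.8),
    ("healthy", 0.2),   # channels with suspected signal-quality problems
    ("healthy", 0.3),
    ("healthy", 0.1),
]

unweighted = plain_majority(channel_outputs)
fused = integrity_weighted_vote(channel_outputs)
print(unweighted, fused)
```

In this toy case three low-integrity channels outvote two trustworthy ones under a plain majority, whereas the weighted vote follows the channels with the higher integrity scores.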
Decision trees, or classification trees, are quite simple in comparison to other supervised learning algorithms. They are models that represent the possible outcomes of a test in a tree-like structure and classify records based on their likelihood of belonging to a certain class [108]. The root node, representing the whole population, is split into multiple sub-nodes, which can be categorised as either terminal or nonterminal nodes. Nonterminal nodes are split further into sub-nodes representing the outcomes of the decision for which the node is responsible, whereas terminal nodes are not split. A schematic representation of a decision tree is shown in Figure 6.
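The distinction between nonterminal and terminal nodes can be made concrete with a small hand-built tree. The feature names, thresholds and class labels below are hypothetical and chosen only to show how a record is routed from the root to a terminal node; a real tree would learn its splits from labelled data.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    feature: Optional[str] = None    # feature tested at a nonterminal node
    threshold: float = 0.0
    left: Optional["Node"] = None    # branch taken when value <= threshold
    right: Optional["Node"] = None   # branch taken when value > threshold
    label: Optional[str] = None      # set only on terminal nodes

    def is_terminal(self) -> bool:
        return self.label is not None

def classify(node: Node, record: dict) -> str:
    """Follow the decisions from the root until a terminal node is reached."""
    while not node.is_terminal():
        if record[node.feature] <= node.threshold:
            node = node.left
        else:
            node = node.right
    return node.label

# Root (nonterminal) splits on kurtosis; its right child (nonterminal)
# splits on RMS; the remaining three nodes are terminal.
tree = Node(
    feature="kurtosis", threshold=3.5,
    left=Node(label="healthy"),
    right=Node(
        feature="rms", threshold=1.2,
        left=Node(label="incipient fault"),
        right=Node(label="severe fault"),
    ),
)

print(classify(tree, {"kurtosis": 5.0, "rms": 2.0}))
```

Each path from the root to a terminal node reads as a plain rule, which is part of why decision trees are regarded as simple compared with other classifiers.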
Decision trees are considered weak learners and, because of this, it is common to use them as part of an ensemble classifier. An ensemble of decision trees is called a random forest (RF), in which multiple decision trees make predictions independently of one another [109]. These classifications are then combined through a voting procedure to, ideally, increase the predictive accuracy in comparison to a single decision tree [110]. Cerrada et al. [111] monitored vibrational behaviour for the diagnosis of faults in spur gears using RF together with a GA. Various features were first extracted from the time and frequency domains of the vibration signal. Wavelet packet transform was also applied to the raw signal, and the energy of each wavelet coefficient was taken as a further feature. A data matrix was constructed from the features and the selection process was set to run iteratively (i.e., one GA iteration then one RF training phase). Once the optimal feature subset had been selected and the GA execution terminated, the classifier was retrained with this subset. The classifier performance was then tested and good accuracy was achieved. Seera et al. [112] proposed a classification model using RF and a fuzzy min-max neural network for the diagnosis of REB faults. The input features were extracted from the raw vibration signal using both power spectrum and sample entropy methods. Tests showed that a combination of these features gave higher accuracy than either method individually. The proposed model was also compared with other models and was found to have the highest accuracy and lowest standard deviation. Vakharia et al. [113] conducted fault diagnosis on a bearing using vibration signals. With a feature ranking technique called ReliefF, significant features extracted from the time domain and discrete wavelet transform were selected for use with the RF classifier. The selected features in combination with the classifier performed well in the diagnosis of bearing faults.
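The voting procedure behind a random forest can be sketched with very small trees. The example below assumes one-level trees (decision stumps), bootstrap resampling and one randomly chosen feature per tree; these are simplifications for illustration and do not reproduce the configurations used in [111,112,113]. The data are hypothetical.

```python
import random
from collections import Counter

def fit_stump(samples, labels, feature):
    """Fit a one-level decision tree: the threshold on `feature` that
    misclassifies the fewest training samples."""
    majority = Counter(labels).most_common(1)[0][0]
    best = (len(labels) + 1, None, majority, majority)
    # Exclude the largest value so that both branches are non-empty.
    for t in sorted({s[feature] for s in samples})[:-1]:
        left = [l for s, l in zip(samples, labels) if s[feature] <= t]
        right = [l for s, l in zip(samples, labels) if s[feature] > t]
        l_lab = Counter(left).most_common(1)[0][0]
        r_lab = Counter(right).most_common(1)[0][0]
        errors = sum(l != l_lab for l in left) + sum(l != r_lab for l in right)
        if errors < best[0]:
            best = (errors, t, l_lab, r_lab)
    _, t, l_lab, r_lab = best
    if t is None:                      # no valid split: always predict the majority
        return lambda x: majority
    return lambda x: l_lab if x[feature] <= t else r_lab

def fit_forest(samples, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    n, n_features = len(samples), len(samples[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        feature = rng.randrange(n_features)          # random feature for this tree
        trees.append(fit_stump([samples[i] for i in idx],
                               [labels[i] for i in idx], feature))
    return trees

def predict(trees, x):
    """Each tree votes independently; the most common label wins."""
    return Counter(tree(x) for tree in trees).most_common(1)[0][0]

# Hypothetical two-feature training data for two conditions.
samples = [(0.2, 0.1), (0.3, 0.2), (0.25, 0.15), (0.1, 0.3),
           (0.8, 0.9), (0.9, 0.7), (0.7, 0.8), (0.85, 0.95)]
labels = ["healthy"] * 4 + ["faulty"] * 4

forest = fit_forest(samples, labels)
print(predict(forest, (0.05, 0.05)), predict(forest, (0.95, 0.95)))
```

Because each tree sees a different resample and feature, individual trees may disagree; the vote is what makes the ensemble more stable than any single tree.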
There are several other supervised machine learning classifiers that can be implemented for machinery health classification, including KNN, Naïve Bayes, and discriminant analysis [114,115,116,117,118,119]. Supervised learning is generally preferred for the classification of machinery condition because the model is trained to perform extremely well for a particular application.
5.2. Metaheuristic Optimisation Techniques
Another common use for intelligent algorithms is the optimisation of parameters to obtain a suitable solution. Metaheuristic optimisation techniques have been of great interest to researchers in the field for tasks that require an ideal solution to be found within a large search space. The ability to measure the performance of different combinations of parameters against a certain criterion allows for the automatic selection of an optimal solution without the need for significant experience in the domain. This makes these techniques desirable for diagnostic tool users who lack experience and a deeper understanding of bearing fault behaviour. Many of these optimisation techniques are based on concepts found in nature.
Evolutionary algorithms are a category of metaheuristic optimisation inspired by the concept of natural selection and are often applied to search for an optimal solution to a specific problem. The most popular evolutionary algorithm is the GA, and its general operation is described in [59]. Some studies have explored the use of GA for optimising the demodulation band for the envelope spectrum. To optimally demodulate resonance for REB fault diagnosis, Zhang and Randall [120] first used the fast kurtogram to roughly estimate the parameters. GA was then used to optimise these parameters further, allowing faster convergence than using GA directly to select the ideal bandpass filter. Wang et al. [121] conducted a study on the detection of a sun gear crack in a planetary gearbox through envelope analysis. Having developed an index comparing fault-related components with non-fault-related components in the envelope spectrum, they used GA to search the frequency range for an optimal subband. Kang et al. [122] also used a GA to select optimal bandpass filter parameters in the condition monitoring of bearings. Unlike other studies, the parameters were coded with real numbers from 0 to 1 as opposed to binary values. The use of a real-coded GA is said to be advantageous for continuous parameter space variables [123]. For this reason, it can be inferred that this approach generally allows the optimal bandpass filter to be represented more accurately and requires less storage [124]. The fitness score used was a ratio of residual-to-defect frequency components, which was said to give insight into the degree of defectiveness [122]. Using the fitness score as an indicator of defect severity, however, is not ideal. The score for the same signal and, by extension, the same bearing defect size can be expected to vary depending on whether the GA converges at a local or a global optimum, which could cause confusion.
Swarm intelligence is another category of metaheuristic optimisation, inspired by the collective behaviour of a population with no centralised structure controlling individuals. Common algorithms that use the concept of swarm intelligence are particle swarm optimisation (PSO) and ant colony optimisation. Uses for these algorithms are similar to those of GA.
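A real-coded GA of the kind described above, with genes held as real numbers between 0 and 1, can be sketched as follows. Everything specific here is an assumption for illustration: the decoding of genes into a centre frequency and bandwidth, the frequency limits, the toy fitness function (which simply peaks at an arbitrary band rather than evaluating an envelope spectrum), and the choice of tournament selection, arithmetic crossover, Gaussian mutation and elitism.

```python
import random

F_MAX = 20_000.0    # hypothetical upper limit for the centre frequency (Hz)
BW_MAX = 4_000.0    # hypothetical upper limit for the bandwidth (Hz)

def decode(genes):
    """Map two genes in [0, 1] to (centre frequency, bandwidth) in Hz."""
    return genes[0] * F_MAX, genes[1] * BW_MAX

def fitness(genes):
    """Toy stand-in for a diagnostic index: highest at 8 kHz / 1 kHz."""
    fc, bw = decode(genes)
    return -((fc - 8_000.0) / F_MAX) ** 2 - ((bw - 1_000.0) / BW_MAX) ** 2

def clip(value):
    return min(1.0, max(0.0, value))

def evolve(pop_size=30, generations=60, seed=1):
    rng = random.Random(seed)
    pop = [[rng.random(), rng.random()] for _ in range(pop_size)]
    history = []                                   # best fitness per generation
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        children = [pop[0][:]]                     # elitism: keep the best as is
        while len(children) < pop_size:
            p1 = max(rng.sample(pop, 3), key=fitness)   # tournament selection
            p2 = max(rng.sample(pop, 3), key=fitness)
            a = rng.random()
            child = []
            for x, y in zip(p1, p2):
                gene = a * x + (1 - a) * y         # arithmetic crossover
                if rng.random() < 0.2:
                    gene += rng.gauss(0.0, 0.05)   # Gaussian mutation
                child.append(clip(gene))           # keep genes inside [0, 1]
            children.append(child)
        pop = children
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history

best_genes, history = evolve()
print(decode(best_genes), history[-1])
```

Because the genes are real numbers, the filter parameters are not restricted to the grid that a fixed-length binary string would impose. The best fitness reached can still depend on the random seed, which illustrates why a converged fitness score is an unreliable indicator of defect severity.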
Metaheuristic optimisation techniques have also been used with classification algorithms to optimise the parameters or structure of the classifier. In [125], Yan and Jia conducted fault diagnosis using an optimised SVM for classification. Features were extracted in multiple domains, and the Laplacian score was used to select among them, discarding unnecessary or redundant features. The selected features were then used as input to an SVM whose parameters were optimised using PSO for the classification of bearing faults. Unal et al. [126] used an optimised ANN for the fault diagnosis of REBs from vibration signals. A GA was used to optimise the structure of the ANN, increasing fault classification performance. This was demonstrated by using the GA for the optimal selection of the number of hidden layers, the number of neurons, and the mean square error. In a study conducted by Li et al. [127], an optimised SVM was used for fault diagnosis in REBs. This was achieved by using an improved ant colony optimisation algorithm to select suitable SVM parameters.
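The use of PSO for tuning classifier parameters can be sketched with a basic global-best particle swarm. The objective below is a hypothetical stand-in for the cross-validated error of an SVM as a function of two log-scaled parameters; it is an assumption made so that the example runs without a dataset, and the inertia and acceleration coefficients are common textbook choices rather than values from [125].

```python
import random

def validation_error(log_c, log_gamma):
    """Hypothetical stand-in for cross-validated SVM error over
    (log10 C, log10 gamma); smallest at (1, -2)."""
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2

def pso(objective, bounds, n_particles=20, iterations=50,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * len(bounds) for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                     # best position per particle
    pbest_val = [objective(*p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]    # best position of the swarm
    history = [gbest_val]
    for _ in range(iterations):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])   # own memory
                             + c2 * r2 * (gbest[d] - pos[i][d]))     # swarm memory
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))  # stay in bounds
            val = objective(*pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
        history.append(gbest_val)
    return gbest, gbest_val, history

bounds = [(-2.0, 4.0), (-5.0, 1.0)]   # assumed search ranges for the two parameters
best, best_val, history = pso(validation_error, bounds)
print(best, best_val)
```

In an actual application the objective would train and validate the classifier for each candidate parameter pair, so each evaluation is far more expensive than in this toy case and the swarm size and iteration count become a practical trade-off.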