3.1. Dataset
The process carried out to create the dataset is illustrated in
Figure 4. The readings obtained from the tests under controlled conditions have been labeled with the test conditions themselves. Each of these signals has then been divided into segments that act as point sensors, i.e. segments to which a single temperature and a single strain value can be assigned. Finally, these segments have been taken two by two and the temperature and strain increments corresponding to each pair have been computed, thus storing the two segment signals (P and S) and the difference of states. Since the most suitable pre-processing of the signals was not known a priori, it was decided to keep both signals (P and S) in full in order to evaluate the performance of the different transformations.
From the tests performed, data have been obtained at four temperatures (20, 30, 40, and 50 °C) and five strain states associated with end deflections of 0, 3.11, 6.22, 9.33, and 12.45 mm. These combine into 20 different states which, taken two by two, yield 190 possible combinations. By means of the optical interrogator used, the optical fiber was converted into a succession of overlapping sensors with a sensor length of 20 mm (with a sampling period of 10 µm) and a sensor spacing of 2 mm (see
Figure 2). The fiber is 300 mm long, of which only the measurements between 100 and 280 mm have been selected (to avoid possible errors). By randomly selecting a percentage of these combinations, a dataset of 13950 samples has been created and split, again at random, into 60 % for training, 20 % for validation and the remaining 20 % for testing.
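The pairing and splitting procedure can be summarized with a short sketch. The code below is purely illustrative: the array names, the number of segments per reading, the segment length, and the use of scikit-learn's train_test_split are assumptions rather than the actual implementation.

```python
# Illustrative sketch of the dataset construction (shapes and names are assumed).
from itertools import combinations
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
temps = [20, 30, 40, 50]                      # °C
deflections = [0, 3.11, 6.22, 9.33, 12.45]    # mm of end deflection
n_segments, seg_len = 20, 200                 # assumed segment count and length per reading
states = [{"T": T, "defl": d,
           "segments": [(rng.standard_normal(seg_len), rng.standard_normal(seg_len))
                        for _ in range(n_segments)]}
          for T in temps for d in deflections]            # 20 combined states

samples, targets = [], []
for a, b in combinations(range(len(states)), 2):          # 190 possible pairs of states
    dT = states[b]["T"] - states[a]["T"]                  # temperature increment of the pair
    d_defl = states[b]["defl"] - states[a]["defl"]        # deflection (strain) increment of the pair
    for (p1, s1), (p2, s2) in zip(states[a]["segments"], states[b]["segments"]):
        samples.append(np.concatenate([p1, s1, p2, s2]))  # both full signals are kept
        targets.append((dT, d_defl))

# Random 60 / 20 / 20 split into training, validation, and test sets.
X_tr, X_tmp, y_tr, y_tmp = train_test_split(samples, targets, train_size=0.6, random_state=0)
X_val, X_te, y_val, y_te = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)
```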
The temperature range has been determined by the room temperature at the time of the test (since the oven does not have refrigeration) and the characteristics of the adhesive used, cyanoacrylate, to adhere the optical fiber to the aluminum plate.
3.4. Neural Network
Input data
Once this check has been carried out, the artificial intelligence model capable of discerning between temperature increments and deformation from the signals of the different polarizations is designed. After trying several options to train the model, an input vector composed of the cross-correlation of the two polarization states and the four auto-correlations of the four available measurements was selected, based on the results of the clustering algorithm. The input to the network is finally as follows:
where the frequency increment has been added to provide scaling information (since the same equipment can operate in different frequency ranges).
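As a rough illustration of how such an input vector can be assembled, the sketch below concatenates the cross-correlation of the two polarization signals, the four auto-correlations, and the frequency increment. The function names, the normalization inside the correlation, and the ordering of the components are assumptions, since the exact composition is only defined by the expression above.

```python
import numpy as np

def corr(a, b):
    """Full cross-correlation of two 1-D signals, mean-removed and scaled (assumed choice)."""
    a = (a - a.mean()) / (a.std() + 1e-12)
    b = (b - b.mean()) / (b.std() + 1e-12)
    return np.correlate(a, b, mode="full") / len(a)

def build_input(p1, s1, p2, s2, d_freq):
    """Assemble one network input from the two segment signals (P and S) of each state."""
    cross = corr(p1, s1)                                  # cross-correlation of the two polarizations
    autos = [corr(x, x) for x in (p1, s1, p2, s2)]        # auto-correlations of the four measurements
    return np.concatenate([cross, *autos, [d_freq]])      # frequency increment appended for scaling

# Toy usage with random segment signals of assumed length.
rng = np.random.default_rng(0)
x = build_input(*(rng.standard_normal(200) for _ in range(4)), d_freq=0.05)
```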
Normalization
These input vectors have been normalized beforehand, since training is much more efficient with normalized values. In this case, normalizing each variable separately has finally been selected, as it is the only approach with which the model has been able to fit the data:
where index i refers to the column index, that is, to each of the items that make up the input vector, while index j refers to the sample number.
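A minimal sketch of such a column-wise normalization is given below. The specific formula is not reproduced here, so the sketch assumes per-column min-max scaling with statistics computed on the training set; per-column standardization would be the other usual option.

```python
import numpy as np

def fit_column_stats(X):
    """Per-column (index i) minima and maxima, computed on the training samples (index j) only."""
    return X.min(axis=0), X.max(axis=0)

def normalize_columns(X, col_min, col_max):
    """Scale each column of every sample to the [0, 1] range (assumed normalization)."""
    return (X - col_min) / (col_max - col_min + 1e-12)

# Toy usage on placeholder data.
X_train = np.random.default_rng(0).normal(size=(100, 5))
lo, hi = fit_column_stats(X_train)
X_train_n = normalize_columns(X_train, lo, hi)
```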
Architecture
The network architecture is as shown in
Figure 6: it is essentially a stack of densely connected layers with hyperbolic tangent activation functions, which provide its nonlinearity.
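The exact layer widths are given only in Figure 6, so the sketch below uses placeholder sizes; it is only meant to show the kind of model described (dense layers with tanh activations and a two-value output for the temperature and strain increments), written here with PyTorch as an assumed framework.

```python
import torch
import torch.nn as nn

input_size = 1205          # assumed length of the correlation-based input vector
hidden = [256, 128, 64]    # placeholder widths; the real ones are those of Figure 6

layers, prev = [], input_size
for width in hidden:
    layers += [nn.Linear(prev, width), nn.Tanh()]   # dense layer followed by hyperbolic tangent
    prev = width
layers.append(nn.Linear(prev, 2))                   # outputs: temperature and strain increments
model = nn.Sequential(*layers)

print(model(torch.zeros(1, input_size)).shape)      # torch.Size([1, 2])
```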
Training
The results of the network training are shown in
Figure 7. The training has been carried out with an Adam-type optimizer with a learning rate of
, and computing the error with the mean squared error (MSE) criterion.
As can be seen, after the 100 training epochs the curves become asymptotic, which indicates that further training would not improve the fit of the model to the data. On the other hand, from the behavior of the validation curves, it can be determined that the model has not undergone overfitting.
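A minimal training loop consistent with this description (Adam optimizer, MSE criterion, 100 epochs, training and validation curves) could look as follows. It builds on the architecture sketch above; the learning rate value, the use of full-batch updates, and the placeholder tensors are assumptions, since only the optimizer type and loss criterion are stated in the text.

```python
import torch
import torch.nn as nn

# `model` and `input_size` come from the architecture sketch above; placeholder data below.
X_train, y_train = torch.randn(500, input_size), torch.randn(500, 2)
X_val, y_val = torch.randn(125, input_size), torch.randn(125, 2)

criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # learning rate is an assumed value

for epoch in range(100):                                    # 100 training epochs, as in Figure 7
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(X_train), y_train)               # mean squared error on the training set
    loss.backward()
    optimizer.step()

    model.eval()
    with torch.no_grad():                                    # validation loss used to check overfitting
        val_loss = criterion(model(X_val), y_val)
```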
Results
Once the model has been trained, it is evaluated with the test data. In
Figure 8 and
Figure 9, error histograms for the target variables are shown together with a fitted normal distribution, whose coefficients are shown in Table . In addition, 99% and 95% confidence intervals have been computed, and their limits are also shown in
Table 3.
For better comprehension, the predicted values have also been plotted against the target values on a plane (see
Figure 10 and
Figure 11) to create the equivalent of a confusion matrix, but for continuous data. If the model worked perfectly, the predicted and target values would coincide, producing a diagonal. For this reason, a least squares regression line has been fitted and plotted over the plane. The regression line has the form y = mx + n, where x represents the target value, y the predicted one, m the slope of the straight line, and n the ordinate at the origin. Additionally, the value of R² has also been computed to quantify the dispersion of the values. All of the above values can also be found in
Table 3.
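The post-processing described above (normal fit to the error histogram, confidence intervals, and the regression line with its R²) can be reproduced with standard tools. The sketch below is a generic illustration with assumed variable names and placeholder data, not the authors' actual script.

```python
import numpy as np
from scipy import stats

# Placeholder target and predicted values for one output variable (assumed names).
rng = np.random.default_rng(0)
y_true = rng.uniform(-10, 10, 500)
y_pred = y_true + rng.normal(0, 0.5, 500)

errors = y_pred - y_true
mu, sigma = stats.norm.fit(errors)                        # normal-distribution approximation of the error
ci95 = stats.norm.interval(0.95, loc=mu, scale=sigma)     # 95% confidence interval limits
ci99 = stats.norm.interval(0.99, loc=mu, scale=sigma)     # 99% confidence interval limits

# Least-squares regression line y = m*x + n over the predicted-vs-target plane, plus R².
m, n, r, p, se = stats.linregress(y_true, y_pred)
print(f"mu={mu:.3f} sigma={sigma:.3f} m={m:.3f} n={n:.3f} R2={r**2:.3f}")
```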
Explainable Artificial Intelligence (XAI)
Explainable Artificial Intelligence (XAI) consists of a series of methods aimed at presenting the results provided by an Artificial Intelligence (AI) model in a way that can be understood by humans. It contrasts with the "black box" concept of machine learning, where even its designers cannot explain why an AI arrived at a particular decision. Thanks to XAI methods, features can be extracted that allow existing knowledge to be confirmed, existing knowledge to be questioned, and new hypotheses to be generated.
In this case, XAI makes it possible to explain how the developed model interprets the correlations of the signals, revealing the information on which its decisions are based. To implement the XAI methods on this model, the
Lime-For-Time repository (see [
11]) has been used, in which the
LIME library ([
1]) is used to analyze time series.
The analysis consists of taking an example signal and dividing it into segments, 120 in this case, given the composition of the signal. Next, the neural network is studied as a classifier in which each segment constitutes a class that contributes, to a greater or lesser extent, to the final decision. From the example signal, multiple variations are generated to observe how the model responds to the different inputs. From these data, the relevance of each of the segments can be determined. Taking the 12 most relevant segments and displaying their weights in a histogram, the images shown in
Figure 12 and
Figure 13 are produced.
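The segment-perturbation idea behind this analysis can be sketched as follows. This is not the Lime-For-Time API itself, but a hand-rolled illustration of the same principle: segments of an example input are randomly masked, the trained model is queried on each perturbed copy, and a weighted linear surrogate assigns a relevance weight to each segment. The names, the masking strategy (zeroing segments), and the number of perturbations are assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge

def segment_relevance(signal, predict_fn, n_segments=120, n_perturb=1000, seed=0):
    """Return one relevance weight per segment of `signal` for the scalar output of predict_fn."""
    rng = np.random.default_rng(seed)
    edges = np.linspace(0, len(signal), n_segments + 1, dtype=int)

    masks = rng.integers(0, 2, size=(n_perturb, n_segments))    # 1 = keep segment, 0 = zero it out
    perturbed = np.repeat(signal[None, :], n_perturb, axis=0)
    for k in range(n_segments):
        perturbed[masks[:, k] == 0, edges[k]:edges[k + 1]] = 0.0

    preds = np.array([predict_fn(x) for x in perturbed])        # model output for each perturbed copy
    proximity = np.exp(-(n_segments - masks.sum(axis=1)) / n_segments)  # closer copies weigh more

    surrogate = Ridge(alpha=1.0)
    surrogate.fit(masks, preds, sample_weight=proximity)        # local linear surrogate model
    return surrogate.coef_                                       # one weight per segment

# Example with a toy model: the 12 largest |weights| mark the most relevant segments.
weights = segment_relevance(np.random.default_rng(1).standard_normal(1200),
                            predict_fn=lambda x: float(np.sum(x[:100])))
top12 = np.argsort(np.abs(weights))[-12:]
```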
If these segments are overlaid on the signal, the most relevant regions can be seen. In this case, to represent their importance, an opacity has been assigned to each segment according to the weights previously shown in the histograms. The result of this graphical representation is shown in
Figure 14 and
Figure 15.
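A possible way to produce such an overlay, with the opacity of each highlighted segment proportional to its weight, is sketched below with matplotlib; the variable names, the placeholder signal and weights, and the normalization of the opacities are assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder signal, segment weights, and segment edges (see the previous sketch).
signal = np.random.default_rng(1).standard_normal(1200)
weights = np.random.default_rng(2).uniform(0, 1, 120)
edges = np.linspace(0, len(signal), len(weights) + 1, dtype=int)
top12 = np.argsort(np.abs(weights))[-12:]

fig, ax = plt.subplots()
ax.plot(signal, color="black", linewidth=0.8)
alpha = np.abs(weights) / np.abs(weights).max()        # opacity proportional to segment relevance
for k in top12:
    ax.axvspan(edges[k], edges[k + 1], color="tab:red", alpha=float(alpha[k]))
ax.set_xlabel("sample index")
plt.show()
```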