4.1. CNN-Based Applications
A CNN is a class of DL model most commonly used to analyze images [87]. It is attracting interest across a variety of domains, including optical sensor applications. This section briefly presents some recent works that apply this model to optical sensors.
In [88], a CNN model was developed to realize an optical fiber curvature sensor. A large number of specklegrams were detected automatically from the facet of a multimode fiber (MMF) in the experiments, preprocessed, and fed to the model for training, validation and testing. The dataset was collected with the automatic detection experimental setup shown in Figure 6, in which the light beam was detected by a CCD camera with a resolution of
and pixel size of
. As shown in Figure 7, the VGG-Net architecture was adopted to build the CNN, and the mean squared error (MSE) was used as the loss function. The prediction accuracy of the proposed CNN was
of specklegrams with the error of curvature prediction within
. However, the reported learning-based scheme can predict only a single parameter and does not fully exploit the potential of deep learning.
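As a minimal sketch of the two quantities mentioned above, the snippet below computes the MSE loss used for training and an "accuracy within an error bound" metric, i.e., the fraction of specklegrams whose curvature prediction error falls within a tolerance. The tolerance `tol` and the toy curvature values are hypothetical placeholders, not values from [88].

```python
import numpy as np


def mse_loss(y_true, y_pred):
    """Mean squared error, the loss function used to train the regression CNN."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def accuracy_within_tolerance(y_true, y_pred, tol):
    """Fraction of samples whose absolute prediction error is <= tol.
    tol is a hypothetical stand-in for the reported error bound."""
    err = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return float(np.mean(err <= tol))


# Toy curvature labels and predictions (illustrative numbers only).
curv_true = [1.0, 2.0, 3.0, 4.0]
curv_pred = [1.1, 1.8, 3.0, 4.5]
print(mse_loss(curv_true, curv_pred))
print(accuracy_within_tolerance(curv_true, curv_pred, tol=0.25))
```

Here three of the four predictions lie within the tolerance, so the metric is 0.75, while the single large error dominates the MSE.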
In [9], the authors proposed semi-supervised deep learning for track detection. An experimental setup was built on a section of high-speed railway track equipped with a distributed optical fiber acoustic sensing (DAS) system. The proposed model performs image recognition with dataset-specific preprocessing and uses a greedy algorithm to select its hyperparameters.
The events that the model is intended to recognize are listed in Table 1.
The dataset obtained after augmentation is shown in Table 2.
Four structural hyperparameters were considered in this work, as shown in Table 3. The proposed model achieved an accuracy of
. However, it should be noted that traditional methods still achieve better spatial accuracy. Other related works can be found in [89,90,91,92,93,94,95,96].
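The greedy hyperparameter selection mentioned for [9] can be illustrated with a coordinate-wise search: hyperparameters are tuned one at a time, and the best value found is frozen before moving on to the next. This is a generic sketch; the search space, the hyperparameter names, and the `toy_score` function (standing in for validation accuracy after training) are hypothetical and do not reproduce Table 3.

```python
from typing import Callable, Dict, List


def greedy_search(space: Dict[str, List], score: Callable[[Dict], float]) -> Dict:
    """Coordinate-wise greedy search: tune one hyperparameter at a time and
    freeze its best value before moving on to the next one."""
    best = {name: values[0] for name, values in space.items()}  # start from first candidates
    for name, values in space.items():
        best_value, best_score = best[name], score(best)
        for v in values[1:]:
            trial = {**best, name: v}  # change only the current hyperparameter
            s = score(trial)
            if s > best_score:
                best_value, best_score = v, s
        best[name] = best_value
    return best


# Hypothetical search space and a toy score standing in for validation accuracy.
space = {
    "layers": [1, 2, 3, 4],
    "filters": [16, 32, 64],
    "kernel": [3, 5, 7],
    "dense": [64, 128, 256],
}


def toy_score(p: Dict) -> float:
    return -(abs(p["layers"] - 3) + abs(p["filters"] - 32) / 16
             + abs(p["kernel"] - 5) + abs(p["dense"] - 128) / 64)


best = greedy_search(space, toy_score)
print(best)
```

Because each pass fixes one hyperparameter, the number of evaluations grows with the sum of the candidate counts rather than their product (as in a full grid search), at the risk of missing interactions between hyperparameters.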
In [97], a distributed optical fiber sensor based on a hybrid Michelson-Sagnac interferometer was proposed. The work was motivated by two weaknesses of the conventional hybrid structure: its inability to locate near-end events and its flawed frequency response. The proposed model used basic mathematical operations and
optical coupler to obtain two phase signals with a time difference, which can be used for both localization and pattern recognition. The received phase signals were converted into two-dimensional images, which formed the dataset fed into a CNN for pattern recognition. The dataset contained 5488 images in 6 categories, and the size of each image was
in JPG format. The dataset is described in Table 4, and the structure of the CNN is shown in Figure 8. The proposed model achieved an accuracy of
. However, the sensing structure employed is relatively simple and does not account for factors such as the influence of backscattered light.
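The step of converting the received phase signals into two-dimensional images can be sketched as folding the 1-D signal into fixed-width rows and rescaling it to 8-bit gray levels. This is only one plausible conversion, assumed for illustration; [97] does not specify its exact mapping here, and the image width and the synthetic 50 Hz test signal are hypothetical.

```python
import numpy as np


def signal_to_image(signal, width):
    """Fold a 1-D phase signal into a 2-D gray-scale image (assumed mapping)."""
    x = np.asarray(signal, dtype=float)
    n_rows = len(x) // width
    x = x[: n_rows * width].reshape(n_rows, width)  # drop the incomplete last row
    lo, hi = x.min(), x.max()
    scaled = (x - lo) / (hi - lo) if hi > lo else np.zeros_like(x)
    return np.round(scaled * 255).astype(np.uint8)  # 8-bit gray levels


# Synthetic phase signal: a 50 Hz sine sampled 1000 times over one second.
t = np.linspace(0, 1, 1000)
phase = np.sin(2 * np.pi * 50 * t)
img = signal_to_image(phase, width=32)
print(img.shape, img.dtype)
```

With 1000 samples and a width of 32, the last 8 samples are dropped and a 31 × 32 image results; images produced this way can then be saved in a standard format such as JPG and used as CNN input.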
In [98], a DL model was proposed to extract time-frequency sequence correlations from signals and spectrograms in order to improve the robustness of the recognition system. The authors designed a targeted Time Attention Model (TAM) to extract features in the time-frequency domain. The TAM architecture comprises two stages: a convolution stage for feature extraction and a time attention stage for feature reconstruction. Figure 9 shows the processing flow, from data streaming through domain transformation and feature extraction to the output, taking the knocking event as an example. The convolution stage, which uses a conventional CNN as its backbone, extracts characteristic features: the convolutional filters establish local connections and share weights across receptive fields, while the pooling layers provide shift invariance. As shown in
Figure 9, in the left stage, information is extracted from the spectrogram and transformed into a feature map of size 1 × 128 × 200, where 1 is the number of input channels (a gray image has one channel), and 128 and 200 are the height and width of the input, respectively. The authors collected and labeled a large-scale dataset of vibration scenes comprising
pieces of data covering 8 vibration types. The experimental results indicated that this approach significantly improved accuracy at very low additional computational cost compared with related experiments [99] and [100]. The time attention stage was designed for feature reconstruction, in which the TAM serves two purposes: first, to extract the sequence correlation through a cyclic (recurrent) element, and second, to assign the weight matrices for the attention mechanism. F1 and F2 are distinguished by their emphasis on the "where" and "what" of the temporal features. A Φ-OTDR system, comprising a sensing system and a producing system, was constructed to classify and recognize vibration signals, and the method was verified on a vibration dataset of eight different scenarios collected with it. The achieved classification accuracy was
. However, this method not only complicates the data processing procedure but also risks losing information during the data processing phase.
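To make the idea of attention over time concrete, the sketch below treats each time column of a 1 × 128 × 200 feature map as one vector, scores it with a weight vector, normalizes the scores with a softmax, and returns the attention-weighted summary. This is a generic attention-pooling sketch with assumed, random weights; it does not reproduce the recurrent element or the F1/F2 branches of the TAM in [98].

```python
import numpy as np


def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def time_attention_pool(feature_map, w):
    """feature_map: (C, H, W) with W as the time axis; w: (C*H,) scoring vector.
    Returns one attention weight per time step and the weighted summary vector."""
    c, h, t = feature_map.shape
    steps = feature_map.reshape(c * h, t).T  # (T, C*H): one vector per time step
    alpha = softmax(steps @ w)               # attention weight per time step
    return alpha, alpha @ steps              # weights and weighted summary


rng = np.random.default_rng(0)
fmap = rng.standard_normal((1, 128, 200))    # 1 x 128 x 200, as described in the text
w = rng.standard_normal(128) * 0.01          # hypothetical (untrained) scoring weights
alpha, summary = time_attention_pool(fmap, w)
print(alpha.shape, summary.shape)
```

In a trained model, `w` would be learned so that time steps containing the event (for example, the knocking impulse) receive larger weights.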
In [101], a real-time action recognition model was proposed for long-distance oil–gas PSEW systems using a scattering-based distributed optical fiber sensor. Two complementary features describing the signals, a peak feature and an energy feature, were calculated with two separate methods, and a deep learning network (DLN) was built on these features for action recognition. This DLN can effectively describe the situation along long-distance oil–gas PSEW systems. The collected dataset amounted to 494 GB, contained several types of noise, and was recorded at a China National Petroleum Corporation pipeline. It covered four types of events: background noise, mechanical excavation, manual excavation, and vehicle driving. As shown in Figure 10, the model architecture consists of two branches, one processing the peak feature and the other the energy feature. Each branch contains Conv1D, batch normalization, max-pooling, dropout, Bi-LSTM and fully connected layers. Damage events can be located and identified with accuracies of 99.26% (at 500 Hz) and 97.20% (at 100 Hz). Nonetheless, all the aforementioned methods treat an acquisition sample as a single vibration event. For dynamic time-series identification tasks, however, the proportion of valid data within a sample is not constant, so the position of the label relative to the valid portion of the input sequence remains uncertain. Other related research can be found in [102,103,104].
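For illustration, the peak and energy features of [101] can be approximated by the maximum absolute amplitude and the sum of squared samples in each frame. These definitions and the frame length are assumptions; the exact formulas used in [101] are not given here.

```python
import numpy as np


def frame_features(signal, frame_len):
    """Split a 1-D signal into non-overlapping frames and return per-frame
    peak (max absolute amplitude) and energy (sum of squares) features."""
    x = np.asarray(signal, dtype=float)
    n = len(x) // frame_len
    frames = x[: n * frame_len].reshape(n, frame_len)
    peak = np.abs(frames).max(axis=1)
    energy = np.sum(frames ** 2, axis=1)
    return peak, energy


sig = np.array([0.0, 1.0, -2.0, 0.5, 3.0, 0.0, 0.0, -1.0])
peak, energy = frame_features(sig, frame_len=4)
print(peak, energy)
```

Each resulting sequence could then feed its own branch of a two-branch network, mirroring the peak/energy split described above.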
In [105], the authors applied signal processing and ML algorithms to detect events from DAS signals recorded along a pipeline. An ML approach and a DL approach were implemented and combined for event detection, as shown in Figure 11. A novel method to efficiently generate the training dataset was also developed. Two event classes were considered: excavator and non-excavator.
The sensor signals were converted into gray-scale images, from which the proposed DL model recognizes events. The architecture of the proposed model is shown in Figure 12; it was evaluated in a three-month real-time deployment at a suburban location.
The results showed that DL is the more promising approach, owing to its advantages over ML, as shown in Table 5. However, the proposed model distinguishes only two events, 'excavator' and 'no excavator', whereas multiple distinct events exist. Moreover, the system was tested for only three months at a single suburban location; further validation and verification require tests in different areas and over longer periods.
In [106], an improved WaveNet, called SE-WaveNet (squeeze-and-excitation WaveNet), was applied to recognize man-made threat events using distributed optical fiber vibration sensing (DVS). WaveNet is a one-dimensional CNN (1-DCNN). As a deep 1-DCNN, it trains and tests quickly while offering a large receptive field that lets it retain complete information from 1-D time-series data. The SE structure was deployed in the residual blocks of WaveNet to recognize 2-D signals. It acts as an attention mechanism that lets the model focus on informative channel features while suppressing unimportant ones. The structure of the proposed model is shown in Figure 13. The input of SE-WaveNet is an n × m matrix synthesized from the spatial signals at n points and m groups of time signals. The dataset used is shown in Table 6. The results showed that SE-WaveNet reaches an accuracy of approximately 97.73%. However, the model was tested on only a limited number of events; further testing is needed to evaluate its performance on more complex events, particularly in engineering applications, and to validate its effectiveness in practical real-world settings.
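The squeeze-and-excitation mechanism described for SE-WaveNet can be sketched for a (channels × time) feature map. Global average pooling squeezes each channel to one number, two fully connected layers with ReLU and sigmoid produce a weight per channel, and the channels are rescaled by those weights. The weights here are random and the reduction ratio `r = 4` is an assumed value; this is the generic SE block, not the trained model of [106].

```python
import numpy as np


def se_block(x, w1, w2):
    """Squeeze-and-excitation on a (channels, time) feature map.
    w1: (C//r, C) and w2: (C, C//r) are the two fully connected layers."""
    z = x.mean(axis=1)                       # squeeze: global average over time
    s = np.maximum(0.0, w1 @ z)              # excitation: FC + ReLU
    gate = 1.0 / (1.0 + np.exp(-(w2 @ s)))   # FC + sigmoid -> per-channel weight in (0, 1)
    return x * gate[:, None], gate           # rescale each channel


rng = np.random.default_rng(1)
C, T, r = 16, 100, 4
x = rng.standard_normal((C, T))
w1 = rng.standard_normal((C // r, C))
w2 = rng.standard_normal((C, C // r))
y, gate = se_block(x, w1, w2)
print(y.shape, gate.round(2))
```

Channels whose gate is close to 0 are suppressed, which is how the block "pays attention" to informative channel features.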
In [14], a CNN and an Extreme Learning Machine (ELM) were applied to discriminate between ballistocardiogram (BCG) and non-BCG signals. The CNN was used to extract relevant features, and the ELM [107], a feedforward neural network, takes these features as input and outputs the category matrix. Figure 14 and Table 7 show the architecture of the proposed CNN-ELM and of the proposed CNN, respectively.
The BCG signals were acquired with an IoT-based microbend fiber optic sensor from ten patients diagnosed with obstructive sleep apnea who underwent drug-induced sleep endoscopy. To balance the BCG and non-BCG samples, three techniques were employed: undersampling, oversampling, and generative adversarial networks (GANs). The performance of the system was evaluated using 10-fold cross-validation. The CNN-ELM approach produced the best results when GANs were used to balance the data, with an average accuracy of 94%, precision of 90%, recall of 98%, and F-score of 94%, as shown in Table 8. The architecture of the model used to balance BCG and non-BCG chunks, inspired by [108], is presented in Figure 15. Other related works are presented in [109,110].
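An ELM has a single hidden layer whose input weights are random and never trained; only the output weights are computed, in closed form, with the Moore-Penrose pseudo-inverse. The sketch below applies this to synthetic two-class feature vectors that merely stand in for CNN features; the hidden-layer size, the tanh activation and the data are assumptions, not the configuration of [14]. (As a side check on the reported metrics, the F-score is the harmonic mean 2PR/(P + R) = 2 × 0.90 × 0.98 / 1.88 ≈ 0.94.)

```python
import numpy as np


class ELM:
    """Extreme learning machine: one hidden layer whose input weights are random
    and fixed; only the output weights are solved, by least squares."""

    def __init__(self, n_hidden, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        n_classes = int(y.max()) + 1
        T = np.eye(n_classes)[y]  # one-hot target ("category") matrix
        d = X.shape[1]
        self.W = self.rng.standard_normal((d, self.n_hidden)) / np.sqrt(d)
        self.b = self.rng.standard_normal(self.n_hidden)
        self.beta = np.linalg.pinv(self._hidden(X)) @ T  # Moore-Penrose least squares
        return self

    def predict(self, X):
        return np.argmax(self._hidden(X) @ self.beta, axis=1)


rng = np.random.default_rng(42)
X0 = rng.normal(-2.0, 1.0, size=(100, 8))  # stand-in for "non-BCG" feature vectors
X1 = rng.normal(2.0, 1.0, size=(100, 8))   # stand-in for "BCG" feature vectors
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)
model = ELM(n_hidden=50).fit(X, y)
acc = np.mean(model.predict(X) == y)
print(acc)
```

Because training reduces to one pseudo-inverse, an ELM head is much cheaper to fit than back-propagating through additional dense layers.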
In [11], the efficiency and accuracy of bridge structural damage detection were improved by monitoring the deflection of the bridge with a fiber optic gyroscope and then applying a DL algorithm to detect structural damage. The authors proposed a supervised CNN model with eleven hidden layers that can be trained to automatically identify and classify bridge damage. The Adam optimizer was used, and the hyperparameters are listed in Table 9. The proposed model achieved an accuracy of 96.9%, outperforming random forest (RF, 81.6%), SVM (79.9%), k-nearest neighbor (KNN, 77.7%), and decision trees (DT). Related work in the same direction has been reported in [111] and [112].
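Since [11] trains its CNN with Adam, the sketch below shows one Adam update with bias-corrected first and second moment estimates, applied to a toy quadratic. The defaults `beta1 = 0.9`, `beta2 = 0.999`, `eps = 1e-8` are the commonly used values from the original Adam formulation; the learning rate and the toy objective are assumptions, and the hyperparameters actually used in [11] are those in Table 9.

```python
import numpy as np


def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update with bias-corrected first and second moment estimates."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2     # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction (t starts at 1)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v


# Toy check: minimise f(theta) = ||theta - 3||^2.
theta = np.zeros(2)
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 2001):
    grad = 2 * (theta - 3.0)
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.05)
print(theta)
```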
The authors in [113] proposed an intrusion pattern recognition model combining the Gramian Angular Field (GAF) and a CNN, which achieves both high recognition speed and high accuracy. The GAF algorithm maps 1-D vibration sensing signals into 2-D images with more distinguishing features, retaining and highlighting the differences between intrusion signals. This helps the CNN detect intrusion events whose characteristics differ only subtly. A CNN-based framework then processes these input images. According to the experimental results, the average accuracy for recognizing three natural intrusion events (light rain, wind blowing, heavy rain) and three human intrusion events (impacting, knocking, slapping) on the fence was 97.67%. With a response time of 0.58 seconds, the system met real-time monitoring requirements, achieving automated intrusion recognition that balances accuracy and speed. However, the complex pre-processing and denoising applied to the original signal make it challenging for intrusion recognition systems to respond effectively in emergency scenarios. Another work in the same direction was presented in [114].
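GAF encodes a 1-D series by rescaling it to [−1, 1], taking the angle φ = arccos(x̃), and forming either the matrix cos(φ_i + φ_j) (summation field, GASF) or sin(φ_i − φ_j) (difference field, GADF). The sketch below implements the summation variant; which variant [113] used is not stated here, so this choice is an assumption.

```python
import numpy as np


def gasf(x):
    """Gramian Angular Summation Field of a 1-D series."""
    x = np.asarray(x, dtype=float)
    # 1) Rescale to [-1, 1].
    x_tilde = (2 * x - x.max() - x.min()) / (x.max() - x.min())
    x_tilde = np.clip(x_tilde, -1.0, 1.0)  # guard against rounding just outside [-1, 1]
    # 2) Polar encoding: angle phi = arccos(x_tilde).
    phi = np.arccos(x_tilde)
    # 3) G[i, j] = cos(phi_i + phi_j).
    return np.cos(phi[:, None] + phi[None, :])


series = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
G = gasf(series)
print(G.shape)
print(np.round(np.diag(G), 3))
```

The main diagonal equals 2x̃² − 1, so it preserves the original (rescaled) values, while the off-diagonal entries encode pairwise temporal correlations that a 2-D CNN can exploit.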
A bending recognition model based on the analysis of specklegrams from MMFs with diameters of 105 μm and 200 μm was proposed and tested in [115]. The model used a DL-based image recognition algorithm, taking as input the specklegrams detected at the MMF facet under various bending conditions. Figure 16 shows the experimental setup used to collect and detect the fiber specklegrams. The model architecture was based on VGG-Nets, as shown in Figure 17. The accuracy obtained for the two multimode fibers is shown in Table 10.
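VGG-style networks, used here and in [88], stack blocks of small 3 × 3 convolutions with ReLU followed by 2 × 2 max pooling. The sketch below shows one such block on a single-channel image with random kernels. The random "specklegram", the kernels and the single-channel simplification are assumptions for illustration, not the trained network of [115].

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv3x3_same(img, kernel):
    """3x3 convolution (cross-correlation, as in CNN layers) with zero padding."""
    padded = np.pad(img, 1)
    windows = sliding_window_view(padded, (3, 3))  # (H, W, 3, 3)
    return np.einsum("ijkl,kl->ij", windows, kernel)


def maxpool2x2(x):
    """Non-overlapping 2x2 max pooling (odd trailing rows/columns are dropped)."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    x = x[:h, :w]
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


def vgg_block(img, k1, k2):
    """Two 3x3 conv + ReLU layers followed by 2x2 max pooling (single channel)."""
    x = np.maximum(0, conv3x3_same(img, k1))
    x = np.maximum(0, conv3x3_same(x, k2))
    return maxpool2x2(x)


rng = np.random.default_rng(0)
speckle = rng.random((64, 64))  # stand-in for a specklegram
out = vgg_block(speckle, rng.standard_normal((3, 3)), rng.standard_normal((3, 3)))
print(out.shape)
```

Each block halves the spatial resolution, so stacking several blocks turns a specklegram into progressively more abstract feature maps before the fully connected classifier.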
The authors in [124] used a CNN to identify specific species of pollen from backscattered light, which was collected with a thirty-core optical fiber. The CNN input consisted of camera images, divided into two datasets: one for distance prediction and one for particle identification. For distance prediction, 1500 images were collected, of which 90% were used as the training set and 10% as the validation set. For particle identification, 2200 images were collected with the same 90%/10% split. The training procedure of the proposed model is depicted in Figure 18. The model was based on the second version of ResNet-18 [125,126] with batch normalization [127], a mini-batch size of 32, and a momentum of 0.95, and its output was a single regression value. The network trained to identify pollen grain types achieved a real-time detection accuracy of approximately 97%. The developed system can be used in environments where transmission imaging is not possible or not suitable.
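The 90%/10% training/validation split described for both image sets can be reproduced with a shuffled index split, sketched below. The random seed and the rounding rule are assumptions; [124] does not state how the images were assigned to each set.

```python
import numpy as np


def train_val_split(n_samples, train_frac=0.9, seed=0):
    """Shuffle sample indices and split them into training / validation sets."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    n_train = int(round(train_frac * n_samples))
    return idx[:n_train], idx[n_train:]


for name, n in [("distance prediction", 1500), ("particle identification", 2200)]:
    tr, va = train_val_split(n)
    print(name, len(tr), len(va))
```

This gives 1350/150 and 1980/220 images; with a mini-batch size of 32, the 1350 distance-prediction training images correspond to 43 mini-batches per epoch (the last one partial).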
In [13], a DL-based distributed optical fiber sensing system was proposed for event recognition, using a temporal-spatial data matrix from a Φ-OTDR system as the CNN input. The method has several advantages: only a gray-scale image transformation and bandpass filtering are needed as pre-processing before classification, instead of the usual complex data processing; the model is small and trains quickly; and its classification accuracy is high. The system was applied to recognize five distinct events: background, jumping, walking, digging with a shovel, and striking with a shovel. The collected data were split into two types, as shown in Table 11, and the combined dataset for the five events consisted of 5644 instances.
Several common CNNs were examined, and the results are shown in Table 12. The same training parameters were used for all CNNs: 50,000 training steps, a learning rate of 0.01, and the root mean square propagation (RMSProp) optimizer [128]. The work concluded that VGGNet and GoogLeNet achieved better classification accuracy (greater than 95%), and GoogLeNet was selected as the basic CNN structure because of its model size. To further improve the model, the Inception-v3 version of GoogLeNet was used. Table 13 shows the classification accuracy achieved for the five events. The authors then optimized the network by tuning the size of some of its layers, and Table 14 compares the optimized model with Inception-v3. However, this study trained the network on a relatively small dataset of only 4000 samples. Moreover, traditional image data augmentation strategies, such as image rotation, cannot be directly applied to feature maps generated from fiber optic sensing data.
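[13] trained all CNNs with RMSProp at a learning rate of 0.01. A minimal sketch of the RMSProp update rule is shown below on a toy quadratic; the decay rate `rho = 0.9` and `eps` are commonly used defaults assumed here, not values reported in [13].

```python
import numpy as np


def rmsprop_step(theta, grad, sq_avg, lr=0.01, rho=0.9, eps=1e-8):
    """One RMSProp update: divide the gradient by a running RMS of past gradients."""
    sq_avg = rho * sq_avg + (1 - rho) * grad ** 2
    theta = theta - lr * grad / (np.sqrt(sq_avg) + eps)
    return theta, sq_avg


theta = np.array([5.0, -5.0])
sq_avg = np.zeros_like(theta)
for _ in range(2000):
    grad = 2 * theta  # gradient of ||theta||^2
    theta, sq_avg = rmsprop_step(theta, grad, sq_avg)  # lr = 0.01, as in the text
print(theta)
```

Because the step is normalized by the recent gradient magnitude, each parameter moves by roughly the learning rate per step regardless of the raw gradient scale, so the iterate settles into a small neighborhood of the minimum.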
In [8], the authors designed a deep neural network to identify and classify external intrusion signals from a 33 km optical fiber sensing system in a real environment. The time-domain data were fed directly into the DL model to learn the characteristics of destructive intrusion events and establish a reference model. As shown in Figure 19, the model includes two CNN layers, one linear layer, one LSTM layer, and one fully connected layer, and is called the Convolutional, Long Short-Term Memory, Fully Connected Deep Neural Network (CLDNN). The model effectively learned the signal characteristics captured by the DAS and could process the time-domain signal directly from the distributed optical fiber vibration monitoring system, which proved simpler and more effective than extracting feature vectors in the frequency domain. The experimental results showed an average intrusion event recognition rate above 97%. Figure 20 shows the DAS system based on Φ-OTDR and the pattern recognition process using the CLDNN. However, the model was not evaluated as a solution to sample contamination caused by external environmental factors, which can degrade recognition accuracy. Another related work can be found in [129].
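In a CLDNN, the convolutional layers extract local features and the LSTM layer models how they evolve over time before the fully connected layer classifies the event. The sketch below shows a single LSTM time step and runs it over a short feature sequence. The feature and hidden sizes, the gate ordering and the random weights are assumptions for illustration, not the configuration of [8].

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h, c, W, U, b):
    """One LSTM time step. W: (4H, D), U: (4H, H), b: (4H,); gate order i, f, g, o."""
    z = W @ x + U @ h + b
    H = h.shape[0]
    i = sigmoid(z[:H])           # input gate
    f = sigmoid(z[H:2 * H])      # forget gate
    g = np.tanh(z[2 * H:3 * H])  # candidate cell state
    o = sigmoid(z[3 * H:])       # output gate
    c = f * c + i * g            # update the cell state
    h = o * np.tanh(c)           # new hidden state
    return h, c


rng = np.random.default_rng(0)
D, H, T = 8, 16, 50  # e.g. 8 conv feature channels over 50 time steps (illustrative)
W = rng.standard_normal((4 * H, D)) * 0.1
U = rng.standard_normal((4 * H, H)) * 0.1
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x_t in rng.standard_normal((T, D)):  # stand-in for conv-layer outputs
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape)
```

The final hidden state `h` summarizes the whole sequence and would be passed to the fully connected layer for classification.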
A novel method was developed in [12] to efficiently generate a training dataset using a GAN [130]. End-to-end neural networks were used to process the data collected with a DAS system for the detection and localization of seismic events. The model architecture was based on the VGG16 network [23]: one extra convolutional layer was added to match the image size, and a fully connected layer was added at the end. Batch normalization was used for regularization, together with the ReLU activation function. The model was tested with experimental data collected with a 5 km long DAS sensor, and the obtained classification accuracy was 94%. Nevertheless, reliable automatic classification with a DAS system remains computationally and resource-intensive, mainly because of the demanding task of building a comprehensive training database of labeled signals for each phenomenon to be classified. Furthermore, overly complex approaches may make real-time applications impractical by introducing processing delays. Other works in the same direction were presented in [131] and [132].
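The batch normalization and ReLU combination mentioned for [12] can be sketched as follows for a mini-batch of feature vectors in training mode. Each feature is standardized with the batch mean and variance, then scaled by `gamma`, shifted by `beta`, and passed through ReLU. The batch size, feature count and parameter values are illustrative assumptions; inference-time running statistics are omitted.

```python
import numpy as np


def batchnorm_relu(x, gamma, beta, eps=1e-5):
    """Batch normalization over the batch axis (training mode) followed by ReLU.
    x: (batch, features); gamma, beta: (features,) learned scale and shift."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, unit variance per feature
    return np.maximum(0.0, gamma * x_hat + beta)


rng = np.random.default_rng(0)
x = rng.normal(5.0, 3.0, size=(32, 4))  # mini-batch of 32 samples, 4 features
y = batchnorm_relu(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.shape)
```

Normalizing each layer's inputs keeps activations in a stable range during training, which is why batch normalization also has a mild regularizing effect.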
In [129], the authors presented a DL model to recognize six activities: walking, digging with a shovel, digging with a harrow, digging with a pickaxe, facility noise, and strong wind. A DAS system based on Φ-OTDR was presented along with novel threat detection, signal conditioning, and threat classification techniques. The CNN used for classification was trained with real sensor data and consists of five layers, as illustrated in Figure 21. In this algorithm, an RGB image of 257 × 125 × 3 was constructed for each detection point on the optical fiber, and the network used it to determine the class of the event at that point. The results indicated that the threat classification accuracy exceeded 93%. However, increasing the depth of the network would inevitably slow training considerably and could lead to overfitting.
In [133], the authors proposed an approach to detect defects on large-sized PCBs and measure their copper thickness before mass production, using a CNN-based hybrid optical sensor (HOS). The method combines microscopic fringe projection profilometry (MFPP) with lateral shearing digital holographic microscopy (LSDHM) for imaging and defect detection, using an optical microscopic sensor with minimal components. This allows more precise identification of different types of defects on the PCBs and has the potential to significantly improve quality control in PCB manufacturing, leading to more efficient and effective production. The researchers’ findings demonstrate a remarkable success rate, with an accuracy of 99