3.1. Proposed Solution
The basic idea underlying the proposed approach is explained below. We map the extracted features to a distributed vector (hence the encoding phase), which is then used to classify the point cloud classes. In this section, after a brief overview of the notation and conventions used, the components of the network we tested are described in detail. The proposed model is shown in Figure 1.
We use uppercase letters such as W and U to denote matrices, lowercase letters such as b and x to denote vectors, and $x = [x^{(1)}, x^{(2)}, \dots, x^{(n)}]$ to denote an input vector, where the index i refers to the i-th component of this input. We denote the set of parameters of our model by Θ.
Input layer: Each feature record is represented as a vector $x = [x^{(1)}, x^{(2)}, \dots, x^{(n)}]$, where the i-th entry of this vector is the i-th property of the input component. Because the input components have widely varying numerical scales, we normalized the input.
The encoding layer: A single distributed vector must capture the meaning of the input. Given an input component s containing h attributes, the output of the encoding phase can be expressed as $\mathrm{enc}(s) = h^{(k)}$, where $h^{(k)} \in \mathbb{R}^{j}$ and the value of j is a hyperparameter. We used a recurrent layer to encode the input layer. In this layer, the activation of the hidden layer depends on the current input value and on the output of the previous step. In general, we have:
$$h^{(k)} = g\left(r^{(k)}, h^{(k-1)}; \theta_{enc}\right)$$
where g is the recurrent cell, $r^{(k)}$ is the current input feature, $h^{(k)}$ is the output of the hidden layer at time k, $h^{(k-1)}$ is the output of the hidden layer at time k − 1, and $\theta_{enc}$ are the parameters learned during training. Accordingly, the encoding $\mathrm{enc}(S)$ is produced by these recurrent layers. An overview of the proposed recurrent layers is shown in
Figure 2.
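As a minimal sketch of this recurrence, the loop below applies a single recurrent cell step by step over a feature sequence. It uses PyTorch's `nn.GRUCell` as the cell g, which is an assumption for illustration; the text only requires g to be a recurrent cell, and the sizes are hypothetical.

```python
import torch
import torch.nn as nn

# Hypothetical sizes: n input features per step, hidden size j (a hyperparameter)
n_features, j = 12, 64
g = nn.GRUCell(input_size=n_features, hidden_size=j)  # the recurrent cell g

def encode(sequence: torch.Tensor) -> torch.Tensor:
    """Run h_k = g(r_k, h_{k-1}) over a (seq_len, n_features) sequence.

    Returns the final hidden state, used here as enc(S) in R^j.
    """
    h = torch.zeros(1, j)                     # h_0
    for r_k in sequence:                      # r_k: current input feature vector
        h = g(r_k.unsqueeze(0), h)            # h_k = g(r_k, h_{k-1}; theta_enc)
    return h.squeeze(0)

enc_S = encode(torch.randn(10, n_features))   # enc(S) in R^j
```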
Overall, the proposed method is a four-layer structure. At a glance, these four layers are as follows (a minimal code sketch of the full stack is given after this list):
Input layer: In this layer, each of the input features associated with a classification record is fed to the GRU inputs.
GRU layer: The inputs of the GRU layer are vectors derived from the input layer.
LSTM layer: The inputs of the LSTM layer are vectors derived from the GRU layer.
Output layer: The output of the LSTM network is first flattened and then used to produce the class prediction.
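The sketch below illustrates this four-layer stack in PyTorch. The class name `PointCloudNet`, the hidden sizes, and the single-output sigmoid head are assumptions for illustration, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class PointCloudNet(nn.Module):
    """Sketch of the Input -> GRU -> LSTM -> flatten -> output stack."""

    def __init__(self, n_features: int, window_len: int,
                 gru_hidden: int = 64, lstm_hidden: int = 64, n_classes: int = 1):
        super().__init__()
        self.gru = nn.GRU(n_features, gru_hidden, batch_first=True)
        self.lstm = nn.LSTM(gru_hidden, lstm_hidden, batch_first=True)
        self.flatten = nn.Flatten()
        self.out = nn.Linear(lstm_hidden * window_len, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, window_len, n_features) -- a normalized sliding window
        g_out, _ = self.gru(x)                  # GRU layer fed by the input features
        l_out, _ = self.lstm(g_out)             # LSTM layer fed by the GRU outputs
        flat = self.flatten(l_out)              # flatten the LSTM output
        return torch.sigmoid(self.out(flat))    # sigmoid gives the class score

model = PointCloudNet(n_features=12, window_len=10)
scores = model(torch.randn(8, 10, 12))          # batch of 8 windows, 10 steps, 12 features
```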
Here are the details of each of these steps:
GRU layer: After the preprocessing operation is performed on the research data set, the data are sent to the GRU layer in the form of normalized windows. In this step, the number of GRU blocks is equal to the number of features.
Figure 3 shows how this process works: the first layer is the input layer containing the features, and the second layer is the GRU layer. The figure illustrates how samples are passed to the GRU layer. The inputs of the GRU layer are vectors obtained through sliding windows,
and the output is calculated through the following equations:
$$z_t = \sigma\left(W_Z x_t + U_Z h_{t-1} + b_z\right)$$
$$r_t = \sigma\left(W_R x_t + U_R h_{t-1} + b_r\right)$$
$$n_t = \tanh\left(W_N x_t + U_N\left(r_t \odot h_{t-1}\right) + b_n\right)$$
$$h_t = \left(1 - z_t\right) \odot n_t + z_t \odot h_{t-1}$$
where $z_t$ is the update gate, $r_t$ is the reset gate, $n_t$ is the candidate gate, and $h_t$ is the output activation. $W_Z$, $W_R$, $W_N$, $U_Z$, $U_R$, $U_N$ are learnable matrices, $b_z$, $b_r$, $b_n$ are learnable biases, $\sigma$ is the sigmoid activation function, and $\odot$ denotes element-wise multiplication.
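As a sketch, the function below evaluates these GRU equations for one time step with NumPy; the parameter container and shapes are assumptions for illustration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, params):
    """One GRU step following the update/reset/candidate equations above.

    params is a dict holding the matrices W_Z, W_R, W_N, U_Z, U_R, U_N and
    the biases b_z, b_r, b_n (shapes assumed, for illustration only).
    """
    z_t = sigmoid(params["W_Z"] @ x_t + params["U_Z"] @ h_prev + params["b_z"])
    r_t = sigmoid(params["W_R"] @ x_t + params["U_R"] @ h_prev + params["b_r"])
    n_t = np.tanh(params["W_N"] @ x_t + params["U_N"] @ (r_t * h_prev) + params["b_n"])
    h_t = (1.0 - z_t) * n_t + z_t * h_prev    # * is element-wise, matching the product above
    return h_t
```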
LSTM layer: The next step in the proposed method is to send the output of the GRU layer as input to the LSTM layer.
Figure 4 shows how this process works.
Figure 4. An overview of LSTM.
The LSTM input at step k is the vector $x^{(k)} \in \mathbb{R}^{E}$. The hidden vector sequence of the LSTM is calculated by the following equations:
$$i^{(k)} = \sigma\left(W_i\left[h^{(k-1)}; x^{(k)}\right] + b_i\right)$$
$$f^{(k)} = \sigma\left(W_f\left[h^{(k-1)}; x^{(k)}\right] + b_f\right)$$
$$o^{(k)} = \sigma\left(W_o\left[h^{(k-1)}; x^{(k)}\right] + b_o\right)$$
$$g^{(k)} = \tanh\left(W_g\left[h^{(k-1)}; x^{(k)}\right] + b_g\right)$$
where i is the input gate, o is the output gate, f is the forget gate, and g is the update gate, and $\{W_i, W_f, W_g, W_o, b_i, b_f, b_g, b_o\}$ is the set of parameters to be learned. The cell state $q^{(k)}$ is updated through the following relationship:
$$q^{(k)} = f^{(k)} \odot q^{(k-1)} + i^{(k)} \odot g^{(k)}$$
where the $\odot$ symbol represents the element-wise product between two vectors. Finally, the activation of the cell is computed through the following relationship:
$$h^{(k)} = o^{(k)} \odot \tanh\left(q^{(k)}\right)$$
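As with the GRU, the sketch below evaluates one LSTM step with NumPy, following the gate, cell-state, and cell-activation equations above; the parameter names and shapes are assumed for illustration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_k, h_prev, q_prev, p):
    """One LSTM step: gates, cell-state update q, and cell activation h.

    p holds W_i, W_f, W_o, W_g (acting on [h_prev; x_k]) and b_i, b_f, b_o, b_g.
    """
    hx = np.concatenate([h_prev, x_k])         # [h^(k-1); x^(k)]
    i = sigmoid(p["W_i"] @ hx + p["b_i"])      # input gate
    f = sigmoid(p["W_f"] @ hx + p["b_f"])      # forget gate
    o = sigmoid(p["W_o"] @ hx + p["b_o"])      # output gate
    g = np.tanh(p["W_g"] @ hx + p["b_g"])      # update (candidate) gate
    q = f * q_prev + i * g                     # q^(k): element-wise products
    h = o * np.tanh(q)                         # cell activation h^(k)
    return h, q
```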
To determine the class of each point, it is sufficient to apply a sigmoid to the encoding phase output as follows:
$$\hat{y} = \sigma\left(\mathrm{enc}(S)\right)$$
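A minimal sketch of this final step is given below, assuming a binary decision obtained by thresholding the sigmoid output at 0.5 (the threshold is an assumption, not stated in the text).

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def classify(enc_S: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Apply a sigmoid to the encoding-phase output and threshold it."""
    probs = sigmoid(enc_S)
    return (probs > threshold).astype(int)
```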