3.1. Hyperparameter Tuning for the DeepOnet and LSTM Models
In the modeling process, it is essential to tune the hyperparameters of the network models, which fall into five categories: the number of network layers and neurons, the training/testing set partition ratio, the activation function, the loss function, and the optimizer.
In this subsection, the roll motion data measured under a wave height H of 5.6 mm in the ship model experiments, comprising 1600 sample points over 20 seconds, were used to discuss the hyperparameters of the DeepOnet and LSTM models. To make the comparisons convincing, all parameters other than the hyperparameter under discussion were kept at their default values: a training set ratio of 70%, 5 layers and 80 neurons for both the trunk and branch networks of the DeepOnet model, and 3 layers and 100 neurons for the LSTM model. The other parameters shared by the two models were set to the same values: Adam as the optimizer, MSE as the loss function, and a batch size of 32.
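For reference, the default settings listed above can be gathered into a single configuration, as in the illustrative Python snippet below; the dictionary and its key names are hypothetical and only restate the values given in the text.

```python
# Illustrative default configuration kept fixed while one hyperparameter is varied.
# Key names are hypothetical; the values follow the settings listed above.
DEFAULTS = {
    "train_ratio": 0.70,      # training set ratio
    "deeponet_layers": 5,     # layers in both the branch and trunk networks
    "deeponet_neurons": 80,   # neurons per layer (branch and trunk)
    "lstm_layers": 3,
    "lstm_neurons": 100,
    "optimizer": "Adam",
    "loss": "MSE",
    "batch_size": 32,
}
```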
In principle, a neural network without restrictions on the number of layers and neurons can approximate any nonlinear function. However, stacking too many layers and neurons consumes substantial computational resources, and if the features of the problem are not overly complex, such a large network is not economical. Roll motion predictions were therefore carried out with the number of network layers in DeepOnet and LSTM set to 3, 5, and 7, and the number of neurons per layer set to 80, 100, and 120. The prediction results are presented in Table 4.
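A minimal sketch of how such a layer/neuron sweep could be organized is given below. It uses a generic fully connected network as a stand-in for the branch/trunk and LSTM configurations actually trained; the helper build_mlp, the random stand-in data, and the shortened training loop are illustrative assumptions rather than the authors' implementation.

```python
import itertools
import torch
import torch.nn as nn

def build_mlp(in_dim, out_dim, n_layers, n_neurons, act=nn.Tanh):
    """Stack n_layers hidden layers of n_neurons each (illustrative helper)."""
    layers, dim = [], in_dim
    for _ in range(n_layers):
        layers += [nn.Linear(dim, n_neurons), act()]
        dim = n_neurons
    layers.append(nn.Linear(dim, out_dim))
    return nn.Sequential(*layers)

# Hypothetical stand-in data: 80 past samples -> 1 predicted value
# (1120/480 points correspond to a 70/30 split of 1600 samples).
x_train, y_train = torch.randn(1120, 80), torch.randn(1120, 1)
x_test, y_test = torch.randn(480, 80), torch.randn(480, 1)

results = {}
for n_layers, n_neurons in itertools.product([3, 5, 7], [80, 100, 120]):
    model = build_mlp(80, 1, n_layers, n_neurons)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    for _ in range(200):                       # shortened training loop
        opt.zero_grad()
        loss = loss_fn(model(x_train), y_train)
        loss.backward()
        opt.step()
    with torch.no_grad():
        results[(n_layers, n_neurons)] = loss_fn(model(x_test), y_test).item()

best = min(results, key=results.get)
print("lowest test MSE with (layers, neurons) =", best)
```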
Figure 4a,b depict the prediction results of the LSTM model and the DeepOnet model with 80 neurons under different numbers of layers. It is evident that the two models achieve nearly identical prediction performance when configured with the same number of layers and neurons.
As shown in Table 4, the mean square errors on the testing sets of both models reach the order of 10⁻⁴ or below. Regarding the number of network layers, a larger number of layers does not necessarily yield better predictive capability, because the function risks overfitting the training data, which reduces the model's generalization ability on new datasets. Likewise, more neurons do not necessarily produce better results, and fewer neurons do not imply worse ones. The models exhibit better performance when configured with 5 layers and 100 neurons.
Since both network models are data-driven, the size of the training set inevitably influences their predictive results. Experiments were therefore conducted with the DeepOnet and LSTM models to investigate the impact of the training set percentage on ship roll prediction, with the training set size varied between 60%, 70%, and 80%. The prediction results are presented in Table 5.
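The partition itself can be expressed as in the sketch below, which assumes a simple chronological split of the 1600-point series (no shuffling, so the test data are the later part of the record); the file name and variable names are hypothetical.

```python
import numpy as np

def chronological_split(series: np.ndarray, train_ratio: float):
    """Split a time series into a leading training part and a trailing test part."""
    n_train = int(len(series) * train_ratio)
    return series[:n_train], series[n_train:]

roll = np.loadtxt("roll_H5.6mm.txt")   # hypothetical file holding the 1600 roll samples
for ratio in (0.6, 0.7, 0.8):
    train, test = chronological_split(roll, ratio)
    print(f"train ratio {ratio:.0%}: {len(train)} training / {len(test)} testing points")
```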
To analyze the predictive capabilities of the two models under the different dataset partitions on a common basis, the comparison below is restricted to the smallest test portion, i.e., the amount of test data obtained when the training set comprises 80% of the total data. The predictive results are presented in Figure 5.
For a more intuitive representation of the impact of the training set size on predictive performance, the Mean Square Error (MSE) in Table 5 is converted into the Root Mean Square Error (RMSE). It can be observed that the prediction error decreases as the training set grows. For the DeepOnet model, the forecast error with the 80% training set is 24.5% lower than with the 60% training set; for the LSTM model, the corresponding reduction is 42.7%. This implies that a larger training set allows the models to learn more features, leading to better predictive performance on the testing set.
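The conversion and the quoted relative reductions follow directly from RMSE = sqrt(MSE); the small helper below illustrates the calculation, with placeholder MSE inputs rather than the values of Table 5.

```python
import math

def rmse(mse: float) -> float:
    return math.sqrt(mse)

def relative_reduction(mse_small_train: float, mse_large_train: float) -> float:
    """Percentage decrease in RMSE when moving from the smaller to the larger training set."""
    return 100.0 * (rmse(mse_small_train) - rmse(mse_large_train)) / rmse(mse_small_train)

# Example call with placeholder MSE values (not taken from Table 5):
print(f"{relative_reduction(4.0e-4, 2.3e-4):.1f}% reduction in RMSE")
```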
The motion of ships in a real seaway is highly nonlinear. Determining which activation function endows the network model with the strongest nonlinear fitting capability is therefore particularly important. Commonly used activation functions include the hyperbolic tangent function (tanh), the sigmoid function, and the rectified linear unit (ReLU) function; their schematic diagrams are depicted in Figure 6.
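For reference, the three activation functions compared here have the standard definitions sigmoid(x) = 1/(1 + e^(-x)), tanh(x) = (e^x - e^(-x))/(e^x + e^(-x)), and ReLU(x) = max(0, x); a brief NumPy sketch is given below.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # output in (0, 1)

def tanh(x):
    return np.tanh(x)                 # output in (-1, 1)

def relu(x):
    return np.maximum(0.0, x)         # zero for x < 0, linear for x > 0

x = np.linspace(-5, 5, 11)
print(sigmoid(x), tanh(x), relu(x), sep="\n")
```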
The sigmoid function shown in Figure 6 has an output range of 0 to 1; it therefore normalizes the output of each neuron and is well suited to models whose output is interpreted as a probability. The tanh function, a hyperbolic tangent, has a similar shape but an output range of [-1, 1]; its output is zero-centered and its gradient is steeper, which alleviates the slow convergence associated with the sigmoid function. The ReLU activation function is piecewise linear: for x greater than 0 it passes the value through linearly, and for x less than 0 it outputs zero, which makes it suitable for certain classification problems. Based on these characteristics, all three activation functions are employed in the LSTM and DeepOnet models, and predictions of the ship roll, pitch, and heave motions are made with each of them. The prediction results are summarized in Table 6.
Table 6 presents the test set loss values obtained with the different activation functions. The entry “Error” indicates that, with the same network structure, merely changing the activation function to ReLU renders the network unable to learn the feature vectors of the roll, pitch, and heave motions: training fails, and even after repeated adjustment of the other hyperparameters the network remains unusable for prediction, outputting a constant value. This stems from the structure of the ReLU function, whose derivative is zero for negative inputs, leading to “dying” neurons that permanently output zero. The LSTM network, in contrast, can still make predictions because its data are normalized before prediction, constraining all values to the range [0, 1].
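The normalization mentioned above is presumably a standard min-max scaling to [0, 1]; a minimal sketch under that assumption is shown below, with a random placeholder series in place of the measured data.

```python
import numpy as np

def minmax_scale(x: np.ndarray):
    """Scale a series to [0, 1] and return the parameters needed to invert the scaling."""
    x_min, x_max = x.min(), x.max()
    return (x - x_min) / (x_max - x_min), x_min, x_max

def minmax_inverse(x_scaled: np.ndarray, x_min: float, x_max: float):
    return x_scaled * (x_max - x_min) + x_min

roll = np.random.randn(1600)             # placeholder for the measured roll series
roll_scaled, lo, hi = minmax_scale(roll)
assert np.allclose(minmax_inverse(roll_scaled, lo, hi), roll)
```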
From the comparative results in Figure 7a,b and Table 6, it is evident that the predictive performance differs significantly between activation functions. The tanh activation function consistently performs better than the others across all prediction experiments. In contrast, the performance of the sigmoid activation function varies widely and is highly unstable, especially for complex data such as the heave motion in this experiment. This instability is attributed to the vanishing gradient problem to which the sigmoid activation function is prone during backpropagation in deep networks.
The loss functions most commonly used in regression problems are the Mean Squared Error (MSE) and the Mean Absolute Error (MAE). The MSE function is smooth, continuous, and differentiable everywhere, which facilitates gradient descent; moreover, its gradient decreases as the error decreases, aiding rapid convergence. However, when the difference between the true and predicted values exceeds 1, squaring amplifies the error, making MSE sensitive to outliers. Compared with MSE, the advantage of MAE lies in its insensitivity to outliers, its stable gradients, and a reduced risk of gradient explosion. Its main drawback is that it is not differentiable where the prediction equals the true value, and its gradient keeps the same magnitude even for very small errors, which hinders fine convergence during training.
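For n samples with true values y_i and predictions ŷ_i, the two losses are MSE = (1/n) Σ (y_i - ŷ_i)² and MAE = (1/n) Σ |y_i - ŷ_i|; the NumPy sketch below illustrates how an outlier affects each of them.

```python
import numpy as np

def mse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return float(np.mean((y_true - y_pred) ** 2))   # squaring amplifies large (outlier) errors

def mae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return float(np.mean(np.abs(y_true - y_pred)))  # constant-magnitude gradient away from zero error

y_true = np.array([0.0, 1.0, 2.0, 10.0])            # the last point acts as an outlier
y_pred = np.array([0.1, 0.9, 2.1, 2.0])
print(mse(y_true, y_pred), mae(y_true, y_pred))
```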
The MAE and MSE loss functions were each used to train the DeepOnet and LSTM models for the prediction of the roll, pitch, and heave motions. The experimental results are summarized in Table 7.
Figure 8 illustrates that both loss functions yield favorable predictive performance, with MSE performing better in the details. Moreover, the predictions of the two models for the heave motion show that MSE copes better with prediction problems that have weak periodic regularity and numerous outliers; the same conclusion can be drawn from Table 7. The reason is that the Mean Squared Error penalizes large deviations more heavily and therefore forces the model to fit these excursions, whereas the Mean Absolute Error is less sensitive to outliers and favors an overall smoother fit.
Optimizers are algorithms, such as gradient descent, used during neural network training to compute the optimal values of parameters such as the network weights and biases. In essence, they solve a mathematical optimization problem: finding, among all possible parameter combinations, the values that correspond to an extremum of the objective function.
This study discusses two optimizers: Adaptive Moment Estimation (Adam) and Stochastic Gradient Descent (SGD). Adam combines a first-moment estimate (the mean of the gradients) and a second-moment estimate (the uncentered variance of the gradients) to compute the update step size. Its advantages include simplicity, computational efficiency, low memory requirements, invariance to rescaling of the gradients, the ability to keep the update step within a reasonable range set by the initial learning rate, suitability for problems with large amounts of data and parameters, and applicability to non-stationary objective functions.
SGD, by contrast, is a core optimization algorithm across scientific and engineering fields and minimizes the objective function directly. While it places few demands on the gradients and computes them quickly, it tends to become trapped in local minima; to overcome this limitation, SGD often needs to be combined with other techniques, such as momentum, to achieve better optimization results. In addition, when the data are noisy, the direction of the weight update is not always correct.
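In a PyTorch-style implementation, switching between the two optimizers amounts to a one-line change, as sketched below; the placeholder model, the learning rate of 0.001, and the momentum value are illustrative assumptions.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(80, 100), nn.Tanh(), nn.Linear(100, 1))  # placeholder network

def make_optimizer(name: str, params, lr: float = 1e-3):
    """Return the requested optimizer; only the two variants discussed here are supported."""
    if name == "Adam":
        return torch.optim.Adam(params, lr=lr)                  # adaptive first/second-moment estimates
    if name == "SGD":
        return torch.optim.SGD(params, lr=lr, momentum=0.9)     # plain SGD with optional momentum
    raise ValueError(name)

opt = make_optimizer("Adam", model.parameters())
```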
Adam and SGD were each applied to the prediction of the roll, pitch, and heave motions with the DeepOnet and LSTM models. The experimental results are presented in Table 8.
From Figure 9, it is evident that the predictive performance of the Adam optimizer is significantly superior to that of the SGD optimizer: on average, the prediction errors obtained with Adam are two to three orders of magnitude lower than those obtained with SGD.
3.2. Multi-Step Prediction by the DeepOnet and LSTM Models
Following the hyperparameter optimization in Section 3.1, the branch sub-network of the DeepOnet model has 7 layers with 100 neurons per layer, and the trunk sub-network likewise has 7 layers with 100 neurons. The input to the branch sub-network is a history of 80 wave height data points, and the input to the trunk sub-network is a history of 20 data points of the degree-of-freedom motion. The remaining network parameters are: tanh as the activation function, 10,000 training iterations, the Adam optimizer with a learning rate of 0.001, and MSE as the loss function. After normalization, the data are fed into the DeepOnet network for training and for forecasting on the testing set. The prediction errors are shown in Table 9, and the prediction curves for a prediction step of 20 are depicted in Figure 10.
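A minimal sketch of a DeepOnet with this shape is given below, assuming the standard formulation in which the branch and trunk outputs are combined through an inner product over a shared latent dimension; the class and helper names, the latent width p, and the random stand-in data are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, n_layers=7, width=100):
    """Fully connected sub-network with tanh activations (7 layers of 100 neurons, as above)."""
    layers, dim = [], in_dim
    for _ in range(n_layers):
        layers += [nn.Linear(dim, width), nn.Tanh()]
        dim = width
    layers.append(nn.Linear(dim, out_dim))
    return nn.Sequential(*layers)

class DeepOnetSketch(nn.Module):
    def __init__(self, branch_dim=80, trunk_dim=20, p=100):
        super().__init__()
        self.branch = mlp(branch_dim, p)   # encodes the 80-point wave-height history
        self.trunk = mlp(trunk_dim, p)     # encodes the 20-point motion history
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, wave_hist, motion_hist):
        b = self.branch(wave_hist)                             # (batch, p)
        t = self.trunk(motion_hist)                            # (batch, p)
        return (b * t).sum(dim=-1, keepdim=True) + self.bias   # inner product over the latent dimension

# Hypothetical usage with random stand-in data:
model = DeepOnetSketch()
wave = torch.randn(32, 80)
motion = torch.randn(32, 20)
pred = model(wave, motion)                                     # (32, 1) predicted motion value
loss = nn.MSELoss()(pred, torch.randn(32, 1))
```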
The results demonstrate that DeepOnet predicts and fits the three-degree-of-freedom motion of the ship proficiently. The distinctive feature of this network is that the outputs computed separately by the branch and trunk networks are combined into a single prediction, which facilitates learning the operator mapping between the data. As depicted in Figure 10, the fitted points closely approximate the original values throughout, underscoring that DeepOnet effectively learns the operator functions between the waves and each degree-of-freedom motion.
Based on the hyperparameter trials in Section 3.1, the dataset is divided into a 70% training set and a 30% testing set. The LSTM network parameters are as follows: 5 hidden layers with 100 neurons each, an input vector length of 80, tanh as the activation function, Mean Squared Error (MSE) as the loss function, 500 training iterations, the Adam optimizer, and a learning rate of 0.001.
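A comparable sketch of this LSTM configuration is given below, assuming a sequence-to-one setup in which a window of 80 past values is mapped to the next value; the class name, the stand-in data, and the shortened training loop are illustrative.

```python
import torch
import torch.nn as nn

class MotionLSTM(nn.Module):
    """Stacked LSTM: 5 layers of 100 hidden units, fed windows of 80 time steps."""
    def __init__(self, n_features=1, hidden=100, layers=5):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=layers, batch_first=True)
        self.head = nn.Linear(hidden, n_features)

    def forward(self, x):                    # x: (batch, 80, n_features)
        out, _ = self.lstm(x)                # out: (batch, 80, hidden)
        return self.head(out[:, -1, :])      # predict the value following the window

model = MotionLSTM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.randn(32, 80, 1)                   # placeholder batch of input windows
y = torch.randn(32, 1)                       # placeholder targets
for _ in range(5):                           # shortened loop (500 iterations in the text)
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
```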
Table 10 shows how the prediction error of each degree of freedom varies with the forecast lead time.
Figure 11 compares the forecast results with the true values for the multi-degree-of-freedom coupled motion at a forecast lead time of 20 steps, and shows that the LSTM achieves multi-step prediction with good accuracy. As the forecast extends over more steps, the prediction performance deteriorates because errors accumulate. In addition, since the network takes the data of all three degrees of freedom as input simultaneously, it must learn their motion characteristics jointly; the prediction accuracy for an individual degree of freedom may therefore decrease, but it remains above 85%.
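One common way to obtain such multi-step forecasts, and a plausible reading of the error accumulation described above, is the recursive strategy sketched below, in which each one-step prediction is appended to the input window and fed back into the model; whether the authors use this or a direct multi-step scheme is not stated, so the sketch is an assumption.

```python
import torch

@torch.no_grad()
def recursive_forecast(model, window: torch.Tensor, n_steps: int = 20) -> torch.Tensor:
    """Roll a one-step model forward n_steps times, feeding predictions back as inputs.

    window: (1, seq_len, n_features) tensor of the most recent observations.
    """
    preds = []
    for _ in range(n_steps):
        next_val = model(window)                                       # (1, n_features)
        preds.append(next_val)
        # Drop the oldest sample and append the new prediction to the window.
        window = torch.cat([window[:, 1:, :], next_val.unsqueeze(1)], dim=1)
    return torch.cat(preds, dim=0)                                     # (n_steps, n_features)

# Hypothetical usage with the MotionLSTM sketch above:
# forecast = recursive_forecast(model, torch.randn(1, 80, 1), n_steps=20)
```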
To compare the accuracy of the LSTM and DeepOnet models under regular waves, Table 9 and Table 10 are juxtaposed in Table 11. It is evident that the prediction error of the DeepOnet model is significantly lower than that of the LSTM model; for the roll and heave motions, the MSE is reduced by more than a factor of ten.