1. Introduction
With advancements in computational technology and increasing design complexity, engineers seek to predict a component's performance before it is manufactured. These predictions involve complex boundary value problems (BVPs), which require solutions to partial differential equations (PDEs). Typically, these PDEs are solved through numerical approximations. In solid mechanics, conventional methods to solve PDEs include mesh-free methods1,2, finite element analysis (FEA)3, and isogeometric analysis4,5. However, these methods face challenges, such as obtaining robust and accurate solutions for ill-posed, high-dimensional, or coupled problems, to name a few.
Machine learning methods are rapidly developing as an alternative to traditional approaches to minimize or, in some cases, eliminate the issues mentioned above. Liu et al.6 have discussed FEA's evolution, future, and transition to machine learning applications in detail. In the case of solid and fluid mechanics, machine learning models can broadly be classified into data-driven models6,7,16,17,8–15 and physics-informed neural networks (PINNs)18–25. In data-driven models, data from experimental and computational results are used to train the models. Although data-driven models can capture complex physical phenomena, the amount of data required to train the models, the data quality, and the data generation process impede their widespread adoption26. Another application of data-driven models in mechanics is to provide an automated framework for predicting constitutive models for material behavior, as discussed by Flaschel et al.27, Yang et al.28, and Chen29. Data-driven models can also use images to identify material distribution and characterize defects30. To reduce the dependency on large data sets, Saha et al.31 developed a model (hierarchical deep learning neural network (HiDeNN)) with universal approximation capability to interpolate between data points for extremely nonlinear relationships.
In contrast to data-driven models, PINNs use a deep neural network (DNN) to train the model based on physical laws at a set of points in the domain. PINNs with automatic differentiation were first introduced by Raissi et al.18. They have several advantages over data-driven models: they do not require data labeling and need no or minimal preliminary datasets compared to purely data-driven models. Two types of PINN models are rapidly developing in solid mechanics: the deep collocation method (DCM)23–25,32 and the deep energy method (DEM)33–35. In DCM, residuals of the strong form define the loss function at points sampled from the physical domain and the corresponding boundary and initial conditions (collocation points). This method has been extended and improved by several researchers36,37. Haghighat et al.38 used a PINN technique to solve the coupled flow and mechanical deformation equations in porous media for single-phase and multiphase flows. Improving the application of PINNs to irregular geometries, Gao et al.39 developed PhyGeoNet, a convolutional neural network (CNN) model that learns solutions of parametric PDEs on irregular domains using elliptic coordinate mapping. Chen et al.40 generalized PINNs to discover governing PDEs from sparse and noisy data. Though the DCM has been widely successful, the approach requires higher-order gradients, which can be computationally expensive.
In the case of DEM, the system's potential energy (PE), expressed as a loss function, is minimized to predict the system's displacements. Compared to classical PINNs, DEM has the advantage that it relies only on first-order differentiation and accurate numerical integration to train the neural network. DEM can be extended to any problem where an energy functional exists, reducing the dependence on the PDEs of the underlying problem. E and Yu41 first proposed the idea of DEM, using the Ritz method to solve variational problems. Nguyen-Thanh et al.34 expanded the work of E and Yu to 2D and 3D material models. Implementations of DEM for several BVPs have been reported in the literature33,35,42. Extending this work, Fuhg and Bouklas43 demonstrated that DEM and DCM fail to accurately resolve displacements and stresses in stress concentration regions. They proposed a modified DEM (mDEM) to overcome this issue by adding stresses to the loss function. However, it is worth noting that including stresses in the loss function requires second-order derivatives. The modification successfully resolved the stresses around stress concentration regions. Abueidda et al.44 enhanced the model by developing a PINN that combines the residuals of the strong form with the system's potential energy. The proposed formulation yielded a loss function with multiple loss terms; as a result, a coefficient-of-variation weighting scheme was introduced to assign the weight for each term dynamically and adaptively. He et al. extended the applicability of DEM to plasticity45 and graph convolutional networks46. Similar to DEM, Sun et al.47 used a DNN to develop surrogate models for fluid flows using a loss function that incorporated the initial and boundary conditions and the governing PDEs. One disadvantage of PINNs compared to classical FEA is the time required to train the neural network model.
However, users can employ transfer learning48,49: instead of retraining the model from scratch, the network can be partially retrained, trained starting from prior weights and biases, or an already-trained model can be used to predict the mechanics of a similar geometry.
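As a rough illustration, one common transfer-learning strategy, freezing all but the final layer of a pretrained network before retraining, can be sketched in PyTorch (the layer sizes and checkpoint path below are purely illustrative, not those used in this work):

```python
import torch.nn as nn

# Hypothetical pretrained DEM network: 2 inputs (x, y) -> 2 outputs (u_x, u_y).
model = nn.Sequential(nn.Linear(2, 30), nn.Tanh(),
                      nn.Linear(30, 30), nn.Tanh(),
                      nn.Linear(30, 2))
# model.load_state_dict(torch.load("pretrained_dem.pt"))  # prior weights (path illustrative)

# Freeze everything except the output layer, so only it is retrained
# for the new (similar) geometry.
for p in model[:-1].parameters():
    p.requires_grad = False

trainable = [p for p in model.parameters() if p.requires_grad]
```

Only the unfrozen parameters are then passed to the optimizer, which is typically much faster than training from scratch.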
Though DEM has been successfully used to predict deformations in bending, similar accuracy was not obtained for compression and tension loadings, even for a simple 2D rectangular linear elastic plate. Figure 1 shows the deformation under compression loading in the y-direction using the architecture proposed by Nguyen-Thanh et al.34. The geometry and boundary conditions for the plate are shown in Figure 7. From Figure 1, we notice that even though a symmetric pressure is applied to the plate, the resulting displacements obtained from DEM are not symmetric. Thus, there is a need to improve the accuracy of DEM under generalized loading conditions.
In this paper, two modifications are proposed to improve the accuracy of DEM while retaining dependency only on first-order derivatives. First, random Fourier feature mapping is introduced in the multilayer perceptron (MLP) model to reduce bias that may be present during the model's training. Second, a two-loop architecture is proposed to obtain appropriate hyperparameters for a given geometry. Recent studies have demonstrated that the difficulty of optimizing hyperparameters in large, multi-layered models impedes scientific progress50. This process can be especially challenging for DEM due to the high interaction between independent hyperparameters, the large search space, the difficulty in identifying objective functions, and non-convex relationships with the objective functions. Thus, the effect of the different hyperparameters on the loss function, and of the loss function on the displacements, is also studied to find a generalized search space.
Tuning these hyperparameters through manual trial and error is highly time-consuming. As a result, researchers have employed grid search, random search, and optimization algorithms to obtain the best combination of hyperparameters22,50–53. Still, studies on optimizing the hyperparameters and architectures of PINNs for solving mechanics problems remain uncommon. Wang et al.54 proposed Auto-PINN, a framework employing Neural Architecture Search (NAS) techniques for PINNs to solve heat transfer problems. However, unlike Wang et al.54, we found that not all hyperparameters can be decoupled for DEM.
In addition, the L2 norm is commonly employed as an objective function to obtain optimized hyperparameters. This approach requires finding a solution to the BVP beforehand. Since hyperparameters can be problem-specific, using the L2 norm as an objective function increases the dependency of PINNs on existing solutions, reducing their generalizability. The current research demonstrates that the principle of minimum potential energy can be used as an alternative objective function to obtain optimized hyperparameters, mitigating the need for prior knowledge of the solution. The examples presented in this research are based on 2D linear elastic plane stress problems; however, the proposed approach can be extended to a variety of DEM problems (for example, 3D and nonlinear material models). Random Fourier feature (RFF) mapping (discussed in Section 3) is introduced to further improve the accuracy of DEM. RFF mapping helps reduce the spectral bias of the network during learning, which otherwise favors low-frequency solutions; the effect of this reduction becomes prominent for complex geometries and loading conditions.
The paper is divided into eight sections. Section 2 and Section 3 describe the formulation of DEM and the modifications made to it (m-DEM). Preliminary observations relating the accuracy of the displacements obtained from DEM to the loss function and hyperparameters are discussed in Section 4. Based on these initial observations, a search space is defined, followed by hyperparameter optimization for the elastic rectangular plate in Section 5. The transferability of the optimized hyperparameters obtained in Section 5 to different geometries and load cases is discussed in Section 6. The effect of adding RFF mapping to the MLP is shown in Section 7. Finally, conclusions are presented in Section 8.
2. DEM formulation
The DEM consists of a multilayer perceptron (MLP) neural network, as shown in Figure 2. In general, an MLP consists of sets of neurons arranged in layers, with each neuron connected to all the neurons in the adjacent layers. The number of layers (N) and the number of neurons in each layer are predefined according to the specific application. In DEM, the number of neurons in the first and last layers (the input and output layers, respectively) is determined by the system's dimension (two in 2D and three in 3D). The programmer predefines the number of neurons in the layers between the input and output layers (the hidden layers).
Mathematically, the value of each neuron is calculated from the neurons of the previous layer as a linear combination with weights (w) and biases (b), as shown in Equation 1. An activation function (φ: ℝ → ℝ) then operates on the calculated value to determine the value passed by the neuron:

a_i^l = φ( Σ_j w_ij^l a_j^(l−1) + b_i^l )   (1)

The values of the weights and biases are determined such that the loss function is minimized. In a so-called backpropagation procedure, an optimization algorithm iterates over the loss function to find the weights and biases such that:

(w*, b*) = argmin_(w,b) L(w, b)   (2)
The limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) algorithm is used in the examples in the following sections to minimize the loss function and demonstrate the performance of DEM; however, other algorithms can also be used. The backpropagation capabilities of PyTorch are used to calculate the gradients of the loss function.
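A minimal PyTorch sketch of such an MLP trained with L-BFGS is shown below. The width, depth, activation, initialization standard deviation, and learning rate are illustrative stand-ins for the hyperparameters discussed later, and the loss is a placeholder for the potential energy that DEM actually minimizes:

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    """2 inputs (x, y) -> 2 outputs (u_x, u_y); depth/width are hyperparameters."""
    def __init__(self, width=30, hidden_layers=3, std=0.1):
        super().__init__()
        sizes = [2] + [width] * hidden_layers + [2]
        self.layers = nn.ModuleList(nn.Linear(a, b) for a, b in zip(sizes, sizes[1:]))
        for layer in self.layers:                   # weights/biases initialized with a
            nn.init.normal_(layer.weight, std=std)  # tunable standard deviation
            nn.init.normal_(layer.bias, std=std)
        self.act = nn.Tanh()                        # activation is also a hyperparameter

    def forward(self, x):
        for layer in self.layers[:-1]:
            x = self.act(layer(x))                  # Equation 1 applied per layer
        return self.layers[-1](x)

model = MLP()
xy = torch.rand(200, 2)                             # training points in the domain
opt = torch.optim.LBFGS(model.parameters(), lr=0.5, max_iter=50)

def closure():                                      # L-BFGS re-evaluates the loss
    opt.zero_grad()
    loss = model(xy).pow(2).mean()                  # placeholder; the real loss is
    loss.backward()                                 # the potential energy (Equation 3)
    return loss

opt.step(closure)
```

Note that `torch.optim.LBFGS` requires the closure form so it can re-evaluate the loss during its line search.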
In the above description, parameters like the number of layers and the number of neurons in each layer must be defined before the MLP can be trained. These parameters, called hyperparameters, define the architecture of the MLP and control the learning process of the model. It is essential to obtain optimized values for the hyperparameters to achieve a minimum value of the loss function. The number of hyperparameters and their values vary according to the problem statement. Six hyperparameters were identified based on the architecture of DEM: the total number of layers, the number of neurons in the hidden layers, the activation function, the standard deviation used to initialize the weights and biases, the learning rate for the L-BFGS algorithm, and the number of epochs for the L-BFGS algorithm.
The loss function for DEM is defined as the system's potential energy (Equation 3):

Π(u) = U − W = ∫_B ψ(ε(u)) dV − ∫_B f · u dV − ∫_(∂B_t) t̄ · u dA,   u ∈ H,   (3)

where U is the system's internal energy, W is the energy due to the externally applied loads, u is a kinematically admissible displacement field, H is the space of admissible functions, ψ is the strain energy density function, f are the internal forces of the body B, and t̄ are the traction forces applied to the area ∂B_t (Figure 3). The selection of the loss function is based on the principle of minimum potential energy: out of all possible kinematically admissible deformations, a conservative structural system assumes the displacements that minimize the total potential energy at equilibrium.
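To make the loss concrete, the sketch below evaluates a discrete potential energy for a 2D plate, assuming simple mean-value quadrature over uniformly sampled points and a constant traction on the loaded edge (function and argument names are ours). Only first-order derivatives of the network output are needed, obtained via `torch.autograd.grad`:

```python
import torch

lam, mu = 576.92, 384.62  # Lame constants for E = 1000 MPa, nu = 0.3

def potential_energy(model, xy_dom, area, xy_trac, tbar, edge_len):
    """Pi = strain energy over the domain minus the work of the traction tbar
    on the loaded edge; only first-order autograd derivatives are used."""
    xy = xy_dom.clone().requires_grad_(True)
    u = model(xy)                                           # (n, 2) displacements
    du_x = torch.autograd.grad(u[:, 0].sum(), xy, create_graph=True)[0]
    du_y = torch.autograd.grad(u[:, 1].sum(), xy, create_graph=True)[0]
    exx, eyy = du_x[:, 0], du_y[:, 1]                       # small-strain components
    exy = 0.5 * (du_x[:, 1] + du_y[:, 0])
    psi = 0.5 * lam * (exx + eyy) ** 2 + mu * (exx**2 + eyy**2 + 2 * exy**2)
    internal = psi.mean() * area                            # mean-value quadrature
    external = (model(xy_trac) @ tbar).mean() * edge_len    # work of edge traction
    return internal - external
```

Minimizing this scalar with L-BFGS over the network parameters is the essence of DEM; more accurate quadrature (e.g., trapezoidal on a grid) is typically used in practice.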
4. Pre-experiments and observations
The dependency of PE on the hyperparameters can best be understood using a simplified BVP. Thus, an elastic plate subjected to compression and bending loads (shown in Figure 7) is used as an example. The plate is 4 mm long and 1 mm wide and is composed of an elastic material with a Young's modulus of 1000 MPa and a Poisson's ratio of 0.3. The plate is subjected to uniform compression and bending loads, as shown in Figure 7b,c. For the compressive load, a uniform compressive pressure of 50 MPa was applied to the right end, while the left end was fixed in all degrees of freedom. For the bending load, a uniform transverse shear load of 10 MPa, forcing the plate to move downwards, was applied to the right end, and the left end was fixed in all degrees of freedom. The strain energy density of the plate is given by:

ψ(ε) = (λ/2) (tr ε)² + μ tr(ε²),

where λ and μ are the Lame constants and ε is the strain tensor for small deformations, which is related to the displacements u by:

ε = (1/2) (∇u + (∇u)ᵀ).
The relationships between the Lame constants, Young's modulus (E), and Poisson's ratio (ν) are given by Equation 9:

λ = Eν / ((1 + ν)(1 − 2ν)),   μ = E / (2(1 + ν)).   (9)

Differentiating the strain energy density with respect to the strain tensor provides the Cauchy stress tensor σ, which can be differentiated again to find the stiffness tensor C. For a 2D plane stress condition, the relation between the Cauchy stress tensor and the strain tensor reduces to:

[σ_xx, σ_yy, σ_xy]ᵀ = E/(1 − ν²) [1 ν 0; ν 1 0; 0 0 (1 − ν)/2] [ε_xx, ε_yy, 2ε_xy]ᵀ.
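These relations can be checked numerically; a minimal sketch (function names are ours) for the plate material used here:

```python
import numpy as np

def lame_constants(E, nu):
    """Lame constants from Young's modulus and Poisson's ratio (Equation 9)."""
    lam = E * nu / ((1 + nu) * (1 - 2 * nu))
    mu = E / (2 * (1 + nu))
    return lam, mu

def plane_stress_sigma(eps, E, nu):
    """Cauchy stress [sxx, syy, sxy] from Voigt strains [exx, eyy, gamma_xy]
    under 2D plane stress."""
    C = E / (1 - nu**2) * np.array([[1.0, nu, 0.0],
                                    [nu, 1.0, 0.0],
                                    [0.0, 0.0, (1 - nu) / 2]])
    return C @ eps

lam, mu = lame_constants(1000.0, 0.3)  # plate material from Figure 7
```

For a uniaxial strain state ε = [ε_x, −νε_x, 0], this recovers σ = [Eε_x, 0, 0], as expected for plane stress.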
4.1. Interaction effects
Sensitivity analysis can be used to determine the impact of the independent inputs (hyperparameters) on the outputs (PE under compression and bending). A five-level Taguchi design of experiments (DOE) and a two-level half-factorial DOE were employed to obtain the dataset for the sensitivity analysis. The values of the hyperparameters used to generate the five-level Taguchi DOE are shown in Table 2; levels 1 and 5 from Table 2 were utilized to design the two-level half-factorial DOE.
After running m-DEM, the minimum potential energy corresponding to each set of hyperparameters was collected, and the data were analyzed using Minitab. Similar results were obtained for the compression and bending loads; the results from the compression loading condition are discussed in this section. Figure 8 and Figure 9 show the main effects and interaction plots for the hyperparameters. Figure 8 illustrates that the mean response (minimum PE) varies for each hyperparameter. We also notice that the minimum PE obtained from DEM does not follow the same trend across hyperparameters, and that it varies least with the activation function.
Table 2. Values used to generate data for sensitivity analysis.

Variable               | Level 1 | Level 2 | Level 3 | Level 4 | Level 5
Learning Rate          | 0.1     | 0.368   | 1       | 2.718   | 7.389
Neurons                | 20      | 52      | 70      | 96      | 120
Standard Dev (DNN)     | 0.0001  | 0.001   | 0.01    | 0.1     | 1
Standard Dev (RFF)     | 0.0001  | 0.001   | 0.01    | 0.1     | 1
Total number of layers | 3       | 4       | 5       | 6       | 7
Activation function    | rrelu   | relu    | celu    | sigmoid | tanh
Figure 9 shows significant interaction effects between the hyperparameters, even though each can be varied independently. Due to these high interactions, multiple hyperparameters should be varied simultaneously during tuning. This process can be very tedious and time-consuming if done manually; alternatively, computational algorithms can be used to speed it up.
In addition to the main effects and interaction plots, the normality of the loss function obtained from the two DOEs was tested using the Anderson-Darling test. Figure 10 shows the probability plot for the loss function. Based on the test results, the data are not normally distributed (p-value < 0.005). We can also see a high probability of obtaining a loss function near -4.5; the importance of this observation is discussed later in this section.
Loss functions from five hundred DEM iterations were collected to further study the effect of the six hyperparameters. The hyperparameters were varied using the random search algorithm in HyperOpt. The minimum potential energy obtained from these iterations, plotted against each hyperparameter, is shown in Figure 11. From this figure, we do not see any trend between the hyperparameters and the loss function.
4.2. Sensitivity of loss functions and displacements
Figure 11 shows that the loss function for most iterations lies in a narrow range of -5 to -4.5. Displacements in the y-direction for two randomly picked iterations are shown in Figure 12. From this figure, we see that even though the variation in the loss function is relatively small, neither iteration provides the correct displacement prediction. We had previously observed that the loss function is not normally distributed, and without a trend, it is not easy to tune the hyperparameters manually. This observation further strengthens the need for a systematic approach to tuning hyperparameters.
5. Tuning hyperparameters for different load cases
In the previous section, we concluded that all six hyperparameters are interdependent. However, from the main effects plot (Figure 8), we can estimate that the activation function has a smaller impact on the loss function than the other parameters. From the five hundred random iterations, we see that the median loss function for rrelu and sigmoid is lower than for the other activation functions. As a result, the activation function can be fixed to one of the two; here, we select rrelu. Further, the number of layers was fixed to five. The other four hyperparameters were varied according to the sensitivity profiles and ranges provided in Table 3.
Table 3. Parameter ranges and profiles.

Hyperparameter     | Sensitivity profile | Range
Learning Rate      | loguniform          | Exp(0-2)
Neurons            | quniform            | 20-120
Standard Dev (DNN) | uniform             | 0-1
Standard Dev (RFF) | uniform             | 0-1
The sensitivity profile and range together define the search domain. While the range provides the upper and lower bounds for each hyperparameter, the sensitivity profile determines how values are drawn within the range. A loguniform profile returns values such that the natural log of the returned value is uniformly distributed; quniform returns discrete values between the upper and lower bounds; and a uniform profile returns values uniformly distributed between the upper and lower bounds.
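These three profiles can be illustrated in plain NumPy (HyperOpt's `hp.loguniform`, `hp.quniform`, and `hp.uniform` behave analogously; the code below is an illustration, not the HyperOpt implementation):

```python
import numpy as np

rng = np.random.default_rng(1)

def loguniform(low, high, n):
    """The natural log of the returned value is uniform on [low, high]; with
    the range Exp(0-2), the learning rate spans e^0 = 1 to e^2 ~ 7.39."""
    return np.exp(rng.uniform(low, high, n))

def quniform(low, high, q, n):
    """Uniform draws rounded to the nearest multiple of q, giving discrete
    values; used for the number of neurons."""
    return np.round(rng.uniform(low, high, n) / q) * q

def uniform(low, high, n):
    """Plain uniform draws; used for the two standard deviations."""
    return rng.uniform(low, high, n)

lr = loguniform(0.0, 2.0, 1000)       # learning-rate candidates
neurons = quniform(20, 120, 1, 1000)  # integer neuron counts in [20, 120]
std_dnn = uniform(0.0, 1.0, 1000)     # DNN initialization standard deviation
```

The loguniform profile explains why the Table 2 learning-rate levels (0.368, 1, 2.718, 7.389) are powers of e.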
The framework described in Section 3 was used to tune the hyperparameters for the two load cases specified in Figure 7. The Python code developed for the framework was run on the Delta supercomputer at the National Center for Supercomputing Applications at the University of Illinois Urbana-Champaign, using a single-node A100 GPU with a single AMD 64-core processor and 256 GB of DDR4-3200 RAM. The optimized hyperparameters for the two load cases, obtained using the Tree-structured Parzen Estimator (TPE) optimization algorithm, are shown in Table 5. All cases were analyzed using 100x50 training points over the computational domain, as shown in Figure 13. Simultaneously, FEA was undertaken in Abaqus using element type CPS4R; the FEA results were taken as the ground truth. The deviation of the displacements predicted by DEM (using the tuned hyperparameters) from those obtained from FEA was calculated using the L2-norm given by Equation 10.
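Assuming Equation 10 is the usual relative L2-norm between the DEM and FEA displacement fields sampled at the same points, it can be sketched as:

```python
import numpy as np

def l2_error(u_dem, u_fea):
    """Relative L2-norm between DEM and FEA displacement fields sampled at the
    same points (assumed form of Equation 10)."""
    u_dem, u_fea = np.asarray(u_dem, dtype=float), np.asarray(u_fea, dtype=float)
    return np.linalg.norm(u_dem - u_fea) / np.linalg.norm(u_fea)
```

A value of 0 means the DEM field matches the FEA reference exactly; the errors reported in Table 5 are on the order of 1e-5.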
Table 5. Optimized hyperparameters for compressive and bending loads.

Hyperparameter     | Compression  | Bending
Learning Rate      | 1.35145      | 1.40475
Neurons            | 106          | 98
Standard Dev (DNN) | 0.01977      | 0.03276
Standard Dev (RFF) | 0.46094      | 0.49815
Loss function      | -4.98435     | -13.34657
L2-error           | 0.000019     | 0.000037
Optimization time  | 8 min 13 sec | 12 min 12 sec
7. Effect of introducing RFF mapping
In the earlier examples, RFF mapping was introduced in the MLP to reduce bias during the model's training (as discussed in Section 3). In this section, the models are trained without RFF mapping to study its effect. As discussed in the previous sections, the hyperparameters are interdependent; thus, hyperparameter tuning was undertaken for a 4x1 mm² plate subjected to compressive loads, following the procedure discussed above. The optimized hyperparameters are shown in Table 6.
Table 6. Optimized hyperparameters for compressive load.

Hyperparameter     | Compression
Learning Rate      | 1.00118
Neurons            | 98
Standard Dev (DNN) | 0.05492
The displacements obtained from DEM without RFF mapping for three different cases, uniform compression of a 4x1 mm² plate, uniform bending of a 4x1 mm² plate, and uniform compression of a random geometry, are shown in Figure 24, Figure 25, and Figure 26, respectively. Comparing the error in displacements under compression loading (Figure 24e,f) with the error when RFF mapping was present (Figure 15a,b), we conclude that no significant difference exists between the two cases. However, comparing the errors under bending (Figure 15e,f with Figure 25e,f) and for the random geometry (Figure 23e,f with Figure 26e,f), we conclude that the displacements predicted with RFF mapping are more accurate than those predicted without it. In particular, comparing Figure 23e,f with Figure 26e,f, the error in displacements is reduced by an order of magnitude. Thus, the introduction of RFF mapping improved the accuracy of DEM and enabled the transferability of hyperparameters to a wider range of geometries and boundary conditions.
Figure 1.
Displacements in the y-direction obtained from a) DEM without modifications and b) FEA. c) Error in displacements.
Figure 2.
ANN for DEM (Hyperparameters marked in blue).
Figure 3.
Solid body with boundary conditions.
Figure 4.
Fourier feature mapping in the MLP.
Figure 5.
Modified deep energy method.
Figure 6.
Algorithm for hyperparameter tuning.
Figure 7.
Plate geometry and applied boundary conditions.
Figure 8.
Main effects plot.
Figure 9.
Interactions plot.
Figure 10.
Anderson-Darling test for loss function.
Figure 11.
Variation of the loss function (PE) with 500 randomly selected combinations of hyperparameters.
Figure 12.
Displacement in the y-direction for a) a loss function of -4.83355 and b) a loss function of -4.97902.
Figure 13.
Positions of training points for all load cases studied in Section 4.
Figure 14.
Predicted displacements under compressive (a-b), tensile (c-d), and bending (e-f) loads.
Figure 15.
Error in predicted displacements under compressive (a-b), tensile (c-d), and bending (e-f) loads.
Figure 16.
Predicted displacement and error in y-direction for 200x50 and 100x100 discretizations.
Figure 17.
a) 1mm x 1mm plate subjected to compressive loads, b) Training points used for DEM.
Figure 18.
a-b) displacements predicted from DEM, c-d) displacements obtained from FEA, e-f) error in displacements.
Figure 19.
a-b) displacements obtained for 1 mm x 1 mm plate when DEM model is trained for a plate of dimensions 4 mm x 1 mm, c-d) displacements obtained from FEA, e-f) error in displacements.
Figure 20.
a) Boundary conditions of the plate under localized traction. b) Training points used for DEM.
Figure 21.
a-b) displacements predicted from DEM, c-d) displacements obtained from FEA, and e-f) error in displacements for localized traction.
Figure 22.
Density distribution. The passive elements are shown in black, while the active elements are in white.
Figure 23.
a-b) displacements predicted from DEM, c-d) displacements obtained from FEA, and e-f) error in displacements.
Figure 24.
a-b) displacements predicted from DEM without RFF mapping, c-d) displacements obtained from FEA, and e-f) error in displacements.
Figure 25.
a-b) displacements predicted from DEM without RFF mapping under bending load, c-d) displacements obtained from FEA, and e-f) error in displacements.
Figure 26.
a-b) displacements predicted from DEM without RFF mapping under compressive load, c-d) displacements obtained from FEA, and e-f) error in displacements.