A Discrete Physics-Informed Training for Projection-Based Reduced Order Models with Neural Networks

N. Sibuet; S. Ares de Parga; J.R. Bravo; R. Rossi

doi:10.20944/preprints202504.0645.v1

Submitted:

08 April 2025

Posted:

08 April 2025

You are already at the latest version

Abstract

This paper presents a physics-informed training framework for projection-based Reduced Order Models (ROMs). We extend the PROM-ANN architecture by complementing snapshot-based training with a FEM-based, discrete physics-informed residual loss, bridging the gap between traditional projection-based ROMs and physics-informed neural networks (PINNs). Unlike conventional PINNs that rely on analytical PDEs, our approach leverages FEM residuals to guide the learning of the ROM approximation manifold. Key contributions include: (1) a parameter-agnostic, discrete residual loss applicable to non-linear problems, (2) an architectural modification to PROM-ANN improving accuracy for fast-decaying singular values, and (3) an empirical study on the proposed physics informed training process for ROMs. The method is demonstrated on a non-linear hyperelasticity problem, simulating a rubber cantilever under multi-axial loads. The main accomplishment in regards to the proposed residual-based loss is its applicability on non-linear problems by interfacing with FEM software while maintaining reasonable training times. The modified PROM-ANN outperforms POD by orders of magnitude in snapshot reconstruction accuracy, while the original formulation is not able to learn a proper mapping for this use-case. Finally, the application of physics informed training in ANN-PROM modestly narrows the gap between data reconstruction and ROM accuracy, however it highlights the untapped potential of the proposed residual-driven optimization for future ROM development. This work underscores the critical role of FEM residuals in ROM construction and calls for further exploration on architectures beyond PROM-ANN.

Keywords:

reduced order model (ROM)

;

Physics Informed Neural Networks (PINNs)

;

artificial neural network (ANN)

;

Projection-Based Model Reduction

;

proper orthogonal decomposition (POD)

Subject:

Engineering - Industrial and Manufacturing Engineering

MSC: 65M60; 68T07

1. Introduction

In recent years, the development of increasingly sophisticated high-fidelity models (HFMs) is crucial for simulating complex physical phenomena. One of the main computational challenges in HFMs is the need for very fine spatio-temporal resolutions, which results in extremely high-dimensional problems that can take weeks to solve, even on numerous parallelized computing cores. Therefore, it is vital to create methods that significantly reduce the computational time and resources required to enable the application of these models in time-sensitive scenarios, including real-time control systems, simulation-driven optimization and digital twins [2,3].

One of the most extended approaches to dimensionality reduction is the use of Projection-Based Reduced-Order Models (ROMs) [4], which combine data-driven methods with the physics-based solving process of HFMs. These methods leverage pre-computed results from the HFM to define a low-dimensional latent space that captures the essential dynamics of the system. A common technique for generating this latent space is Proper Orthogonal Decomposition (POD) [5,6], which employs truncated singular value decomposition (SVD) in sampled data to produce an orthonormal basis that spans the latent space and also serves as the optimal linear mapping from the full-dimensional to latent space and back [7]. The process of collecting samples within a defined parametric space and applying POD constitutes the offline stage—an upfront computational investment. This enables the online stage, where new solutions within the parametric space can be computed at a fraction of the cost of running the HFM. During the online stage, solutions are found directly in the latent space by projecting the full-dimensional optimization problem onto another reduced space of the same dimensions as the latent space. The choice of projection basis depends on the problem’s properties, with common options including Galerkin projection [8], Least-Squares Petrov-Galerkin [9,10,11,12] projection and other derivates of Petrov-Galerkin [13].

Despite its utility, the reduction capability of POD is often limited by its linear nature, particularly in problems with slow decaying Kolmogorov n-width [14], such as advection-dominated problems. This limitation has encouraged the development of non-linear reduction techniques able to achieve more compact representations. The formal theory for generic non-linear projection-based ROM was introduced in [15], and it was specifically demonstrated by using a convolutional neural network as a mapping between the full and latent spaces. Despite the several limitations of this architecture, i.e. limited scaling capabilities and requirement of structured meshes, it paved the way for further non-linear architectures. These include methods based on quadratic manifolds [16] and the PROM-ANN architecture from [1]; the latter essentially being a non-linear approximation of POD using dense neural networks.

In this paper, we explore a new factor to take into account when constructing non-linear ROM operators. While existing projection-based ROMs construct their projection operators based exclusively on learning the reconstruction of solution snapshots, we propose incorporating the residuals of the HFM into the training of these operators. The reason being that, since the residual is the quantity to be be optimized during ROM inference, its more accurate representation will necessarily improve the solutions of the ROMs.

Our methodology is inspired by Physics-Informed Neural Networks (PINNs) [17] and their variants, e.g. [18,19]. PINNs are models based on neural networks that learn the solution function to a system of partial differential equations (PDEs) in its continuous form. They do so by incorporating the PDEs into their training loss and taking advantage of the automatic differentiation capabilities of common machine learning libraries. However, their application in engineering workflows can be hindered by their difficulties to cope with: irregular or discontinuous domains, complex PDEs due to spectral bias (which hinders their ability to handle high-frequency terms [20]), discontinuities (such as shock waves [21]), and plasticity (requiring significant effort to develop workarounds for these issues [22]).

In contrast, the Finite Element Method (FEM) [23], along with other HFMs like Finite Differences or Finite Volumes Methods, remains the gold standard for solving complex physical behaviors, particularly in engineering applications. This has led to recent efforts to bridge FEM and PINNs. For example, [24,25] proposed discrete PINNs that use FEM to compute residuals and their derivatives for backpropagation in linear problems. These approaches predict nodal values using neural networks that take simulation parameters as the only input. Meanwhile, the spatial discretization and boundary conditions are implicitly integrated via the FEM residual. While [25] focuses exclusively on structural mechanics, [24] generalizes the approach but still only for linear PDEs. Another intermediate approach is presented in [26], which introduces a physics-informed neural operator inference [27] framework that takes a discretized, variational form of the PDE as the training loss. This variational form is based on the energy of the system and is closely related to the FEM one, although not generalizable to the same degree. They remark how traditional PINNs only enforce the strong form of the PDE on the collocation points, resulting in lack of global smoothness, while a variational approach ensures this implicitly.

In this work, we propose integrating concepts from PINNs into the field of intrusive projection-based ROMs. Specifically, we propose a method to incorporate physics information into the ROM non-linear approximation manifold by using the FEM residual as the training loss. This requires two key components: (1) a flexible ROM architecture capable of accommodating custom loss functions for the definition of the projection operators, and (2) a framework to integrate FEM residuals into the training process. For the architecture, we adapt the PROM-ANN framework from [1,28], which uses neural networks in a scalable manner. We modify this architecture to suit our needs and develop a framework to integrate FEM residuals into a neural network’s loss, along with their Jacobians for the back-propagation. Our approach assumes the user has access to a fully functional FEM software capable of providing these quantities —a reasonable requirement given the need for an HFM to generate training data and perform intrusive ROM.

Our residual-based loss function differs from previous studies in its generality and flexibility. Instead of minimizing the FEM residual norm directly, we compute the difference between the obtained residual and a ground truth, thereby learning the behavior of the residual in non-converged conditions. We also provide an optimized implementation for computing the gradient of the residual loss, enabling efficient backpropagation.

To validate our methodology, we apply it to a steady-state structural mechanics problem involving a cantilever modeled by an unstructured mesh. The FEM software of choice is KratosMultiphysics [29], which we adapted for this purpose. Our results show that the modified PROM-ANN architecture, combined with the residual-based loss, achieves slightly but consistently improved accuracy in ROM simulations. We do acknowledge that training with the residual loss is computationally expensive compared to the traditional snapshot-based loss. To address this, we propose using residual training only as a fine-tuning step after initial snapshot-based training, significantly reducing overall training time.

While adapting the architecture of PROM-ANN to suit our specific needs, we also identified opportunities to enhance the original design, enabling it to learn effectively across a broader range of scenarios, even without incorporating the residual-based loss. These improvements are presented as additional contributions in this paper.

In summary, our contributions are threefold:

1.: Enhanced PROM-ANN Architecture: We modify the PROM-ANN framework to improve its learning capabilities, particularly in cases with fast singular value decay.
2.: Residual-Based Loss Function: We introduce a flexible, residual-based loss function that allows users to utilize their already-existing FEM infrastructure to perform physics-informed training of neural networks. We also specify an efficient implementation strategy for it.
3.: Exploration of the effects of physics-informed manifold on projection-based ROM: We explore the impact of training the decoder using a residual-based loss function, both in terms of pure data reconstruction and of online ROM simulation. We obtain encouraging results, particularly for the latter one.

Our findings suggest that focusing on residual behavior could unlock further potential in non-linear ROMs, particularly when combined with architectures specifically designed for this purpose.

The rest of the paper is organized as follows. Section 2 provides a review of the main methods that form the basis for this work, i.e. PINNs and projection-based ROM with an emphasis on the original PROM-ANN architecture. In Section 3 we propose a discrete, FEM-based residual loss and develop an adequate implementation strategy for it. Then, Section 4 provides a series of modifications to the original PROM-ANN architecture and loss that make it compatible with problems with fast-decaying singular values. Section 5 effectively merges the developments presented in the two previous sections to enable physics-informed non-linear ROM. Further on, Section 6 explains the specific FEM problem in which the developments will be tested, as well as the software of choice and the neural-network training strategy. Section 7 shows the results of applying the methods developed throughout the paper onto our specific use-case. Section 8 further discusses the implications of the results from the previous section and proposes further research directions for future improvements. Finally, 9 closes the paper with the most significant conclusions.

2. Background

2.1. Physics-Informed Neural Networks

In this section we introduce PINNs, highlighting the most relevant aspects to our case. Our focus is on PINNs for forward parametric problems. Comprehensive reviews on the numerous variations of PINNs can be found in e.g. [18,30]

Consider a physical problem in the following form

\{\begin{matrix} R (u (z; μ), μ) & = 0 & z \in Ω \times (0, T], \\ B (u (z; μ), z) & = g (z, μ) & z \in \partial Ω \times (0, T], \\ I (u (z; μ), z) & = f (z, μ) & z \in Ω \times {0}, \end{matrix}

(1)

where

Ω \subset R^{d}

represents the spatial domain with boundary

\partial Ω

,

u : Ω \times R_{+} \times R^{p} \to R^{N_{dof}}

is the unknown solution,

z

is a spatio-temporal coordinates vector,

μ \in P \subset R^{p}

is the parameters vector encapsulating geometrical, material, and/or initial and boundary conditions.

R (\cdot)

is the differential operator in residual form encapsulating the physics.

B (\cdot)

is an operator indicating boundary conditions, and

I (\cdot)

is an operator indicating initial conditions.

In PINNs one uses a neural network for representing the solution function, as

u \approx N (z, μ; Θ),

(2)

where

N

denotes the neural network, and

Θ

encapsulates its learnable parameters. The optimal parameters for the neural network are obtained by minimizing a loss function:

Θ^{*} = \underset{Θ}{\arg \min} L (Θ),

(3)

where the loss function

L (Θ)

can be splitted into

L (Θ) = ω_{R} L_{R} (Θ) + ω_{d} L_{d} (Θ) .

(4)

Here,

ω_{d}

and

ω_{R}

are weighting factors that balance the contribution of the data-driven loss

L_{d}

and the residual loss

L_{R}

, respectively, to the overall loss function. The data-driven loss

L_{d}

is related to the mismatch between

m N_{d}

ground-truth data points

u_{i j}^{*}

, and the output of the PINN for the same spatio-temporal and parametric coordinates. This data-driven loss

L_{d}

reads

L_{d} = {MSE}_{d} = \frac{1}{m N_{d}} \sum_{j = 1}^{m} \sum_{i = 1}^{N_{d}} {∥N (z_{i}, μ_{j}; Θ) - u_{i j}^{*}∥}^{2},

(5)

where

N_{d}

stands for the number of collocation points considered in the spatio-temporal domain. Moreover, m represents the number of points considered in the parametric space

P

.

The residual loss

L_{R}

relates to the substitution in the residual operator, as defined in Eq. (1), of the unknown solution with its neural network approximation as defined in Eq. (2), and reads

L_{R} = {MSE}_{R} = \frac{1}{m N_{R}} \sum_{j = 1}^{m} \sum_{i = 1}^{N_{R}} {∥R (N (z_{i}, μ_{j}; Θ), μ_{j})∥}^{2},

(6)

where

N_{R}

represents the number of residual evaluation points, which might be different from the collocation points used in the data-driven loss. Admittedly, from the substitution of Eq. (2) into Eq. (1), two more terms could be considered, corresponding to the initial and boundary conditions. However, here we assume that the initial and boundary conditions have been embedded into the PINN, as in [31,32], therefore ensuring their compliance, and not entering in the loss function.

The fact that the PINN itself is a differentiable mapping from spatio-temporal coordinates to the corresponding solution means that gradients of the solution and temporal derivatives of it are directly computable within the deep learning software via auto-differentiation. This allows for the internal computation of the loss and its gradient.

2.2. Projection-Based Reduced Order Models

2.2.1. The Full Order Model (FOM)

In the initial formulation of our problem, we represented the physics using a continuous residual form, as shown in Eq. (1), revisited here for clarity:

R (u (z; μ); μ) = 0 z \in Ω \times (0, T] .

(7)

To adapt this continuous form for computational analysis, we discretize the domain using FEM. By applying FEM discretization to Eq. (7) and incorporating initial and boundary conditions, we arrive at a set of governing equations in the form of a

μ

-parametric operator

R : R^{N} \times P \to R^{N}

. This results in the following discrete representation:

R (u; μ) = 0,

(8)

where

u \in R^{N}

is the FOM solution vector containing the value of the solution on every degree of freedom of the spatial discretization, and

μ \in P \subset R^{p}

is the parameters vector encapsulating e.g. geometrical variations, material properties or boundary conditions.

Given an initial guess of the FOM solution at a step t, symbolically represented as

u_{t}^{0} (μ)

1, the actual solution corresponding to the step can be obtained via an iterative method (e.g. Newton’s) like

\begin{matrix} - J (u_{t}^{k} (μ); μ) δ u_{t}^{k} (μ) & = R (u_{t}^{k} (μ); μ) \end{matrix}

(9a)

\begin{matrix} u_{t}^{k + 1} (μ) & = u_{t}^{k} (μ) + δ u_{t}^{k} (μ) \end{matrix}

(9b)

where

J = \frac{\partial R}{\partial u}

is a jacobian matrix (or an approximation thereof), k is the current non-linear iteration, and

δ u_{t}^{k} \in R^{N}

is the solution state increment.

In practice, the FEM discretization can yield millions of degrees of freedom, making each nonlinear iteration prohibitively expensive. This challenge grows acute in scenarios demanding rapid solutions (e.g., design optimization, digital twins). To address this, various surrogate models have emerged, ranging from intrusive methods that directly manipulate the governing equations to non-intrusive black-box approaches. This work builds upon an intrusive, projection-based reduced-order model.

2.3. Manifold Projection-Based ROMs

Following the standard procedure introduced in [15] for non-linear projection-based ROMs (also known as manifold-based ROMs), we begin by approximating the FOM’s solution state variable

u^{*} (μ)

through a general decoder function defined as:

u^{*} (μ) \approx u (μ) = D_{u} (q (μ)),

(10)

where

D_{u} : R^{n} \to R^{N}

maps a reduced solution

q (μ)

in a latent space to a solution

u (μ)

in the full space, and

n ≪ N

.

Correspondingly, we introduce its encoder:

\begin{matrix} E_{u} : & R^{N} \to R^{n}, & u (μ) \mapsto q (μ) . \end{matrix}

(11)

When Eq. (10) is substituted into Eq. (8), we obtain a residual characterized by:

R (u (μ); μ) = R (D_{u} (q (μ)); μ) .

(12)

Finding the best reduced solution

q

involves solving the following minimization problem:

min_{q (μ) \in R^{n}} \frac{1}{2} {∥R (D_{u} (q (μ)); μ)∥}_{G}^{2} .

(13)

Particular choices of the norm-defining matrix

G \in R^{N \times N}

lead to different methods. In particular setting

G : = J^{- T}

leads to the Galerkin method, which we utilize in this work. For the derivation of the Galerkin method and other methods by particular choices of

G

, the interested reader is directed to [13].

In this way, given an initial guess for the ROM solution, symbolically represented as

u^{0} (μ) \in R^{N}

, the ROM solution can be obtained using the Galerkin method by solving iteratively the following system of equations

\begin{matrix} {(\frac{\partial D_{u} (q^{k} (μ))}{\partial q})}^{T} J (u^{k} (μ); μ) \frac{\partial D_{u} (q^{k} (μ))}{\partial q} δ q^{k} (μ) & = - {(\frac{\partial D_{u} (q^{k} (μ))}{\partial q})}^{T} R (u^{k} (μ); μ) \end{matrix}

(14a)

\begin{matrix} q^{k + 1} (μ) & = q^{k} (μ) + δ q^{k} (μ) \end{matrix}

(14b)

\begin{matrix} q^{0} (μ) & = E_{u} (u^{0} (μ)) \end{matrix}

(14c)

2.4. Neural Network-Augmented Projection-Based ROM (PROM-ANN)

In this work, we focus on the PROM-ANN proposed by Barnett et al.[1]. This methodology constructs a nonlinear approximation manifold by augmenting a traditional linear reduced-order basis ROB with a nonlinear correction obtained via an artificial neural network (ANN). It builds upon earlier developments in quadratic approximation manifolds[16], addressing limitations inherent to purely linear ROM approximations, especially regarding problems exhibiting slowly decaying Kolmogorov n-width.

2.4.1. Construction of the Nonlinear Approximation Manifold

The PROM-ANN methodology constructs the solution manifold by leveraging a low-dimensional primary basis and enhancing it with a nonlinear mapping learned from data. Specifically, POD is employed by factorizing a snapshot matrix

S_{u} \in R^{N \times m}

(containing m high-fidelity solutions) via a truncated singular value decomposition:

S_{u} = u Σ V^{T},

(15)

where

u \in R^{N \times (n + \bar{n})}

contains the left singular vectors, and

Σ

is the diagonal matrix of singular values

σ_{i}

, ordered by magnitude. From this decomposition, we construct the primary ROB

Φ \in R^{N \times n}

using the first n dominant modes, and a secondary ROB

\bar{Φ} \in R^{N \times \bar{n}}

using the next

\bar{n}

subdominant modes. Given a full-order solution vector

u^{*} (μ) \in R^{N}

, we approximate it with a nonlinear manifold representation defined as follows:

\begin{matrix} u^{*} (μ) \approx u (μ) & = D_{u} (q (μ)) = u_{ref} + Φ q (μ) + \bar{Φ} N (q (μ); Θ) \end{matrix}

(16a)

\begin{matrix} q (μ) & = E_{u} (u^{*} (μ)) = Φ^{T} (u^{*} (μ) - u_{ref}), \end{matrix}

(16b)

where

u_{ref} \in R^{N}

is a suitable reference solution, typically chosen as the mean of the snapshots,

q (μ) \in R^{n}

are the primary reduced coordinates, encoding the dominant features of the solution,

N (q; Θ) : R^{n} \to R^{\bar{n}}

is a neural network with parameters

Θ

, creating a nonlinear mapping from the primary to the secondary reduced coordinates, thus providing a nonlinear correction to the linear subspace approximation.

The ANN acts as a nonlinear map, capturing correlations between dominant (primary) and subdominant (secondary) mode coefficients. Rather than explicitly reconstructing truncated information, the ANN provides a nonlinear relationship that significantly improves the compactness and accuracy of the reduced-order representation.

To solve the reduced-order system defined in Eq. (14), it is necessary to explicitly compute the derivative of the decoder function with respect to the primary reduced coordinates. For the PROM-ANN architecture defined in Eq. (16), this derivative reads:

\frac{\partial D_{u} (q (μ))}{\partial q} = Φ + \bar{Φ} \frac{\partial N (q (μ); Θ)}{\partial q} .

(17)

The derivative of the neural network term

\frac{\partial N (q (μ); Θ)}{\partial q}

can be computed efficiently using automatic differentiation frameworks typically employed in neural network training.

2.4.2. Training the Neural Network

The ANN is trained using precomputed solution snapshots from the high-fidelity model. For each training snapshot indexed by

j = 1, \dots, m

, we first compute the primary and secondary reduced coordinates by projection onto their respective ROBs:

\begin{matrix} q_{j} (μ_{j}) & = Φ^{T} (u_{j}^{*} (μ_{j}) - u_{ref}), \end{matrix}

(18a)

\begin{matrix} {\bar{q}}_{j} (μ_{j}) & = {\bar{Φ}}^{T} (u_{j}^{*} (μ_{j}) - u_{ref}) . \end{matrix}

(18b)

The original PROM-ANN approach [1] defines the training loss function directly on the discrepancy between the secondary coordinates

{\bar{q}}_{j}

and the ANN prediction

N (q_{j}; Θ)

:

L_{PROM - ANN} = \frac{1}{m} \sum_{j = 1}^{m} {∥ {\bar{q}}_{j} (μ_{j}) - N (q_{j} (μ_{j}); Θ) ∥}^{2} .

(19)

While very computationally efficient, this approach lacks a proper normalization strategy in order to guarantee that the neural network can be effectively trained. We will study this issue in Section 4.

3. Discrete PINN-like Loss

Our non-linear ROM architecture aims to incorporate the physics of the problem into the approximation manifold construction. The equivalent effect is accomplished in traditional PINNs by auto-differentiating the neural network’s output with regard to a given point in continuous space and time, using the strong form of the PDE system. In contrast, a discrete approach like ours, requires a numerical approximation by discretization of the PDE, which can be done using a variety of techniques such as the finite element, or finite volume methods. The integration of these discretized residuals into neural networks places our approach within the category of informed machine learning, as detailed by [33].

The proposal to use discrete approximation for substituting the auto-differentiation in PINNs for residual minimization is introduced almost simultaneously in [24,25]. Both of these approaches introduce NN architectures for solving forward problems of linear, steady-state simulation and [24] additionally presents a model for backwards problems of the same nature. Their conceptualization doesn’t differ too much from classical PINNs in terms of inputs, outputs, and loss definition: they take the parameters vector for the desired simulation as input, and return the results of the nodal variables of the system as output. The spatial information that is typically an input in PINNs is intrinsically defined in the FEM solver at the time of the residual computation.

In terms of the training loss, both approaches aim to minimize the mean squared L2-norm of the residual, as determined by the FEM solver. This minimization considers the predicted snapshot (nodal variables) and the relevant simulation parameters:

L_{R} = \frac{1}{m N} \sum_{j = 1}^{m} {∥R (N (μ_{j}; Θ); μ_{j})∥}^{2} .

(20)

where

Θ

are the trainable parameters of the neural network conforming the PINN, index j specifies the different samples to be used within the batch, m is the number of samples in the batch and N is the number of degrees of freedom in our system (and therefore the size of the residual). An important remark about this loss is that it requires the removal of contributions in the residual

R (N (μ_{j}; Θ); μ_{j})

at the degrees of freedom with Dirichlet conditions, otherwise the norm will not approach zero.

The exact implementation of this loss differs in both approaches. The one in [25] is limited to static linear cases of structural mechanics and proposes a specific loss scaling strategy for these cases. They also propose an implementation strategy to perform the loss computation in batches. Meanwhile, the approach in [24] is more general, but still only applicable to linear PDEs. This limitation to linear cases significantly simplifies the loss implementation, as the residual becomes of the form

R (u; μ) = K (μ) u - F (μ)

. So both the stiffness matrix

K

and the vector of external contributions

F

are independent of the current snapshot. Accordingly, the loss becomes:

L_{R} = \frac{1}{m N} \sum_{j = 1}^{m} {∥K (μ_{j}) N (μ_{j}; Θ) - F (μ_{j})∥}^{2}

(21)

where one could have precomputed all

K (μ_{j})

and

F (μ_{j})

, thus allowing the whole optimization to be done via auto-differentiation, without calling the FEM software.

Even the approaches in more recent papers like [34,35] are still designed only for linear problems, with the difference being their specific use cases, the architectures they use for the surrogate model and the loss being just the norm of the residual, instead of the square of it.

In contrast, our approach differs in two key aspects: (1) the loss function is formulated in a parameter-agnostic manner, and (2) it is designed to handle both linear and non-linear cases, thereby enabling a much broader range of applications. As with PINNs, our method also allows for a combination of physics-based and data-driven losses. The following subsections discuss these three developments.

3.1. Parameter-Agnostic Loss

One aspect in which our methodology diverges significantly from the original idea of PINNs is that we are not training a self-contained surrogate model (typically in the form of a single neural network

N (μ_{j})

) but instead, the approximation manifold defined by a trainable encoder-decoder pair

D_{u} (E_{u} (u_{j}^{*}))

. As we can see, the latter case does not take the simulation parameters

μ_{j}

as an input, so it is agnostic to them. This is because the FEM-based ROM software itself will be in charge of conditioning the simulation with those parameters and then finding the most appropriate solution within our latent space.

But not only that. Traditionally, for a loss like the one in Eq. (20), we need to specify the exact parameters

μ_{j}

for the solution we are seeking, as that is the one that will yield a residual of zero and therefore enable the learning by minimization. Instead, we design a loss function in which the parameters applied in the FEM software can be arbitrary. In this new loss we are not merely minimizing the properly parameterized residual of the predicted quantity

D_{u} (E_{u} (u_{j}^{*}))

, but the difference between this quantity and the residual of the target solution

u_{j}^{*}

, both with constant parameter

μ

:

L_{R} = \frac{1}{m N} \sum_{j = 1}^{m} {∥R (D_{u} (E_{u} (u_{j}^{*})); μ) - R (u_{j}^{*}; μ)∥}^{2} .

(22)

In this case, the trainable parameters

Θ

are contained within the encoder and/or decoder of our ROM, and their specific form depends on the chosen architecture. Strictly speaking, we should write

D_{u} (E_{u} (u_{j}^{*}; Θ); Θ)

. However, for clarity and conciseness, we will omit

Θ

from the notation in what follows.

Our approach aims to minimize the discrepancy in residual behavior when it is non-zero, ensuring the ROM approximation manifold captures not only the converged solution but also the behavior of the residual outside of convergence situations. In our approach, the decoder will be readily integrated into the Newton iterative procedure. Thus, we aim for it to enhance its residual representation near the convergence space, aiding in achieving an optimal converged solution. On the same line, we no longer have the restriction that we mentioned for Eq. (20) on the components of the residual associated to Dirichlet conditions. Meaning that we can leave those components in order to learn them too. The main caveat of using this approach is that it still requires the original HFM’s snapshots samples

u_{j}^{*}

even when training via the residual. In our specific case, the encoder-decoder architecture of choice will need these snapshots regardless, so it is not a major inconvenient.

Something important to consider is that Dirichlet conditions must be imposed strongly in the solutions vector whenever we compute the residual. That means that whatever architecture we use as encoder-decoder pair, it needs to operate on and predict only the degrees of freedom unaffected by Dirichlet conditions. The fixed ones are given their corresponding value forcefully.

3.2. In-Training Integration of FEM Software

The obstacle impeding going from linear to non-linear cases in other FEM-based residual losses [24,25] is not a theoretical or mathematical difficulty in generating the residuals themselves or their derivatives. We have extensive FEM software at our disposal that already does precisely this. The difficulty is in dynamically integrating this information in an effective way during the training of the decoder.

Let us study the loss we proposed in Eq. (22) in more detail to identify our needs during training. The loss itself needs the values for two residuals:

R (D_{u} (E_{u} (u_{j}^{*})); μ)

and

R (u_{j}^{*}; μ)

. The latter can be pre-computed, as it will be constant for sample j during the whole training. The first one, however, has to be computed within the FEM software with an updated snapshot value at each training step.

Now, the actual training of the model happens during the backpropagation of the loss, which necessitates calculating the gradient of the loss with respect to

Θ

. This is typically done automatically thanks to the auto-differentiation capabilities of deep learning frameworks. However, as we take part of the loss computation onto external software, we are unable to do this and we have to code the gradient manually. The derivative of the loss function

L_{R}

can be decomposed using the chain rule:

\frac{\partial L_{R}}{\partial Θ} = \frac{2}{m N} \sum_{j = 1}^{m} {(R (u_{j}; μ) - R (u_{j}^{*}; μ))}^{T} \frac{\partial R (u_{j}; μ)}{\partial u_{j}} \frac{\partial u_{j}}{\partial Θ} .

(23)

where

u_{j} = D_{u} (E_{u} (u_{j}^{*}))

is the current prediction for the nodal values of the solution.

The factor

\frac{\partial R (u_{j}; μ)}{\partial u_{j}}

is the Jacobian matrix of the residual function with respect to the predicted solution vector

u_{j}

. This Jacobian is computed and assembled entirely in the FEM software, given the current snapshot and arbitrary simulation parameter. Within the context of FEM, it is expressed as:

J_{F} (u_{j}; μ) = \frac{\partial R (u_{j}; μ)}{\partial u_{j}} .

(24)

The final factor

\frac{\partial u_{j}}{\partial Θ}

represents another Jacobian, this time of the predicted snapshot with respect to the trainable parameters. We can express it as:

J_{D} (u_{j}; μ) = \frac{\partial u_{j}}{\partial Θ} .

(25)

This latter derivative is applied only on the computations performed in the trainable encoder-decoder pair

D_{u} (E_{u} (u_{j}^{*}))

. As such, this factor is self-contained within the deep learning framework and does not require additional external data. Such a Jacobian could be obtained by using the tf.GradientTape.batch_jacobian() method in TensorFlow, or equivalent methodologies in other deep learning frameworks.

To compute the loss gradient as it is defined explicitly in Eq. (23), we just need to collect the factors coming from the FEM software into the deep learning framework. However, this computation would be unnecessarily inefficient because of two main reasons described next. For each of these, we propose a corresponding implementation strategy:

1.: First, take into account that $J_{F} (u_{j}; μ)$ is typically a high-dimensional and highly sparse matrix. So, moving it directly from one software to another could be costly, especially if it cannot be treated as a sparse structure in all computations. FEM software is typically designed to cope with these kinds of matrices and to perform sparse vector-matrix multiplications. Therefore, for each snapshot we precompute the quantities:

$\begin{matrix} e_{R, j} {(R (u_{j}; μ) - R (u_{j}^{*}; μ))}^{T} \\ v_{R, j} e_{R, j} J_{F} (u_{j}; μ) \end{matrix}$

(26)

within the FEM software. Now both of these are vectorial quantities, making the transfer to the deep learning framework more efficient. $e_{R, j}$ is the error used to compute the loss, as $L_{R} = \frac{1}{m N} \sum_{j = 1}^{m} {∥e_{R, j}∥}^{2}$ . And $v_{R, j}$ will be used to compute the gradient of the loss.
2.: Then, the main bottleneck comes from needing to compute m full Jacobian matrices ${J_{D} (u_{j}; μ)}_{j = 1}^{m}$ via auto-differentiation, instead of just the gradient of a single scalar $L_{R}$ . Avoiding this is straightforward when reformulating the gradient of the loss with vectors $v_{R, j}$ . We realize that, to the deep learning framework, externally-computed $v_{R, j}$ is no longer seen as a function depending on the model parameters $Θ$ , but as a constant instead. This means that we can treat it as such during the computation and get:

$\frac{\partial L_{R}}{\partial Θ} = \frac{2}{m N} \sum_{j = 1}^{m} v_{R, j} \frac{\partial u_{j}}{\partial Θ} = \frac{\partial}{\partial Θ} (\frac{2}{m N} \sum_{j = 1}^{m} v_{R, j} u_{j})$

(27)

Therefore, we compute a single gradient of a scalar value for the whole batch, via auto-differentiation.

Particularly in our case, we take KratosMultiphysics as our choice of FEM software. As it is open-source, it enables us to develop all the custom functionalities needed to take a given set of snapshots

{u_{j}}_{j = 1}^{m}

(those of the current batch), and for each one apply it as the current solution to then output

e_{R, j}

and

v_{R, j}

.

3.3. Data-Based Loss Term

Even when training with the residual loss, it might be beneficial to introduce a data-related loss simultaneously. This is common in traditional PINNs in two ways: specifically on the boundary and initial conditions, to try to enforce Dirichlet conditions [17] (unless a special approach is used to establish them strongly [32]), and on collocation points sampled inside the simulation space, as a regularization that can help the overall loss converge faster [18].

In our case the Dirichlet conditions are enforced strongly and managed by the FEM software for their effect on the residual. But we could still introduce such a loss term for regularization purposes. Thus, the total loss function would be defined by two components:

L = ω_{R} L_{R} + ω_{d} L_{d} .

(28)

where

ω_{R}

and

ω_{d}

are tunable hyper-parameters to balance both terms as needed, and the data-related loss

L_{d}

is defined simply as the mean squared error between the currently predicted snapshot

u_{j} = D_{u} (E_{u} (u_{j}^{*}))

and the ground-truth one

u_{j}^{*}

:

L_{d} = \frac{1}{m N} \sum_{j = 1}^{m} {∥u_{j} - u_{j}^{*}∥}^{2} .

(29)

Then, the implementation of the total loss

L

and its gradient starts by defining the following vectors as constants:

\begin{matrix} e_{d, j} = v_{d, j} & {(u_{j} - u_{j}^{*})}^{T}, \\ e_{R, j} & {(R (u_{j}; μ) - R (u_{j}^{*}; μ))}^{T}, \\ v_{R, j} & e_{R, j} J_{F} (u_{j}; μ), \end{matrix}

(30)

and then using these to compute the loss as

L = \frac{1}{m N} \sum_{j = 1}^{m} (ω_{R} {∥e_{R, j}∥}^{2} + ω_{d} {∥e_{d, j}∥}^{2})

(31)

And finally its gradient via auto-differentiation:

\frac{\partial L}{\partial Θ} = \frac{\partial}{\partial Θ} (\frac{2}{m N} \sum_{j = 1}^{m} (ω_{R} v_{R, j} + ω_{d} v_{d, j}) u_{j}) .

(32)

4. Modifications to the PROM-ANN Architecture

We adopt the PROM-ANN architecture from [1] as the foundation for developing our physics-informed ROM model, which introduces the use of neural networks while ensuring scalability and avoiding the requirement of a structured mesh for the underlying simulation. The architecture was introduced in Section 2.4, where we noted that it can be interpreted as a non-linear extension of classical POD that introduces additional effective modes without increasing the size of the latent space.

However, we find that the architecture, as originally proposed, is difficult to train—at least in our case, where the underlying problem exhibits a rapidly decaying singular value spectrum. To address this, we first revise both the architecture and the data-driven loss formulation to improve training capability. Later, in the following section, we will then incorporate our residual-based loss into the framework.

The modifications we propose in this section are as follows: (1) scaling of reduced coefficients, (2) changing the data-based loss for one that takes the full snapshot into account and (3) properly scaling the loss

4.1. Scaling of Reducedcoefficients

The main concern for us from the original architecture is that there is no normalization or scaling of the reduced coefficients prior to using them in the neural network. Ideally, neural networks are designed to operate on inputs that are approximately independent and identically distributed (i.i.d.), as this assumption facilitates more effective learning and optimization [36,37]. This can rarely be guaranteed, but it is still common practice to perform some sort of normalization or scaling on the inputs so that they all operate in a similar range of values. Otherwise, serious issues may arise during training, mainly because of some of the inputs being ignored.

In the case of PROM-ANN, the inputs and outputs of the neural network are the coefficients for a given snapshot (remember from Section 2.4 that we train a network such that

N (Φ^{T} u_{j}^{*}) \approx {\bar{Φ}}^{T} u_{j}^{*}

). These coefficients

q_{j}^{*} = Φ^{T} u_{j}^{*}

and

{\bar{q}}_{j}^{*} = {\bar{Φ}}^{T} u_{j}^{*}

will scale similarly to the singular value decay observed when performing SVD on the training data. Essentially, for a fast-decaying singular value profile, each coefficient will result in a considerably smaller range of coefficients than the previous one, causing the neural network to learn only from a few of the first ones.

In order to correct this issue, we modify the architecture of the encoder and the decoder themselves to include a pair of scaling matrices:

\begin{matrix} E_{u} (u^{*}) = q = Ξ^{- 1} Φ^{T} u^{*} \\ D_{u} (q) = Φ Ξ q + \bar{Φ} \bar{Ξ} N (q) \end{matrix}

(33)

where both the projection matrices

Φ

,

\bar{Φ}

and the scaling matrices

Ξ

,

\bar{Ξ}

come from the SVD decomposition of the snapshots matrix:

\begin{matrix} S V D (S_{u}) = U Σ V^{T} \\ Φ = [U_{1} | . . . | U_{n}], \bar{Φ} = [U_{n + 1} | . . . | U_{\bar{n}}] \\ Ξ = \frac{1}{\sqrt{M}} [\begin{matrix} σ_{1} \\ σ_{2} \\ . . . \\ σ_{n} \end{matrix}], \bar{Ξ} = \frac{1}{\sqrt{M}} [\begin{matrix} σ_{n + 1} \\ σ_{n + 2} \\ . . . \\ σ_{\bar{n}} \end{matrix}] \end{matrix}

(34)

and

σ_{i}

are the singular values found in the diagonal of the

Σ

matrix. Finally, M is the total number of samples in

S_{u}

.

This procedure effectively re-scales each coefficient to have a roughly equivalent range. In the ideal case where all rows of the training snapshot matrix

S_{u}

have zero mean, multiplying by the matrix

{\bar{Ξ}}^{- 1}

scales the quantity

Q = {\bar{Ξ}}^{- 1} U^{T} S_{u}

such that the covariance matrix between its rows becomes the identity. In other words, the modes in

Q

are uncorrelated—already ensured by the projection onto

Φ

—and have unit variance.

In our case, the condition that all row means in

S_{u}

are exactly zero does not hold. Nevertheless, as we will show in Section 6, the resulting scaling still leads to reduced coefficients with similar magnitudes in our particular use case.

We intentionally avoid enforcing zero-mean values, as we do not wish to apply any offset to the projected values. This design choice is crucial because it enables a neural network architecture without biases, thereby ensuring that

D_{u} (E_{u} (0)) = 0

holds. Additionally, the proposed scaling procedure preserves the orthogonality of the projection matrices, so we still have

E_{u} (D_{u} (q)) = q

.

Finally, no additional computation is required to obtain the scaling factors, since the SVD is already performed as part of the original methodology. In this sense, the approach is highly efficient.

4.2. Corrected Data-Based Loss

Now we have an architecture that provides appropriately-scaled features for the neural network. But at the same time, this scaling may not be the best if we want to apply the training loss as in the original paper[1]. That is, measuring the error on the predicted

{\bar{q}}_{j}

coefficients themselves. This loss would translate like this to our architecture:

L_{d}^{'} = \frac{1}{m N} \sum_{j = 1}^{m} {∥N (q_{j}^{*}) - {\bar{Ξ}}^{- 1} {\bar{Φ}}^{T} u_{j}^{*}∥}^{2} = \frac{1}{m N} \sum_{l = 1}^{m} {∥N (Ξ^{- 1} Φ^{T} u_{j}^{*}) - {\bar{Ξ}}^{- 1} {\bar{Φ}}^{T} u_{j}^{*}∥}^{2}

(35)

The rationale behind dividing by N is to ensure consistency with the loss formulations introduced in Section 3.

Such a loss will give the same importance to all of the output features, or reduced coefficients in this case. We know from the nature of POD that this is not desirable, as lower modes should be given more importance than higher ones. In fact what we have to do is to reverse, within the loss, the scaling that we applied to the reduced coefficients. We can do so by applying

\bar{Ξ}

on the contents of the norm in Eq. (35):

L_{d, compact} = \frac{1}{m N} \sum_{l = 1}^{m} {∥\bar{Ξ} (N (q^{*}) - {\bar{Ξ}}^{- 1} {\bar{Φ}}^{T} u^{*})∥}^{2}

(36)

But we could also achieve the same effect by just enforcing the whole reconstructed full-order snapshot to approximate the ground-truth one. That is, using exactly the same data-based loss that we had defined in Section 3.3:

L_{d} = \frac{1}{m N} \sum_{j = 1}^{m} {∥u_{j} - u_{j}^{*}∥}^{2} = \frac{1}{m N} \sum_{j = 1}^{m} {∥Φ Ξ q^{*} + \bar{Φ} \bar{Ξ} N (q^{*}) - u_{j}^{*}∥}^{2}

(37)

This second loss is more intuitive, as we make it explicit to achieve the final goal for our ROM approximation manifold: learning the full snapshot itself. However, it is not difficult to prove that both Eq. (36) and Eq. (37) are equivalent up to a constant offset. That is, assuming that all snapshots

{u_{j}^{*}}_{j = 1}^{m}

were included in the set

S_{u}

to which we performed the SVD.

Proof.

By developing from

L_{d}

:

L_{d} = \frac{1}{m N} \sum_{j = 1}^{m} {∥u_{j} - u_{j}^{*}∥}^{2} = \frac{1}{m N} \sum_{j = 1}^{m} {∥Φ Ξ q^{*} + \bar{Φ} \bar{Ξ} N (q^{*}) - u_{j}^{*}∥}^{2}

By developing

u_{j}^{*} = (Φ Ξ Ξ^{- 1} Φ^{T} + \bar{Φ} \bar{Ξ} {\bar{Ξ}}^{- 1} {\bar{Φ}}^{T} + \hat{Φ} \hat{Ξ} {\hat{Ξ}}^{- 1} {\hat{Φ}}^{T}) u_{j}^{*}

. Where

\hat{Φ}

is the orthonormal basis containing the singular vectors from

S V D (S_{u})

that were not included in neither

Φ

or

\bar{Φ}

, and

\hat{Ξ}

is analog but for the singular values matrix. Then applying it:

\begin{matrix} L_{d} = \frac{1}{m N} \sum_{j = 1}^{m} {∥Φ Ξ q^{*} + \bar{Φ} \bar{Ξ} N (q^{*}) - (Φ Ξ q^{*} + \bar{Φ} \bar{Ξ} {\bar{Ξ}}^{- 1} {\bar{Φ}}^{T} u_{j}^{*} + \hat{Φ} \hat{Ξ} {\hat{Ξ}}^{- 1} {\hat{Φ}}^{T} u_{j}^{*})∥}^{2} \\ = \frac{1}{m N} \sum_{j = 1}^{m} {∥\bar{Φ} \bar{Ξ} (N (q^{*}) - {\bar{Ξ}}^{- 1} {\bar{Φ}}^{T} u_{j}^{*}) - \hat{Φ} {\hat{Φ}}^{T} u_{j}^{*}∥}^{2} \end{matrix}

By acknowledging that

\bar{Φ}

and

\hat{Φ}

are orthogonal to each other:

L_{d} = \frac{1}{m N} \sum_{j = 1}^{m} ({∥\bar{Φ} \bar{Ξ} (N (q^{*}) - {\bar{Ξ}}^{- 1} {\bar{Φ}}^{T} u_{j}^{*})∥}^{2} + {∥\hat{Φ} {\hat{Φ}}^{T} u_{j}^{*}∥}^{2})

By acknowledging that the

L^{2}

norm is invariant to right-multiplication of an orthonormal matrix and that the terms

∥\hat{Φ} {\hat{Φ}}^{T} u_{j}^{*}∥

do not depend on trainable parameters

Θ

:

L_{d} = \frac{1}{m N} \sum_{j = 1}^{m} {∥\bar{Ξ} (N (q^{*}) - {\bar{Ξ}}^{- 1} {\bar{Φ}}^{T} u_{j}^{*})∥}^{2} + C = L_{d, compact} + C

□

If the user does not intend on applying physics-informed training, then choosing

L_{d, compact}

would make most sense because of its reduced complexity. However, the fact that

L_{d}

takes the snapshots to the full space within the computation makes it most appropriate to be paired with a physics-based loss, which will require the full vector of nodal solutions anyways. In any of the two cases, the ideal would be to pre-compute

q_{j}^{*}

for all samples to improve efficiency.

4.3. Scaling of the Data-Based Loss

As written in Eq. (37),

L_{d}

would exhibit high variability depending on the number of modes included in the primary ROB. That is because, by including more modes in

Φ

while following their importance order, we are heavily limiting the possible error

{∥u_{j} - u_{j}^{*}∥}^{2}

that we can have 2.

In order to correct this issue, we propose to use a pre-computed scaling factor that will be applied globally to the loss:

L_{d} = \frac{1}{m N} \frac{1}{e_{POD, d}} \sum_{j = 1}^{m} {∥u_{j} - u_{j}^{*}∥}^{2} .

(38)

With the new factor

e_{POD, d}

defined as:

e_{POD, d} \frac{1}{M N} \sum_{j = 1}^{M} {∥Φ Φ^{T} u_{j}^{*} - u_{j}^{*}∥}^{2} .

(39)

This quantity corresponds to the mean squared reconstruction error of the standard POD approach when only the primary modes are retained. It is averaged over all the samples in the training set, not only those in the batch, in order to make it representative for any choice of samples during the training, and thus, only having to compute it once at the beginning.

This choice of scaling also gives a much more interpretable value to the loss, as it will become an indicator to how much better the current model is compared to just using POD for the same size of latent space. In this sense, it is helpful when troubleshooting, as a value over 1 would clearly indicate that we are losing accuracy compared to POD.

4.4. Online Phase of Non-Linear ROM

In order to perform the online ROM simulation, we need to adapt the nonlinear iteration problem to the new architecture. Essentially, we substitute our decoder into the Galerkin-based non-linear iteration formulation for non-linear manifolds (as seen in Eq. (14)). The result would be:

\begin{matrix} {(Φ Ξ + \bar{Φ} \bar{Ξ} \frac{\partial N (q^{k})}{\partial q})}^{T} J (u (μ); μ) (Φ Ξ + \bar{Φ} \bar{Ξ} \frac{\partial N (q^{k})}{\partial q}) δ q^{k} (μ) = - {(Φ Ξ + \bar{Φ} \bar{Ξ} \frac{\partial N (q^{k})}{\partial q})}^{T} R (u (μ); μ) \\ q^{k + 1} (μ) = q^{k} (μ) + δ q^{k} (μ) \end{matrix}

(40)

Meaning that at each non-linear iteration we have to recompute

\bar{Φ} \bar{Ξ} \frac{\partial N (q^{k})}{\partial q}

.

The online ROM simulation is done entirely within the FEM software, KratosMultiphysics, as going back between this and the deep learning software would be unnecessarily expensive. Therefore, we made custom methods within KratosMultiphysics to compute both the forward pass of the neural network,

N (q)

, and the jacobian of its outputs with respect to the inputs,

\frac{\partial N (q)}{\partial q}

. The other quantities are defined at initialization from the training results and are kept constant.

5. Integration of Physics-Based Loss into PROM-ANN

Here we take the developments done in the two last sections and join them in order to finally obtain a method to perform physics-informed non-linear projection-based ROM. Essentially, we will take the total

L

loss developed in Section 3 and apply it on the PROM-ANN formulation developed in Section 4, while keeping in mind the adaptations to the loss that we also performed in this latter section.

The full discrete PINN-like loss

L^{'}

, as formulated in Section 3 is:

L^{'} = ω_{R} L_{R}^{'} + ω_{d} L_{d}^{'} = \frac{ω_{R}}{m N} \sum_{j = 1}^{m} {∥R (u_{j}; μ) - R (u_{j}^{*}; μ)∥}^{2} + \frac{ω_{d}}{m N} \sum_{j = 1}^{m} {∥u_{j} - u_{j}^{*}∥}^{2} .

(41)

In Section 4.3 we already determined that a full-snapshot data-based loss like

L_{d}^{'}

in Eq. (41) is appropriate to use once scaled by factor

1 / e_{POD, d}

. So that is the only modification we have to do on

L_{d}^{'}

.

But now, in order to still be able to use the residual-based loss term

L_{R}^{'}

appropriately, we should also scale it in a similar way to

L_{d}^{'}

. We propose modifying it as follows:

L_{R} = \frac{1}{m N} \frac{1}{e_{POD, R}} \sum_{j = 1}^{m} {∥R (u_{j}; μ) - R (u_{j}^{*}; μ)∥}^{2} .

(42)

With

e_{POD, R}

defined as the mean square error compared to the traditional POD, but this time in terms of the residual and not the snapshot itself:

e_{POD, R} = \frac{1}{M N} \sum_{j = 1}^{M} {∥R (Φ Φ^{T} u_{j}^{*}; μ) - R (u_{j}^{*}; μ)∥}^{2} .

(43)

With this small modification, the full loss becomes:

L = ω_{R} L_{R} + ω_{d} L_{d} = \frac{ω_{R}}{m N} \frac{1}{e_{POD, R}} \sum_{j = 1}^{m} {∥R (u_{j}; μ) - R (u_{j}^{*}; μ)∥}^{2} + \frac{ω_{d}}{m N} \frac{1}{e_{POD, d}} \sum_{j = 1}^{m} {∥u_{j} - u_{j}^{*}∥}^{2} .

(44)

And the practical implementation of the loss itself and its gradient is:

\begin{matrix} L = \frac{1}{m N} \sum_{j = 1}^{m} (\frac{ω_{R}}{e_{POD, R}} {∥e_{R, j}∥}^{2} + \frac{ω_{d}}{e_{POD, d}} {∥e_{d, j}∥}^{2}), \\ \frac{\partial L}{\partial Θ} = \frac{\partial}{\partial Θ} (\frac{2}{m N} \sum_{j = 1}^{m} (\frac{ω_{R}}{e_{POD, R}} v_{R, j} + \frac{ω_{d}}{e_{POD, d}} v_{d, j}) u_{j}), \end{matrix}

(45)

where all vector quantities

e_{R, j}

,

e_{d, j}

,

v_{R, j}

and

v_{d, j}

are exactly as defined in Eq. (30) and treated as constants. And the definitions of the encoder and decoder are taken from Eq. (33).

As we only modified the loss for the offline stage of ROM, the online stage stays untouched. Therefore, once the decoder is trained, we would proceed exactly as in Section 4.4 with new cases to simulate.

6. Use-Case and Evaluation Methodology

The evaluation of our method is done on a quasi-static structural mechanics case, specifically a non-linear case of hyperelasticity. It simulates the deformation in a 2D rubber cantilever that is fixed at its left wall and which has two different and perpendicularly-oriented line loads,

P_{x}

and

P_{y}

, applied to its right end. The cantilever is defined by an unstructured mesh with 797 nodes, and the nodal variables to compute are the displacements in components X and Y. Therefore, our FOM system is of dimension

N = 1594

. Figure 1 shows a schematic of the described setup.

We define the parametric space as the range of loads that can be applied in each direction:

P = [- 3000, 3000] \times [- 3000, 3000]

N/m. And the displacements are computed in reference to the unperturbed position of each node

u (P_{x} = 0, P_{y} = 0)

. Figure 2 shows the deformation on the cantilever for a random set of lineload combinations within the parametric space.

In order to build the snapshot matrix for the training, we perform the ROM simulations of 5000 different cases with parameters defined by a 2D Halton pseudo-random number generator. From each of these simulations, we store both the full snapshot

u_{j}^{*} \in R^{N}

and the corresponding residual

R (u_{j}^{*}; μ) \in R^{N}

. For the validation set we generate 1250 different samples using the same strategy, and we generate 300 more for the test set. We perform all evaluations over the test dataset; the training and the validation ones being used exclusively during the training process.

Φ

,

\bar{Φ}

,

Ξ

and

\bar{Ξ}

are obtained from the SVD of the snapshots dataset for training. Figure 3 shows the decay of the singular values’ energy for the first 200 modes of the SVD. The number of modes in each projection matrix is chosen by taking into account the accuracy obtained via traditional POD-based ROM using the same amount of modes. We establish two limit cases: one in which we select a value of

n = 6

, that would lead to a

\sim 0.1 %

error on the displacements snapshot using POD, and one with

n = 20

, which would achieve a relative error of

\sim 10^{- 6}

. Then, we take a series of n values in between these to compare in our experiments. The secondary ROM basis contains a variable amount of modes so that

n + \bar{n} = 60

always.

Two different error metrics are defined:

Relative error on snapshot:

$e_{u} = exp (\frac{1}{M} \sum_{j = 1}^{M} ln \frac{∥u_{j} - u_{j}^{*}∥}{∥u_{j}^{*}∥})$

(46)

This is the geometric mean of the relative errors of each ROM sample $u_{j}$ compared to the FOM one $u_{j}^{*}$ .
Relative error on residual:

$e_{R} = exp (\frac{1}{M} \sum_{j = 1}^{M} ln \frac{∥R (u_{j}; μ) - R (u_{j}^{*}; μ)∥}{∥R (u_{j}^{*}; μ)∥})$

(47)

Again, the geometric mean of the samples’ relative errors, but this time comparing the residual.

We will use these metrics in the next section, in order to compare the behavior of our models both in reconstruction and online ROM.

In terms of software, we use KratosMultiphysics [29] as our FEM framework, as it is open-source and lets us implement the required methods and interfaces in Python. For the neural network framework we choose TensorFlow.

All neural networks presented contain two hidden layers of 200 neurons each and with no bias. We use Exponential Linear Unit (ELU) as our activation, and use a learning rate scheduling strategy where the initial learning rate is decreased following a sinusoidal curve until a value of

10^{- 6}

. We select AdamW as the network optimizer, with the default values from TensorFlow. Finally, all trainings are done with batches of size

m = 16

.

7. Results

In this section we will go over the evaluation of the methods we have proposed throughout the paper. The first subsection will study the modifications to the original PROM-ANN [1] that were explained in Section 4. After that, we study the effect of training the architecture with the novel residual loss, as developed within Section 5. Finally we do a performance assessment.

7.1. Modifications on the Original PROM-ANN

In this section, we examine the modifications introduced to the original PROM-ANN architecture during Section 4. The modification involved first the incorporation of scaling matrices

Ξ

and

\bar{Ξ}

within the encoder-decoder pair and within the loss. Then we added a global, scalar scaling

1 / e_{POD, d}

to the data-based loss. The main effects of these should be (1) achieving a similar range of values for all the neural network’s inputs and (2) to make the loss and its gradients invariant to the choice of size for the latent space.

In order to confirm effect (1), we apply the encoder

E_{u}

on all snapshots from our training set, with fixed latent size

n = 60

. We do so with the encoder as defined in the original POD-ANN paper [1] (specified in Eq. (16)) and also with ours (specified in Eq (33)). Thus, obtaining

q_{orig, j} = Φ^{T} u_{j}^{*} \in R^{60}

and

q_{j} = Ξ^{- 1} Φ^{T} u_{j}^{*} \in R^{60}

, respectively, for all the snapshots. Then, in Figure 4 we plot the statistics of the obtained values for each component or mode in

q_{orig, j}

and

q_{j}

, side by side. For the original implementation (Figure 4a) only the first few modes have similarly large coefficients, then they become too small to differentiate. In contrast, by incorporating the singular values matrix, we obtain the result in Figure 4b, where all ranges are similarly large. These are, in fact, the range of values that would go into the neural network in each architecture.

Moving on to effect (2), we do a similar study but on the range of the mean of the gradients

\frac{\partial N (q_{j}; Θ)}{\partial Θ}

obtained during back-propagation in the training routine. We compute the mean gradient for each batch, and then take the mean for all batches in the training epoch. Thus, collecting a single mean gradient value per epoch. We compare the trainings with and without scaling factors

1 / e_{POD, d}

and

1 / e_{POD, R}

(otherwise as

L

in Eq. (44)). And for each of these two cases, we run four different training routines:

S6: 6 primary modes ( $n = 6$ ). Trained only on the data-based loss ( $ω_{d} = 1$ , $ω_{R} = 0$ ).
S20: 20 primary modes ( $n = 20$ ). Trained only on the data-based loss ( $ω_{d} = 1$ , $ω_{R} = 0$ ).
R6: 6 primary modes ( $n = 6$ ). Trained only on the residual loss ( $ω_{d} = 0$ , $ω_{R} = 1$ ).
R20: 20 primary modes ( $n = 20$ ). Trained only on the residual loss ( $ω_{d} = 0$ , $ω_{R} = 1$ ).

all of them for 800 epochs. We can see in Figure 5 the comparison of the statistics of these gradients for the different training modalities without the scalings (Figure 5a) and with the scalings (Figure 5b). While the former presents orders of magnitude in difference for these values depending on the size of the latent space, the latter one provides very uniform results.

The final step in the evaluation is to examine how do these modifications altogether affect the resulting decoders after training. That is, the capacity for the trained decoders to reconstruct the original snapshots, and also to serve as the approximation manifolds within ROM simulations. For this we compare three architectures and training losses:

s-loss: Using the encoder-decoder pair introduced by us in Eq. (33). Train via the data-based loss $L_{d}$ as defined in 38 for 800 epochs. Learning rate starting at 1e-3.
q-loss: Using the original PROM-ANN encoder-decoder pair, as in Eq. (16). Train via the original loss $L_{PROM - ANN}$ as defined in Eq. (19) for 800 epochs. Learning rate starting at 1e-3.
POD: Using the fully linear POD encoder-decoder pair. Involves no training apart from the SVD computation.

We compare the relative snapshot error (

e_{u}

from Eq. (46)) applied on the reconstructed snapshots

u_{j} = D_{u} (E_{u} (u_{j}^{*}))

for the whole test dataset. We do so for the three models specified above, and for all the

q_{j}

sizes specified in Section 6. Then we do the same but taking

u_{j}

as the result of the online ROM simulation with each architecture. The results are illustrated in Figure 6.

For the data reconstruction, the error for our proposed methodology, in s-loss, is several orders of magnitude lower than both POD and q-loss. While the error for q-loss is marginally better than POD only for the lowest sizes of

q

. We attribute this to the bad conditioning of the NN input, at least for problems with fast-decaying singular values like our use-case. Without proper training, the secondary term of the PROM-ANN decoder will just add noise to the final snapshot, making it behave even worse than traditional POD.

These observations translate to a similar behavior in the ROM simulations. All the ROM simulations lose some accuracy compared to the data reconstruction, but this gap is even larger for the q-loss case.

7.2. Comparison of Snapshot and Residual Losses

The next step is to make a comparison of the effect of training based purely on data versus training on residuals. Like the subsection before, we compare models with

q

size ranging from 6 to 20.

Figure 7 compares errors on snapshots (

e_{u}

from Eq. (46)). On the one hand, we check for the encoder-decoder data reconstruction and, on the other hand, for the results of the intrusive ROM execution. Meanwhile, Figure 8 compares the errors on residuals (

e_{R}

from Eq. (47)) only for the encoder-decoder reconstruction.

In both figures we compare two models trained differently:

s-loss: Train using only the snapshot loss ( $L$ from Eq. (44) with $ω_{d} = 1$ , $ω_{R} = 0$ ) for 800 epochs. Learning rate starting at 1e-3. (Same as in the previous sub-section)
r-loss: Start weights on the ones resulting from s-loss. Then train for 800 epochs with only the residual loss ( $L$ from Eq. (44) with $ω_{d} = 0$ , $ω_{R} = 1$ ). Learning rate starting at 1e-4.

The behavior represented in Figure 7 is quite interesting. Although the final result of the ROM simulation using the r-loss model is not substantially better than its s-loss counterpart, it consistently achieves a higher accuracy. This is in total opposition to the results for the encoder-decoder reconstruction of the snapshots, in which s-loss gets the highest accuracy for all

q

sizes. This seemingly contradictory phenomenon is clarified when looking at the reconstruction residuals in Figure 8 which shows the r-loss performing consistently better. These results hint that the behavior of the intrusive ROM may be more influenced by the correctness of the residual representation for the snapshots in our ROM approximation manifold than by the accuracy of the snapshots per se.

7.3. Comparison of Training and Online ROM Runtimes

For the training time, we check the mean time spent to train a single batch, measured over an entire epoch. As stted in Section 6, all cases share the same architecture (two hidden layer of size 200) and train with batches of size 16. Computation is done on a single thread of an AMD Ryzen 7 5800x3d processor.

The description for the cases to compare is the following:

Snapshot: Using loss $L$ from Eq. (44) with $ω_{d} = 1$ , $ω_{R} = 0$ . No interaction with FEM software.
Residual (optimised): $L$ from Eq. (44) with $ω_{d} = 0$ , $ω_{R} = 1$ . Implementing the gradients computation using the optimizations detailed in Eq. (45).
Residual (non-optimised): $L$ from Eq. (44) with $ω_{d} = 0$ , $ω_{R} = 1$ . Implementing the gradients computation in a naive way by computing first $A_{i} = {(R (u_{i}) - R (u_{i}^{*}))}^{T} \frac{\partial R}{\partial u}$ in KratosMultiphysics and then computing $B_{i} = \frac{\partial u_{i}}{\partial Θ}$ via the tf.GradientTape.batch_jacobian() method in TensorFlow, then performing matrix multiplication in:

$\begin{matrix} \frac{\partial L_{R}}{\partial Θ} = \frac{2}{e_{b a s e, R}} \frac{1}{N m} \sum_{i = 1}^{N} A_{i} B_{i} \end{matrix}$

(48)

The results are printed in Table 1. We see that training via the residual with the naive implementation is not feasible, as the time for a single batch is in the order of seconds. Thanks to the optimizations described in Section 3.2 we avoid computing entire Jacobian matrices, which reduces the time to the order of 10 milliseconds. Still, as we expected, training the neural network via the residual loss takes considerably longer than training via the snapshot loss (around 40x longer). That is why in Section 7.2 we perform a finetuning on the residual loss, instead of training from random weights.

Next, we want to make a performance assessment for the online ROM execution. For that, we measure the time that the FEM software takes to solve the reduced linear system of equations. Specifically, we take the mean resolution time for all the non-linear iterations while solving 9 different cases from our test set. This is performed for two different configurations of the architecture proposed in this paper, two cases of traditional POD that achieve similar accuracies to the former pair, and the FOM simulation. Again, these evaluations have been performed using a single thread from an AMD Ryzen 7 5800x3d processor.

The results for different models can be seen in Table 2.

As expected, the main computational advantage of our method (and of the original PROM-ANN [1], as both architectures are equivalent in terms of computational complexity) comes from the achieving smaller linear systems to solve for the same precision. It is noteworthy to point out that the biggest gains in computation time for the online process come with a further step in the ROM workflow called Hyper-Reduction, in which the amount of elements to take into account for the generation of the system at each step is greatly reduced. The number of reduced elements to use is directly related to the latent space dimension. Therefore, the reduction in modes that we achieve is also beneficial in this regard. This study focuses on the offline part of the ROM process, so there is no further development on the online procedure compared to the original PROM-ANN method. The implementation of Hyper-Reduction is described for this type of architecture in their paper [1], so interested readers are encouraged to check their performance assessments, which should be applicable to our case aswell.

8. Discussion and Future Work

Once we have assessed the performance of our proposed architecture and losses, we will further discuss the implications of these results and the impact of our contributions.

We comment first on the discrete FEM residual-based loss that we developed throughout Section 3, and its proposed implementation strategy. These developments pose a step-up from recent initiatives to design discrete PINN-like losses [24,25,34,35] that limit themselves to linear problems of different natures. By taking advantage of open-source FEM software like KratosMultiphysics [29], we can access all the FEM methodology needed to obtain residuals and Jacobians for non-linear cases in a wide range of state-of-the-art FEM formulations. The cost for this is a loss in time efficiency during training, compared to the classic data-based approach, because of the required dynamic interaction between the FEM software and the neural network framework during training. And also because of the manual computation of the gradient loss. It is in this sense that our implementation proposal makes a huge difference, taking the training time from being prohibitive to being just an order of magnitude higher than the data-based one. These comparisons are shown in Section 7.3. We believe that the main computational bottleneck after our proposed methodology is the fact that the FEM software currently runs the computations in series for each sample in the batch. Some future work on parallelizing these procedures could potentially unlock training times much closer to the data-based ones. Another key characteristic of the proposed loss is being parameter-agnostic. This makes it more versatile in various ways: in terms of efficiency the FEM software does not need to re-configure the simulation for each specific sample, and in terms of use-case it can be applied to cases in which not only the minimization of the residual itself is important, but also its behavior while being non-zero. This latter aspect is key in using this loss for our particular setting of intrusive ROM. One unexplored advantage of this loss formulation would be the possibility to perform partial physics-based learning, where only specific components of the total residual are used for the training. For example one could train only on the steady-state component of the residual of a dynamic case in order to avoid the inconveniences from the dependency on previous time-steps. This is not addressed in this paper, but left as possible future contributions. There are further options that the residual loss could open up and that we haven’t fully explored, e.g. the possibility of training on the residual with noisy data in order to achieve data augmentation.

Next, we comment on the modification to the PROM-ANN architecture itself and the data-based loss

L_{d}

with the scaling matrices

Ξ

,

\bar{Ξ}

and the global scaling

1 / e_{POD, d}

. The scaling matrices are an inexpensive way to normalize the input ranges for the neural network with the direct results from the SVD, so we avoid extra statistical studies of the dataset. The global scaling is a single scalar computed inexpensively via POD, only once for the whole training. The effect of these modifications is apparent in the results in Section 7.1, where both the reconstruction and ROM results from our architecture are several orders of magnitude better than the simpler, original approach described in [1]. Now, there is a very plausible explanation for the lack of performance of the original PROM-ANN, mostly when comparing with the good results that they obtain in their paper. The use case that they use for evaluation is a 2D inviscid Burgers problem, which has a much flatter decay in the SVD’s singular values compared to ours. Thus, the range of their inputs to the neural network should naturally be more uniform. Another possibility is that they apply some normalization routine prior to the neural network without mentioning it explicitly. In any case, we can say that our modification makes the architecture generalizable to any kind of problem in terms of their singular value decay. Additionally, the global scaling

1 / e_{POD, d}

is key in order to stabilize the scale of the loss and the backpropagation gradients, making the neural network optimizer perform equally whatever the choice of latent size. It also has an interpretational purpose, which is to make the loss a direct indicator of how much better the results are relative to the simple POD version. Something to explore in the future would be better methodologies to choose the modes included in the primary and secondary ROBs. Right now this is done in a greedy way, selecting as many modes are needed to achieve a certain accuracy in POD, but it is not clear how different modes are related to one another (especially since they are uncorrelated in the linear sense). The implementation of our version of ANN-PROM (only with the data-based loss) is readily available within the ROM Application of KratosMultiphysics [29].

Finally, we enter the discussion about the effect of training our modified ANN-PROM architecture on the residual loss. Other than the increased duration of the training routine, it was also difficult for us to prevent the model from falling in non-optimal local minima during training with the residual loss. That is why the chosen approach was to train first with the data-based approach and only then finetune with a purely physics-based loss. The intuition for proposing this physics-based training is that the intrusive ROM accuracy is not only given by the snapshot-reconstruction capabilities of the approximation manifold (which would work more like a lower limit in terms of error), but should also be dependant on how well the residuals are represented within these solutions in the manifold. Essentially because the residual is the quantity being optimized during the intrusive ROM simulation. The general discrepancy between ROM and reconstruction is clearly demonstrated within our use-case in Section 7.1, with the error in ROM simulation being approximately one order of magnitude higher than the reconstruction one for both our proposed architecture with the data-based loss and traditional POD. Further on we look at the results in Section 7.2 to specifically understand the effect of the residual-based training. The observations are encouraging, even if not spectacular. We say this in the sense that performing ROM with the model trained on the residual provided slightly but consistently better results than the one trained with the snapshot. It can also be interpreted as achieving a slightly lower discrepancy between reconstruction and ROM simulation, which is what we were aiming for. But the most important insight is that this phenomenon coincides with a consistently lower error in the representation of the residual by the models trained on physics. We are aware that, as it is right now, the significantly higher training time with the residual loss renders the proposed method not attractive, taking into account the marginal increase in ROM accuracy, but we are optimistic that future research can enable more meaningful improvements.

We make the observation, in retrospective, that choosing the ANN-PROM architecture for implementing the residual loss limited the achievable accuracy. That is because, even if the neural network allows us to introduce the loss of our choosing, we still depend entirely on the modes that we gathered via SVD on our snapshots dataset, without any regard for the residual accuracy. The fact that even with this caveat we were able to achieve slightly better results makes us very optimistic for the future where we could explore new architectures (or variants of this one) that put the residual in the focus point from the beginning, or that allow more freedom to correct the residual on top of the modes for the snapshot.

All in all, the three main contributions of the paper work together to complement each other and obtain a physics-informed intrusive ROM framework, but also hold value by themselves: the residual loss could be applied for other purposes like non-intrusive ROM, the modifications to the ANN-PROM architecture make it more versatile in terms of the types of problems it can handle, and the study of the effect of the residual in intrusive ROM provides a seed for a new path of research in this discipline for us and for the rest of the community in numerical methods.

9. Conclusions

In this paper, we extend the PROM-ANN architecture proposed in [1] by incorporating a training approach based on the finite element method (FEM) residual, rather than relying solely on snapshot data. This establishes a connection between non-linear reduced-order models (ROMs) and physics-informed neural networks (PINNs). While traditional PINNs use analytical partial differential equations (PDEs) to train continuous, non-intrusive models, our approach leverages discrete FEM residuals as the loss function for backpropagation, guiding the learning of the ROM approximation manifold. This development allows us to investigate the impact of improving the residual of the snapshots on the overall performance of projection based ROMs.

The path to achieve this final goal enables us to present three independently significant contributions: (1) a loss based on the FEM residual and which is parameter-agnostic and, most importantly, applicable to non-linear problems; (2) A modification on the original PROM-ANN architecture in [1] that makes it applicable to cases with fast-decaying singular values, and (3) a study on the effect of the residual-based training on ROM simulation. We demonstrate our approach in the context of static structural mechanics with non-linear hyperelasticity, specifically in the deformation of a rubber cantilever subjected to two orthogonal variable loads.

In terms of the residual loss, the fact that it is based on existing FEM software makes it applicable to a high range of problems. The proposed implementation strategy makes the interaction between the neural network and the FEM software viable in reasonable training time, around 40ms per batch. Finally, the enhancement of the resulting residuals via this loss is demonstrated by applying it on our proposed PROM-ANN-based architecture and performing data reconstruction. This results in a consistently lower residual representation compared to the cases trained on snapshot data alone.

About the modifications to the original ANN-PROM architecture, we observe how our method lowers the snapshot error by several orders of magnitude compared to POD, in both the data reconstruction and the ROM simulation results. This improvement is consistent for latent space sizes ranging from

n = 6

to

n = 20

. In contrast, the original PROM-ANN formulation struggles to train the neural network, resulting in errors higher than POD in most cases. This enhancement from our methodology comes from a proper scaling strategy on both the architecture itself and the loss to be used.

Finally, for the effect of applying the FEM residual loss on the approximation manifold for projection-based ROM, our results show a modest but consistent reduction of the gap between the accuracy for snapshot reconstruction and the accuracy for projection-based ROM. While not being useful in practice as of now, this observation makes us optimistic that better results for projection-based ROM could be unlocked in the future by taking care of the residual representations within the approximation manifold. In the future, alternative non-linear ROM architectures that enable more control over the resulting solutions’ residuals can greatly improve the results presented in this work.

Acknowledgments

N. Sibuet acknowledges the Secretariat of Universities and Research of the Department of Research and Universities of the Generalitat of Catalonia, as well as the European Social Plus Fund for their financial support through the predoctoral scholarship AGAUR-FI (2024 FI-3 00065) Joan Oró. S. Ares de Parga and J.R. Bravo acknowledge the Departament de Recerca i Universitats de la Generalitat de Catalunya for the financial support through the FI-SDUR 2020 and FI-SDUR 2021 scholarships. S. Ares de Parga also acknowledges support from the Fulbright Commission Spain through a Fulbright Predoctoral Research Fellowship (2024–2025).

Nomenclature

HFM	high-fidelity model
ROM	reduced order model
POD	proper orthogonal decomposition
SVD	singular value decomposition
PINN	Physics Informed Neural Network
PDE	partial differential equation
FEM	finite elements method
$N : (a; Θ) \mapsto b$	a neural Network parametrized by $Θ$
$u$	nodal solution of FEM problem
$q$	FEM solution representation in a given reduced space
$D : q \mapsto u$	a decoder function
$E : u \mapsto q$	an encoder function
$μ$	parameters vector for a FEM simulation
$R (u; μ)$	FEM residual, given nodal solution $u$ and simulation parameters $μ$
$L (Θ)$	loss function used to optimize parameters set $Θ$
$N \in N$	number of degrees of freedom in the FOM
$n \in N$	number of degrees of freedom in ROM. Number of POD modes. Latent space dimensions
$M \in N$	number of samples in the training dataset
$m \in N$	number of samples in a given batch when training a neural network

References

Barnett, J.; Farhat, C.; Maday, Y. Neural-network-augmented projection-based model order reduction for mitigating the Kolmogorov barrier to reducibility. Journal of Computational Physics 2023, 492, 112420. [CrossRef]
Brunton, S.L.; Nathan Kutz, J.; Manohar, K.; Aravkin, A.Y.; Morgansen, K.; Klemisch, J.; Goebel, N.; Buttrick, J.; Poskin, J.; Blom-Schieber, A.W.; et al. Data-driven aerospace engineering: reframing the industry with machine learning. AIAA Journal 2021, 59, 2820–2847.
Jiang, Y.; Yin, S.; Li, K.; Luo, H.; Kaynak, O. Industrial applications of digital twins. Philosophical Transactions of the Royal Society A 2021, 379, 20200360.
Hesthaven, J.S.; Rozza, G.; Stamm, B.; et al. Certified reduced basis methods for parametrized partial differential equations; SpringerBriefs in Mathematics, Springer Cham, 2016.
Sirovich, L. Turbulence and the dynamics of coherent structures. Part I: Coherent structures. Quarterly of Applied Mathematics 1987, 45, 561–571.
Holmes, P.; Lumley, J.L.; Berkooz, G. Turbulence, Coherent Structures, Dynamical Systems and Symmetry; Cambridge Monographs on Mechanics, Cambridge University Press, 1996.
Eckart, C.; Young, G. The approximation of one matrix by another of lower rank. Psychometrika 1936, 1, 211–218.
Quarteroni, A.; Rozza, G.; et al. Reduced order methods for modeling and computational reduction; Vol. 9, Modeling, Simulation and Applications, Springer Cham, 2014. [CrossRef]
Carlberg, K.; Bou-Mosleh, C.; Farhat, C. Efficient non-linear model reduction via a least-squares Petrov-Galerkin projection and compressive tensor approximations. International Journal for Numerical Methods in Engineering 2011, 86. [CrossRef]
Carlberg, K.; Farhat, C.; Cortial, J.; Amsallem, D. The GNAT method for nonlinear model reduction: Effective implementation and application to computational fluid dynamics and turbulent flows. Journal of Computational Physics 2013, 242. [CrossRef]
Carlberg, K.; Barone, M.; Antil, H. Galerkin v. least-squares Petrov–Galerkin projection in nonlinear model reduction. Journal of Computational Physics 2017, 330. [CrossRef]
Grimberg, S.; Farhat, C.; Tezaur, R.; Bou-Mosleh, C. Mesh sampling and weighting for the hyperreduction of nonlinear Petrov–Galerkin reduced-order models with local reduced-order bases. International Journal for Numerical Methods in Engineering 2021, 122, 1846–1874.
de Parga, S.A.; Bravo, J.R.; Hernández, J.A.; Zorrilla, R.; Rossi, R. Hyper-reduction for Petrov–Galerkin reduced order models. Computer Methods in Applied Mechanics and Engineering 2023, 416. [CrossRef]
Pinkus, A. n-Widths in Approximation Theory; Springer Berlin, Heidelberg. [CrossRef]
Lee, K.; Carlberg, K.T. Model reduction of dynamical systems on nonlinear manifolds using deep convolutional autoencoders. Journal of Computational Physics 2020, 404, 108973.
Barnett, J.; Farhat, C. Quadratic approximation manifold for mitigating the Kolmogorov barrier in nonlinear projection-based model order reduction. Journal of Computational Physics 2022, 464, 111348.
Raissi, M.; Perdikaris, P.; Karniadakis, G.E. Physics informed deep learning (part i): Data-driven solutions of nonlinear partial differential equations. arXiv preprint arXiv:1711.10561 2017.
Cuomo, S.; Di Cola, V.S.; Giampaolo, F.; Rozza, G.; Raissi, M.; Piccialli, F. Scientific machine learning through physics–informed neural networks: Where we are and what’s next. Journal of Scientific Computing 2022, 92, 88.
Li, Z.; Zheng, H.; Kovachki, N.; Jin, D.; Chen, H.; Liu, B.; Azizzadenesheli, K.; Anandkumar, A. Physics-Informed Neural Operator for Learning Partial Differential Equations. ACM / IMS J. Data Sci. 2024, 1. [CrossRef]
Wang, S.; Yu, X.; Perdikaris, P. When and why PINNs fail to train: A neural tangent kernel perspective. Journal of Computational Physics 2022, 449, 110768. [CrossRef]
Fuks, O.; Tchelepi, H.A. LIMITATIONS OF PHYSICS INFORMED MACHINE LEARNING FOR NONLINEAR TWO-PHASE TRANSPORT IN POROUS MEDIA. Journal of Machine Learning for Modeling and Computing 2020, 1. [CrossRef]
Niu, S.; Zhang, E.; Bazilevs, Y.; Srivastava, V. Modeling finite-strain plasticity using physics-informed neural network and assessment of the network performance. Journal of the Mechanics and Physics of Solids 2023, 172, 105177. [CrossRef]
Zienkiewicz, O. The Finite Element Method in Engineering Science; McGraw-Hill European Publishing Programme, McGraw-Hill, 1971.
Meethal, R.E.; Kodakkal, A.; Khalil, M.; Ghantasala, A.; Obst, B.; Bletzinger, K.U.; Wüchner, R. Finite element method-enhanced neural network for forward and inverse problems. Advanced Modeling and Simulation in Engineering Sciences 2023, 10. [CrossRef]
Le-Duc, T.; Nguyen-Xuan, H.; Lee, J. A finite-element-informed neural network for parametric simulation in structural mechanics. Finite Elements in Analysis and Design 2023, 217. [CrossRef]
Eshaghi, M.S.; Anitescu, C.; Thombre, M.; Wang, Y.; Zhuang, X.; Rabczuk, T. Variational Physics-informed Neural Operator (VINO) for solving partial differential equations. Computer Methods in Applied Mechanics and Engineering 2025, 437, 117785. [CrossRef]
Wang, S.; Wang, H.; Perdikaris, P. Learning the solution operator of parametric partial differential equations with physics-informed DeepONets. Science Advances 2021, 7, eabi8605, [https://www.science.org/doi/pdf/10.1126/sciadv.abi8605]. [CrossRef]
Chmiel, M.R.; Barnett, J.L.; Farhat, C. Assessment of Projection-Based Model Order Reduction for a Benchmark Hypersonic Flow Problem. In Proceedings of the AIAA SciTech 2024 Forum, 2024, p. 0250.
Ferrándiz, V.M.; Bucher, P.; Zorrilla, R.; Warnakulasuriya, S.; Cornejo, A.; Rossi, R.; Roig, C.; jcotela.; Maria, J.; tteschemacher.; et al. KratosMultiphysics/Kratos: v10.1, 2024. URL: . [CrossRef]
Cai, S.; Mao, Z.; Wang, Z.; Yin, M.; Karniadakis, G.E. Physics-informed neural networks (PINNs) for fluid mechanics: A review. Acta Mechanica Sinica 2021, 37, 1727–1738.
Zhu, Y.; Zabaras, N.; Koutsourelakis, P.S.; Perdikaris, P. Physics-constrained deep learning for high-dimensional surrogate modeling and uncertainty quantification without labeled data. Journal of Computational Physics 2019, 394, 56–81. [CrossRef]
Sun, L.; Gao, H.; Pan, S.; Wang, J.X. Surrogate modeling for fluid flows based on physics-constrained deep learning without simulation data. Computer Methods in Applied Mechanics and Engineering 2020, 361, 112732. [CrossRef]
Rueden, L.V.; Mayer, S.; Beckh, K.; Georgiev, B.; Giesselbach, S.; Heese, R.; Kirsch, B.; Pfrommer, J.; Pick, A.; Ramamurthy, R.; et al. Informed Machine Learning - A Taxonomy and Survey of Integrating Prior Knowledge into Learning Systems. IEEE Transactions on Knowledge and Data Engineering 2023, 35. [CrossRef]
Yamazaki, Y.; Harandi, A.; Muramatsu, M.; Viardin, A.; Apel, M.; Brepols, T.; Reese, S.; Rezaei, S. A finite element-based physics-informed operator learning framework for spatiotemporal partial differential equations on arbitrary domains. Engineering with Computers 2025, 41, 1–29. [CrossRef]
Sunil, P.; Sills, R.B. FE-PINNs: finite-element-based physics-informed neural networks for surrogate modeling. arXiv preprint arXiv:2412.07126 2024.
Montavon, G.; Orr, G.; Müller, K.R. Neural networks: Tricks of the Trade; Vol. 7700, Lecture Notes in Computer Science, Springer Berlin, Heidelberg, 2012.
Goodfellow, I.; Bengio, Y.; Courville, A.; Bengio, Y. Deep learning; Vol. 1, Adaptive Computation and Machine Learning, MIT Press Cambridge, 2016.

1	For our current discussion can be either a time step, or a loading step following a predefined trajectory
2	This is true for snapshots within the set $S_{u}$ , and should reflect in new samples if they are properly represented by the POD

Figure 1. Mesh of the cantilever with line loads

Figure 2. Examples of deformation in the cantilever for six different random combinations of line loads

Figure 3. Singular values’ energies for the first 200 modes of

S_{u}

.

Figure 3. Singular values’ energies for the first 200 modes of

S_{u}

.

Figure 4. Boxplots showing the spread of the values for each mode of

q

, for all the samples in our training dataset. (a) shows the case where

q = Φ^{T} u^{*}

and (b) the case where

q = Ξ^{- 1} Φ^{T} u^{*}

. The ranges are much more uniform in the latter case, which should be beneficial when training the neural network.

Figure 4. Boxplots showing the spread of the values for each mode of

q

, for all the samples in our training dataset. (a) shows the case where

q = Φ^{T} u^{*}

and (b) the case where

q = Ξ^{- 1} Φ^{T} u^{*}

. The ranges are much more uniform in the latter case, which should be beneficial when training the neural network.

Figure 5. Boxplots showing the spread of the mean value per epoch of the applied gradients during training, four different training strategies. (a) Shows the results when no rescaling of the loss is applied. (b) Shows the results when the rescaling factors

e_{POD, d}

and

e_{POD, R}

are applied. The latter case is invariant to the choice of latent space size, contrary to the first one.

Figure 5. Boxplots showing the spread of the mean value per epoch of the applied gradients during training, four different training strategies. (a) Shows the results when no rescaling of the loss is applied. (b) Shows the results when the rescaling factors

e_{POD, d}

and

e_{POD, R}

are applied. The latter case is invariant to the choice of latent space size, contrary to the first one.

Figure 6. Comparison of the error in the snapshot,

e_{u}

, over three different architectures: (1) s-loss with our modified PROM-ANN method, (2) q-loss with the original PROM-ANN method in [1]and (3) POD with the traditional POD method. All of them for different dimensions of latent space. Dotted lines represent error on data reconstruction alone, while full lines represent error on online ROM simulation. s-loss consistently performs several orders of magnitude better than POD, while q-loss is not able to keep up with POD in most cases.

Figure 6. Comparison of the error in the snapshot,

e_{u}

, over three different architectures: (1) s-loss with our modified PROM-ANN method, (2) q-loss with the original PROM-ANN method in [1]and (3) POD with the traditional POD method. All of them for different dimensions of latent space. Dotted lines represent error on data reconstruction alone, while full lines represent error on online ROM simulation. s-loss consistently performs several orders of magnitude better than POD, while q-loss is not able to keep up with POD in most cases.

Figure 7. Comparison of the error in the snapshot,

e_{u}

, for models trained on loss

L

from Eq. (44) with two different configurations: (1) s-loss with (

ω_{d} = 1

,

ω_{R} = 0

), from randomly initialized weights and (2) r-loss with (

ω_{d} = 0

,

ω_{R} = 1

) but starting from the weights of its s-loss counterpart. Both of them for different dimensions of latent space. Dotted lines represent error on data reconstruction alone, while full lines represent error on online ROM simulation. r-loss consistently gets slightly worse accuracy in the data reconstruction, while its ROM simulation solutions are slightly better than s-loss.

Figure 7. Comparison of the error in the snapshot,

e_{u}

, for models trained on loss

L

from Eq. (44) with two different configurations: (1) s-loss with (

ω_{d} = 1

,

ω_{R} = 0

), from randomly initialized weights and (2) r-loss with (

ω_{d} = 0

,

ω_{R} = 1

) but starting from the weights of its s-loss counterpart. Both of them for different dimensions of latent space. Dotted lines represent error on data reconstruction alone, while full lines represent error on online ROM simulation. r-loss consistently gets slightly worse accuracy in the data reconstruction, while its ROM simulation solutions are slightly better than s-loss.

Figure 8. Comparison of the error in the residual,

e_{R}

, for models s-loss and r-loss as described in Figure 7. Both of them for different dimensions of latent space. The error is computed only for the case of data reconstruction. r-loss consistently gets lower error than s-loss.

Figure 8. Comparison of the error in the residual,

e_{R}

, for models s-loss and r-loss as described in Figure 7. Both of them for different dimensions of latent space. The error is computed only for the case of data reconstruction. r-loss consistently gets lower error than s-loss.

Table 1. Comparison of training time for a single batch, among different losses.

Loss type	Mean batch training time (s)
Snapshot	9.20e-4
Residual (optimised)	3.87e-2
Residual (non-optimised)	7.41

Table 2. Comparison of computation times for various tasks within the online ROM routine, over a set of different models.

Model	q size	$e_{u}$	System solve time (s)
Modified PROM-ANN (Eq. (33))	6	7.67e-5	1.42e-6
Modified PROM-ANN (Eq. (33))	20	6.12e-8	5.55e-6
POD	18	2.39e-5	4.84-6
POD	40	1.32e-7	1.99e-5
FOM	-	-	2.11e-3

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permit the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.

A Discrete Physics-Informed Training for Projection-Based Reduced Order Models with Neural Networks

Abstract

Keywords:

Subject:

1. Introduction

2. Background

2.1. Physics-Informed Neural Networks

2.2. Projection-Based Reduced Order Models

2.2.1. The Full Order Model (FOM)

2.3. Manifold Projection-Based ROMs

2.4. Neural Network-Augmented Projection-Based ROM (PROM-ANN)

2.4.1. Construction of the Nonlinear Approximation Manifold

2.4.2. Training the Neural Network

3. Discrete PINN-like Loss

3.1. Parameter-Agnostic Loss

3.2. In-Training Integration of FEM Software

3.3. Data-Based Loss Term

4. Modifications to the PROM-ANN Architecture

4.1. Scaling of Reducedcoefficients

4.2. Corrected Data-Based Loss

4.3. Scaling of the Data-Based Loss

4.4. Online Phase of Non-Linear ROM

5. Integration of Physics-Based Loss into PROM-ANN

6. Use-Case and Evaluation Methodology

7. Results

7.1. Modifications on the Original PROM-ANN

7.2. Comparison of Snapshot and Residual Losses

7.3. Comparison of Training and Online ROM Runtimes

8. Discussion and Future Work

9. Conclusions

Acknowledgments

Nomenclature

References

MDPI Initiatives

Important Links

Subscribe