1. Introduction
Neural Ordinary Differential Equations (NODE) provide a bridge between modern deep learning and classical mathematical/numerical modelling, while providing an explicit connection between deep feed-forward neural networks and dynamical systems. Although concepts of dynamical systems theory had been used to improve neural network performance [see, e.g., [1,2]], NODE-nets appear to have been formally introduced by Chen et al. [3]. NODE provide a flexible trade-off between efficiency, memory costs and accuracy. The approximation capabilities [4,5] of NODE are particularly useful for modeling and controlling physical environments [see, e.g., [6]], generative models for continuous normalizing flows [3,7], and time-series modelling [3,8,9].
Neural ODEs are trained by minimizing a scalar-valued least-squares “loss function,” which usually represents the discrepancy between a “reference solution” and the output produced by the NODE-decoder. The minimization procedure requires the computation of the gradients of the loss function with respect to the weights being optimized. These gradients, which are used by first-order optimizers such as “stochastic gradient descent” [10,11], can be computed by employing either the so-called “direct method” or the “adjoint method” [12,13,14]. The one-dimensional definite integrals that appear when computing gradients via the “adjoint method” can be efficiently evaluated using Gauss–Legendre quadrature, which has been shown [15] to be faster than ODE-based methods while retaining memory efficiency.
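For orientation, the following minimal sketch illustrates this training procedure, assuming the third-party torchdiffeq package; the random initial state and the zero-valued “reference solution” are illustrative stand-ins, not the setup used in this work:

import torch
from torchdiffeq import odeint_adjoint  # adjoint-based backward pass

class LatentDynamics(torch.nn.Module):
    """Learnable latent dynamics dh/dt = f(h; weights)."""
    def __init__(self, dim: int = 2):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim, 32), torch.nn.Tanh(), torch.nn.Linear(32, dim))
    def forward(self, t, h):
        return self.net(h)

f = LatentDynamics()
optimizer = torch.optim.SGD(f.parameters(), lr=1e-2)
t_grid = torch.linspace(0.0, 1.0, 20)
h0 = torch.randn(16, 2)          # stand-in for the encoder output
reference = torch.zeros(16, 2)   # stand-in for the "reference solution"

for _ in range(100):
    h = odeint_adjoint(f, h0, t_grid)         # forward ODE solve
    loss = ((h[-1] - reference) ** 2).mean()  # least-squares loss
    optimizer.zero_grad()
    loss.backward()                           # gradients via the adjoint ODE
    optimizer.step()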
In the literature on neural nets, the “gradients of the loss function” are often called “sensitivities,” and various aspects of the optimization/training procedure are occasionally called “sensitivity analysis.” However, this optimization procedure is not a bona fide “sensitivity analysis,” since the “loss function” being minimized is of interest only during the “training” phase of the NODE-net, and the “sensitivities of the loss function” are driven towards the ideal zero-values by the minimization process while optimizing the NODE weights/parameters.
After the NODE-net is optimized to reproduce the underlying physical system as closely as possible, the subsequent responses of interest become various functionals of the NODE’s “decoder” output rather than some “loss function.” The physical system modeled by the NODE-net comprises parameters that stem from measurements and/or computations. Such parameters are not perfectly well known but are subject to uncertainties that stem from the respective experiments and/or computations. Hence, it is important to quantify the uncertainties induced in the NODE decoder output by the uncertainties that afflict the parameters/weights underlying the physical system modeled by the NODE-net. The quantification of the uncertainties in the NODE decoder and derived results (i.e., “NODE responses”) of interest requires the computation of the sensitivities of the NODE decoder with respect to the optimized NODE weights/parameters. Cacuci [16] has recently presented the “First-Order Comprehensive Adjoint Sensitivity Analysis Methodology for Neural Ordinary Differential Equations (1st-CASAM-NODE),” a pioneering sensitivity analysis methodology for computing, exactly and exhaustively, all of the first-order sensitivities of the responses of the post-training optimized NODE decoder with respect to the optimized/trained weights involved in the NODE’s decoder, hidden layers, and encoder.
It is well known that equations modeling real-life phenomena comprise not only scalar-valued model parameters but also functions of such scalar model parameters. For example, conservation equations modeling thermal-hydraulics phenomena depend on correlations, Reynolds numbers, Nusselt numbers, etc., which are functions of various primary scalar parameters. Similarly, the neutron transport and/or diffusion equations in nuclear reactor engineering (e.g., reactor physics and shielding) involve group-averaged macroscopic cross sections, which are scalar-valued functions of various sums of scalar-valued isotopic number densities and microscopic cross sections. It is convenient to refer to such scalar-valued functions as “features of primary model parameters.” Cacuci [17] has recently introduced the “nth-Order Features Adjoint Sensitivity Analysis Methodology for Nonlinear Systems (nth-FASAM-N),” which enables the most efficient computation of the exact expressions of arbitrarily high-order sensitivities of model responses with respect to the model’s “features.” Subsequently, the sensitivities of the responses with respect to the primary model parameters are determined, analytically and trivially, by applying the “chain-rule” to the expressions of the sensitivities with respect to the model’s “features/functions of parameters.”
The material presented in this work extends the 1st-CASAM-NODE [16] methodology by using the concepts underlying the nth-FASAM-N methodology [17] to introduce the newly developed “First-Order Features Adjoint Sensitivity Analysis Methodology for Neural Ordinary Differential Equations (1st-FASAM-NODE)” and “Second-Order Features Adjoint Sensitivity Analysis Methodology for Neural Ordinary Differential Equations (2nd-FASAM-NODE).” The 1st-FASAM-NODE methodology is introduced in Section 2, while the 2nd-FASAM-NODE methodology is introduced in Section 3. The discussion presented in Section 4 concludes this work, noting that the 1st-FASAM-NODE and 2nd-FASAM-NODE methodologies enable the computation of exactly-determined first-order and second-order sensitivities, respectively, of the decoder response with respect to the NODE-parameters with unparalleled efficiency. The efficiency of the 1st-FASAM-NODE and 2nd-FASAM-NODE methodologies will be illustrated in the accompanying “Part II” [18] by means of heat and energy transfer responses in the Nordheim–Fuchs phenomenological model for reactor safety [19,20].
2. First-Order Features Adjoint Sensitivity Analysis Methodology for Neural Ordinary Differential Equations (1st-FASAM-NODE)
The general representation of a NODE-network that comprises “features of primary model parameters/weights” is provided by the following system of “augmented” equations:
where:
- (i) The quantity is a time-like independent variable which parameterizes the dynamics of the hidden/latent neuron units; its initial value can be considered to be an initial measurement time, while its stopping value can be considered to be the next measurement time.
- (ii) The -dimensional vector-valued function represents the hidden/latent neural networks. In this work, all vectors are considered to be column vectors, and the dagger symbol “†” will be used to denote “transposition.” The symbol “≜” will be used to denote “is defined as” or, equivalently, “is by definition equal to.”
- (iii) The -dimensional vector-valued nonlinear function models the dynamics of the latent neurons. The components of the vector represent learnable scalar adjustable weights, where denotes the total number of adjustable weights in all of the latent neural nets. The components of the vector-valued function represent the “feature” functions of the respective weights, which are considered to be “primary parameters;” the quantity denotes the “total number of feature/functions of the primary model parameters” comprised in the NODE. Evidently, the total number of feature functions must necessarily be smaller than the total number of primary parameters (weights). In the extreme case when there are no feature functions, each feature coincides with a single primary parameter (weight).
- (iv) The -dimensional vector-valued function represents the “encoder,” which is characterized by “inputs” and “learnable” scalar adjustable weights, where denotes the total number of “inputs” and denotes the total number of “learnable encoder weights” that define the “encoder.”
- (v) The -dimensional vector-valued function represents the vector of “system responses.” The vector-valued function represents the “decoder,” with learnable scalar adjustable weights that are represented by the components of the vector , where denotes the total number of adjustable weights that characterize the “decoder.” Each component can be represented in integral form as follows:
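For orientation, a plausible generic form of the augmented system of Eqs. (1)‒(3), written with assumed symbols that are consistent with items (i)‒(v) above (h for the latent state, f for the latent dynamics, F(w) for the feature functions of the weights w, E for the encoder, and D_n for the decoder components), is the following sketch:

\begin{align}
% Hypothetical reconstruction; the symbols are assumptions, not the
% original notation:
\frac{d\mathbf{h}(t)}{dt} &= \mathbf{f}\left[\mathbf{h}(t);\,\mathbf{F}(\mathbf{w})\right],
   \qquad t_{0} \le t \le t_{f}, \\
\mathbf{h}(t_{0}) &= \mathbf{E}\left(\mathbf{x};\,\mathbf{w}_{e}\right), \\
r_{n} &= \int_{t_{0}}^{t_{f}} D_{n}\left[\mathbf{h}(t);\,\mathbf{w}_{d}\right] dt .
\end{align}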
After the “training” of the NODE has been accomplished (using the “adjoint” or other methods), the various “weights” will have been assigned “optimal” values which will have minimized the user-chosen loss functional, which usually represents the discrepancy between a “reference solution” and the output produced by the NODE-decoder. These “optimal” values will be denoted using a superscript “zero.” Using these optimal/nominal parameter values to solve the NODE-system will yield the optimal/nominal solution, which will satisfy the following forms of Eqs. (1) and (2):
Furthermore, the optimal/nominal solution will be used to obtain the components of the vector of optimal/nominal decoder-responses; in view of Eq. (3), these components can be written in the following integral form:
However, the parameters and the initial conditions underlying the actual physical system, which is represented by the optimized NODE, are not known exactly, because they are subject to uncertainties. The known nominal values of the weights characterizing the encoder will differ by certain variations from the true but unknown values of the respective weights, while the known nominal values of the initial conditions will differ by corresponding variations from the true but unknown values of the initial conditions. Similarly, the remaining nominal values will differ by respective variations from the corresponding true but unknown values. The forward state functions are related to the weights and initial conditions through Eqs. (1)‒(3); consequently, variations in these weights and initial conditions will induce corresponding variations in the state functions around the nominal solution. In turn, these variations will induce variations in the system’s response.
The 1st-FASAM-NODE methodology for computing the first-order sensitivities of the response with respect to the model’s weights and initial conditions will be established by following the same principles as those underlying the 1st-FASAM-N [17] methodology. The fundamental concept for defining the sensitivity of an operator-valued quantity with respect to variations in a neighborhood around the nominal values has been shown by Cacuci [21] to be provided by the 1st-order Gateaux- (G-) variation, which is defined as follows:
for a scalar and for all (i.e., arbitrary) vectors in a neighborhood around the nominal values. The G-variation is an operator defined on the same domain as the original operator and has the same range. The G-variation satisfies the defining relation whereby the difference between the perturbed and the nominal values of the operator equals the G-variation plus a remainder term that vanishes faster than the perturbation scalar. When the G-variation is linear in the variation, it can be written in terms of the first-order G-derivative evaluated at the nominal values.
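In standard notation (assumed here, since the original symbols are not reproduced: e denotes the concatenation of the state functions and parameters, e^0 their nominal values, and δe their variations), the definition in Eq. (8) takes the familiar form:

\begin{equation}
% Assumed standard form of the first-order G-variation:
\delta R\left(\mathbf{e}^{0};\,\delta\mathbf{e}\right) \;\triangleq\;
\left\{ \frac{d}{d\varepsilon}\, R\left(\mathbf{e}^{0} + \varepsilon\,\delta\mathbf{e}\right) \right\}_{\varepsilon = 0} .
\end{equation}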
Applying the definition provided in Eq. (8) to Eq. (4) yields the following expression for the first-order G-variation of the response, where the following definitions were used:
The quantity in Eq. (10) denotes the partial G-derivatives of the response with respect to the decoder weights , evaluated at the nominal values . The quantity is called the “direct-effect term” because it arises directly from parameter variations and can be computed directly using the nominal values . The quantity is called the “indirect-effect term” because it arises indirectly, through the variations in the hidden state functions . The indirect-effect term can be quantified only after having determined the variations , which are caused by the variations , and .
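Schematically, and in the same assumed notation as above (the symbols below are hypothetical placeholders), the decomposition described by Eqs. (9)‒(11) has the following structure:

\begin{align}
% Hypothetical sketch of the direct/indirect decomposition:
\delta r_{n} &= \left\{\delta r_{n}\right\}_{dir} + \left\{\delta r_{n}\right\}_{ind}, \\
\left\{\delta r_{n}\right\}_{dir} &\triangleq \int_{t_{0}}^{t_{f}}
   \frac{\partial D_{n}}{\partial \mathbf{w}_{d}} \cdot \delta\mathbf{w}_{d}\, dt, \qquad
\left\{\delta r_{n}\right\}_{ind} \triangleq \int_{t_{0}}^{t_{f}}
   \frac{\partial D_{n}}{\partial \mathbf{h}} \cdot \delta\mathbf{h}(t)\, dt .
\end{align}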
The first-order relationships among the variations are obtained from the first-order G-variations of Eqs. (1) and (2), which are obtained, by definition, as follows:
Carrying out the operations indicated in Eqs. (12) and (13) yields the following system of equations:
where the various matrices and vectors are defined as follows:
The system comprising Eqs. (14) and (15) is called the “1st-Level Variational Sensitivity System” (1st-LVSS), and its solution is called the 1st-level variational sensitivity function. The 1st-LVSS is to be satisfied at the nominal values for the respective functions and parameters, but this fact has not been indicated explicitly, in order to simplify the notation. Note that the 1st-LVSS would need to be solved anew for each component of the variations, which would be prohibitively expensive computationally.
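A plausible sketch of the 1st-LVSS of Eqs. (14) and (15), in the same assumed notation as above (all symbols hypothetical), is:

\begin{align}
% Hypothetical sketch of the 1st-LVSS: a linear ODE for the state
% variation, driven by the variations of the features and of the encoder:
\frac{d\left[\delta\mathbf{h}(t)\right]}{dt} &=
   \frac{\partial \mathbf{f}}{\partial \mathbf{h}}\,\delta\mathbf{h}(t)
 + \frac{\partial \mathbf{f}}{\partial \mathbf{F}}\,\delta\mathbf{F},
   \qquad t_{0} \le t \le t_{f}, \\
\delta\mathbf{h}(t_{0}) &=
   \frac{\partial \mathbf{E}}{\partial \mathbf{x}}\,\delta\mathbf{x}
 + \frac{\partial \mathbf{E}}{\partial \mathbf{w}_{e}}\,\delta\mathbf{w}_{e} .
\end{align}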
The need for solving the 1st-LVSS can be avoided if the indirect-effect term defined in Eq. (11) could be expressed in terms of a “right-hand side” that does not involve the variational function. This goal can be achieved by expressing the right-side of Eq. (11) in terms of the solutions of the “1st-Level Adjoint Sensitivity System (1st-LASS),” to be constructed next; this construction requires the introduction of adjoint operators. Adjoint operators can be defined in Banach spaces but are most useful in Hilbert spaces. For the NODE considered in this work, the appropriate Hilbert space will be defined on the time-domain underlying the NODE. The inner product of two vectors in this space is defined as follows:
The scalar product is required to hold in a neighborhood of the nominal values .
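Under the same notational assumptions as above, this inner product is presumably the usual one on the Hilbert space of square-integrable vector-valued functions defined between the initial and final times:

\begin{equation}
% Assumed form of the inner product of Eq. (20); the dagger denotes
% transposition, as defined in item (ii) of Section 2:
\left\langle \boldsymbol{\chi}(t),\, \boldsymbol{\eta}(t) \right\rangle
\;\triangleq\; \int_{t_{0}}^{t_{f}} \boldsymbol{\chi}^{\dagger}(t)\,\boldsymbol{\eta}(t)\, dt .
\end{equation}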
The next step is to form the inner product of Eq. (14) with a vector, where the superscript “(1)” indicates “1st-Level,” to obtain the following relationship:
Using the definition of the adjoint operator, the left-side of Eq. (21) is transformed as follows, after integrating by parts over the independent variable:
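A hypothetical sketch of this integration-by-parts step, in the assumed notation used above (with ψ^(1) denoting the 1st-level adjoint function), is:

\begin{multline}
% Hypothetical sketch of Eq. (22): integration by parts transfers the
% differential operator from the variation onto the adjoint function:
\int_{t_{0}}^{t_{f}} \boldsymbol{\psi}^{(1)\dagger}(t)
  \left[ \frac{d\left[\delta\mathbf{h}(t)\right]}{dt}
       - \frac{\partial \mathbf{f}}{\partial \mathbf{h}}\,\delta\mathbf{h}(t) \right] dt
= \boldsymbol{\psi}^{(1)\dagger}(t_{f})\,\delta\mathbf{h}(t_{f})
 - \boldsymbol{\psi}^{(1)\dagger}(t_{0})\,\delta\mathbf{h}(t_{0}) \\
+ \int_{t_{0}}^{t_{f}} \left[\delta\mathbf{h}(t)\right]^{\dagger}
  \left[ -\frac{d\boldsymbol{\psi}^{(1)}(t)}{dt}
       - \left( \frac{\partial \mathbf{f}}{\partial \mathbf{h}} \right)^{\dagger}
         \boldsymbol{\psi}^{(1)}(t) \right] dt .
\end{multline}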
The last term on the right-side of Eq. (22) is now required to represent the “indirect-effect” term defined in Eq. (11), which is achieved by requiring that the 1st-level adjoint function satisfy the following relation written in NODE-format:
The definition of the 1st-level adjoint sensitivity function is now completed by requiring that it satisfy (adjoint) “boundary conditions at the final time,” so as to eliminate the term containing the unknown values in Eq. (22), which is achieved by requiring that
It is evident from Eqs. (23) and (24) that to each response there will correspond a distinct 1st-level adjoint sensitivity function. Since it is important to highlight this characteristic, the dependence of the 1st-level adjoint sensitivity function on the individual response under consideration will be made explicit by re-writing Eqs. (23) and (24) as follows:
The system of equations comprising Eqs. (25) and (26) constitutes the “1st-Level Adjoint Sensitivity System (1st-LASS)” for the 1st-level adjoint function corresponding to the response under consideration. Evidently, the 1st-LASS is linear in the 1st-level adjoint function and is independent of parameter variations, which implies that it needs to be solved just once per response to obtain the 1st-level adjoint function. Notably, the left-side of the 1st-LASS has the same form as the left-side of the “adjoint equations” used for training the NODE, but the “source” on the right-side of the 1st-LASS stems from the “response” under consideration, while the “source” on the right-side of the “adjoint equations for training the NODE” stems from the “loss functional” used in training the NODE. In component form, the 1st-LASS has the following structure:
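A hypothetical component-form sketch of the 1st-LASS of Eqs. (27) and (28), in the assumed notation used above (the index n labels the response), is:

\begin{align}
% Hypothetical sketch: a terminal-value linear ODE, solved backward in
% time from t_f to t_0, with a response-dependent source:
-\frac{d\psi_{i}^{(1)}(t; n)}{dt}
 - \sum_{j} \frac{\partial f_{j}}{\partial h_{i}}\,\psi_{j}^{(1)}(t; n)
 &= \frac{\partial D_{n}}{\partial h_{i}}, \\
\psi_{i}^{(1)}(t_{f}; n) &= 0 .
\end{align}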
Using the results represented by Eqs. (23), (24), (21), and (11) in Eq. (22) yields the following alternative expression for the “indirect-effect” term, which does not involve the 1st-level variational sensitivity function but involves the 1st-level adjoint function:
Using in Eq. (29) the expression provided for in Eq. (15) yields the following expression for the “indirect-effect” term:
Replacing the expression obtained in Eq. (29) for the “indirect-effect term,” together with the expression of the direct-effect term provided by Eq. (10), into Eq. (9) yields the following expression (which is to be evaluated at the nominal values of all functions and parameters/weights) for the first-order G-variation of the response:
As indicated by the right-side of Eq. (31), the sensitivities of the response are provided by the following expressions, all of which are to be evaluated at the nominal values of all functions and parameters/weights:
The expressions of the first-order sensitivities obtained in Eqs. (32)‒(35) correspond to the nth response. The first-order sensitivities of the response with respect to the primary parameters/weights are obtained analytically by applying the “chain-rule” to the expression in Eq. (32), which yields:
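In the assumed notation employed above (with hypothetical symbols N_F for the number of features and w_j for the primary weights), this chain-rule reads:

\begin{equation}
% Hypothetical sketch of the chain-rule underlying Eq. (36):
\frac{\partial r_{n}}{\partial w_{j}}
= \sum_{i=1}^{N_{F}} \frac{\partial r_{n}}{\partial F_{i}(\mathbf{w})}\,
  \frac{\partial F_{i}(\mathbf{w})}{\partial w_{j}} .
\end{equation}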
When there are no feature functions, each feature coincides with a single primary parameter, so the expression obtained in Eq. (32) directly yields the first-order sensitivities with respect to the primary parameters/weights.
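The complete first-order workflow can be illustrated numerically. The following minimal sketch (the model, the feature, and all names are illustrative assumptions, not the paper’s example) uses a scalar latent ODE dh/dt = -F*h with a single feature F(w) = w1*w2 and the response r defined as the integral of h(t) over [t0, tf]; it solves the forward NODE, solves the 1st-LASS backward in time, evaluates the sensitivity with respect to the feature by quadrature, and applies the chain-rule to recover the sensitivities with respect to the primary weights:

import numpy as np
from scipy.integrate import solve_ivp, trapezoid

w1, w2 = 2.0, 0.5
F = w1 * w2                          # single feature of the primary weights
t0, tf, h0 = 0.0, 1.0, 1.0

# Forward solve of the latent ODE: dh/dt = -F*h, h(t0) = h0.
fwd = solve_ivp(lambda t, h: -F * h, (t0, tf), [h0], dense_output=True)

# 1st-LASS for this model: -dpsi/dt + F*psi = dD/dh = 1, psi(tf) = 0,
# solved backward in time via s = tf - t, so that dpsi/ds = 1 - F*psi.
adj = solve_ivp(lambda s, p: 1.0 - F * p, (0.0, tf - t0), [0.0],
                dense_output=True)

# Sensitivity w.r.t. the feature: dr/dF = -integral of psi(t)*h(t) dt.
ts = np.linspace(t0, tf, 201)
integrand = [adj.sol(tf - t)[0] * fwd.sol(t)[0] for t in ts]
dr_dF = -trapezoid(integrand, ts)

# Chain-rule to the primary weights: dr/dw1 = (dr/dF)*w2, dr/dw2 = (dr/dF)*w1.
print(dr_dF, dr_dF * w2, dr_dF * w1)
# Analytic check: r = (1 - exp(-F))/F, so dr/dF = (exp(-F)*(1 + F) - 1)/F**2.
print((np.exp(-F) * (1 + F) - 1) / F**2)

A single backward (adjoint) solve yields the sensitivity with respect to the feature; the sensitivities with respect to both primary weights then follow at negligible additional cost, which is the computational advantage emphasized above.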
3. Second-Order Features Adjoint Sensitivity Analysis Methodology for Neural Ordinary Differential Equations (2nd-FASAM-NODE)
The second-order sensitivities of the response defined in Eq. (4) are obtained by computing the first-order sensitivities of the expressions obtained in Eqs. (32)‒(35), which represent the first-order sensitivities of the response with respect to the various parameters (weights) and initial conditions (input). In other words, the second-order sensitivities will be computed by conceptually using their basic definitions as the “first-order sensitivities of the first-order sensitivities.”
3.1. Second-Order Sensitivities Stemming from the First-Order Sensitivities in Eq. (32)
The second-order sensitivities stemming from the first-order sensitivities are obtained from the first-order G-differential of Eq. (32), for each admissible pair of indices, as follows:
In Eq. (37), the direct-effect term comprises the variation stemming from the model parameters and is defined as follows:
while the indirect-effect term comprises the variations in the state function and in the 1st-level adjoint sensitivity function, and is defined as follows:
The expressions in Eqs. (38) and (39) are to be evaluated at the nominal values of the respective functions and parameters, but the respective indication (namely: the superscript “zero”) has been omitted in order to simplify the notation.
The direct-effect term can be evaluated at this time for all parameter variations, but the indirect-effect term can be evaluated only after having determined the variations in the state function and in the 1st-level adjoint sensitivity function. The variation in the state function is the solution of the 1st-LVSS defined by Eqs. (14) and (15). On the other hand, the variation in the 1st-level adjoint sensitivity function is the solution of the system of equations obtained by G-differentiating the 1st-LASS. This differentiation is performed most transparently by applying the definition of the G-differential to the 1st-LASS presented in component form in Eqs. (27) and (28), to obtain the following system of equations:
Performing the operations indicated in Eqs. (40) and (41) yields the following relations:
The summations in Eq. (42) are rearranged as follows, for each admissible index:
where the following definitions were used:
In matrix-format, Eq. (44) can be written as follows:
where the following definitions were used:
Concatenating Eqs. (14) and (46) yields the following 2nd-Level Variational Sensitivity System (2nd-LVSS) for the 2nd-level variational function:
where the superscript “2” indicates “second-level” and where the following definitions were used:
The matrix “0” in Eq. (49) denotes a matrix having zero-elements. The initial-time and final-time conditions satisfied by the 2nd-level variational function are provided in Eqs. (15) and (43), which can be written in vector form as follows:
It is impractical to solve the 2nd-LVSS repeatedly in order to compute the 2nd-level variational function for every component of the variations. The need for computing this function repeatedly can be circumvented by applying the principles of the 2nd-FASAM-N methodology [17], which comprises the following sequence of steps:
- 1. Consider that the 2nd-level variational function is an element in a Hilbert space comprising as elements 2-block vectors. This Hilbert space is considered to be endowed with an inner product between two such block-vectors, which is defined as follows:
The scalar product is required to hold in a neighborhood of the nominal values .
- 2. Use the definition of the inner product provided in Eq. (51) to form the inner product of Eq. (48) with a 2-block vector, where the superscript “(2)” indicates “2nd-level,” to obtain the following relationship:
- 3. Using the definition of the adjoint operator, the left-side of Eq. (52) is transformed as follows, after integrating by parts over the independent variable:
- 4. The two integral terms on the right-side of Eq. (53) are now required to represent the “indirect-effect” term defined in Eq. (39), which is achieved by imposing the following requirements:
Note that the right-sides of Eqs. (54) and (55) also depend on the response index, which means that a distinct 2nd-level adjoint sensitivity function will correspond to each value of this index. This fact has been explicitly indicated by including the index in the list of arguments of the components of the 2nd-level adjoint sensitivity function.
- 5. The definition of the 2nd-level adjoint sensitivity function is now completed by requiring it to satisfy the following boundary conditions, which eliminate the respective unknown terms on the right-side of Eq. (53):
The system of equations comprising Eqs. (54)‒(56) constitutes the “2nd-Level Adjoint Sensitivity System (2nd-LASS)” for the 2nd-level adjoint sensitivity function. Evidently, the 2nd-LASS is linear in this adjoint function and is independent of parameter variations. Notably, this system of equations does not need to be solved simultaneously, but can be solved sequentially, by first solving Eq. (55) subject to the initial condition given in Eq. (56), and subsequently using the resulting function as a source in Eq. (54), which is solved subject to the “final-time” condition given in Eq. (56); a structural sketch of this sequential strategy is provided after this list of steps. The 2nd-LASS is to be solved using the nominal values of all functions and parameters/weights, but the superscript “zero” has been omitted for notational simplicity.
- 6. Using the results obtained in Eqs. (52) and (54)‒(56) in Eq. (53) yields the following alternative expression for the “indirect-effect” term, which no longer involves the 2nd-level variational sensitivity function but involves the 2nd-level adjoint sensitivity function:
- 7. Using in Eq. (57) the expression provided for in Eq. (15), and adding the resulting expression for the indirect-effect term to the expression for the direct-effect term provided in Eq. (38), yields the following expression for the total first-order G-differential, for each admissible index:
The expression shown in Eq. (58) is to be evaluated at the nominal values of all functions and parameters/weights but the superscript “zero” (which has been used to indicate this fact) has been omitted for notational simplicity.
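The sequential solution strategy described in step 5 above can be sketched as follows, in the assumed notation used throughout; the source terms q_1 and q_2, which would stem from the 1st-level functions and the response, are hypothetical placeholders:

\begin{align}
% Hypothetical block structure of the 2nd-LASS: an initial-value problem,
% solved forward in time, followed by a final-value problem, solved
% backward in time and sourced by the forward solution:
\frac{d\boldsymbol{\psi}_{2}^{(2)}(t)}{dt}
 - \frac{\partial \mathbf{f}}{\partial \mathbf{h}}\,\boldsymbol{\psi}_{2}^{(2)}(t)
 &= \mathbf{q}_{2}\left[\boldsymbol{\psi}^{(1)}\right],
 \qquad \boldsymbol{\psi}_{2}^{(2)}(t_{0}) = \mathbf{0}, \\
-\frac{d\boldsymbol{\psi}_{1}^{(2)}(t)}{dt}
 - \left( \frac{\partial \mathbf{f}}{\partial \mathbf{h}} \right)^{\dagger}
   \boldsymbol{\psi}_{1}^{(2)}(t)
 &= \mathbf{q}_{1}\left[\boldsymbol{\psi}^{(1)},\,\boldsymbol{\psi}_{2}^{(2)}\right],
 \qquad \boldsymbol{\psi}_{1}^{(2)}(t_{f}) = \mathbf{0} .
\end{align}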
Identifying in Eq. (58) the expressions that multiply the individual variations yields the following expressions for the second-order sensitivities that stem from the first-order sensitivities considered in this subsection:
The second-order sensitivities of the responses with respect to the primary parameters/weights are obtained analytically by using the results obtained in Eqs. (59)‒(62) in conjunction with the “chain-rule”:
When there are no feature functions, the second-order sensitivities are obtained from Eqs. (59)‒(62) by identifying each feature with the corresponding primary parameter.
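In the assumed notation used above (all symbols hypothetical), the second-order chain-rule underlying this step presumably reads:

\begin{equation}
% Hypothetical sketch of the second-order chain-rule:
\frac{\partial^{2} r_{n}}{\partial w_{j}\,\partial w_{k}}
= \sum_{i=1}^{N_{F}} \sum_{m=1}^{N_{F}}
  \frac{\partial^{2} r_{n}}{\partial F_{i}\,\partial F_{m}}\,
  \frac{\partial F_{i}}{\partial w_{j}}\,
  \frac{\partial F_{m}}{\partial w_{k}}
+ \sum_{i=1}^{N_{F}}
  \frac{\partial r_{n}}{\partial F_{i}}\,
  \frac{\partial^{2} F_{i}}{\partial w_{j}\,\partial w_{k}} .
\end{equation}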
3.2. Second-Order Sensitivities Stemming from the First-Order Sensitivities in Eq. (33)
The second-order sensitivities stemming from the first-order sensitivities are obtained from the first-order G-differential of Eq. (33), for each admissible pair of indices, as follows:
where the direct-effect term comprises the parameter variations and is defined as follows:
and where the indirect-effect term comprises the variation in the state function and is defined as follows:
The expressions shown in Eqs. (65) and (66) are to be evaluated at the nominal values of the respective functions and parameters, but the respective indication (namely: the superscript “zero”) has been omitted in order to simplify the notation.
The direct-effect term can be evaluated at this time for all parameter variations, but the indirect-effect term can be evaluated only after having determined the variation in the state function, which is the solution of the 1st-LVSS defined by Eqs. (14) and (15). The need for solving this 1st-LVSS can be avoided if the indirect-effect term defined in Eq. (66) could be expressed in terms of a “right-hand side” that does not involve this variational function. This goal can be achieved by applying the same concepts and steps as in Section 2, as follows:
- 1. Using Eq. (20), form the inner product of Eq. (14) with a vector, where the superscript “(2)” indicates “2nd-Level,” to obtain the following relationship:
- 2. Using the definition of the adjoint operator, the left-side of Eq. (67) is integrated by parts over the independent variable to obtain the following relation:
- 3. The last term on the right-side of Eq. (68) is now required to represent the “indirect-effect” term defined in Eq. (66), which is achieved by requiring that the 2nd-level adjoint sensitivity function satisfy the following relation written in NODE-format:
It is evident from Eq. (69) that to each response there will correspond a distinct 2nd-level adjoint sensitivity function for each pair of indices. This important fact has been highlighted by including the respective indices in the list of arguments of the 2nd-level adjoint sensitivity function.
- 4. The definition of the 2nd-level adjoint sensitivity function is now completed by requiring it to eliminate the term containing the unknown values in Eq. (68), which is accomplished by requiring this function to satisfy the following boundary condition at the final time:
The system of equations comprising Eqs. (69) and (70) constitutes the “2nd-Level Adjoint Sensitivity System (2nd-LASS)” for this 2nd-level adjoint sensitivity function; this 2nd-LASS is linear in the adjoint function and is independent of parameter variations, which implies that it needs to be solved just once per pair of indices to obtain the corresponding 2nd-level adjoint sensitivity function. Notably, the left-side of this 2nd-LASS has the same form as the left-side of the “adjoint equations” used for training the NODE, but the sources on the right-side of this 2nd-LASS stem from each of the “responses” under consideration, while the “source” on the right-side of the “adjoint equations for training the NODE” stems from the “loss functional” used in training the NODE.
- 5. Using the results obtained in Eqs. (69), (70), and (67) in Eq. (68) yields the following alternative expression for the “indirect-effect” term, which does not involve the 1st-level variational sensitivity function but instead involves the 2nd-level adjoint function:
- 6. Adding the expression obtained in Eq. (71) for the “indirect-effect term” to the expression of the direct-effect term provided by Eq. (65) yields the following expression for the G-variation defined in Eq. (64), which is to be evaluated at the nominal values of all functions and parameters/weights:
As indicated by the rightmost-side of Eq. (72), the sensitivities of the response are provided by the following expressions, all of which are to be evaluated at the nominal values of all functions and parameters/weights:
The second-order sensitivities of the response with respect to the primary parameters/weights are obtained analytically by applying the “chain-rule” to the expression in Eq. (73). When there are no feature functions, each feature coincides with a single primary parameter (weight).
3.3. Second-Order Sensitivities Stemming from the First-Order Sensitivities in Eq. (34)
The second-order sensitivities stemming from the first-order sensitivities are obtained from the first-order G-differential of Eq. (34), for each admissible pair of indices, as follows:
where the direct-effect term comprises the parameter variations and is defined as follows:
and where the indirect-effect term comprises the variation of the 2nd-level variational function and is defined as follows:
The expressions shown in Eqs. (78) and (79) are to be evaluated at the nominal values of the respective functions and parameters, but the respective indication (namely: the superscript “zero”) has been omitted in order to simplify the notation.
The direct-effect term can be evaluated at this time for all parameter variations, but the indirect-effect term can be evaluated only after having determined the 2nd-level variational function, which is the solution of the 2nd-LVSS defined by Eqs. (48) and (50). The need for solving this 2nd-LVSS can be avoided if the indirect-effect term defined in Eq. (79) could be expressed in terms of a “right-hand side” that does not involve this variational function. This goal can be achieved by applying the same concepts and steps as in Section 3.1, as follows:
- 1. Use the definition of the inner product provided in Eq. (51) to form the inner product of Eq. (48) with a 2-block vector, where the superscript “(2)” indicates “2nd-level,” to obtain the following relationship:
- 2. Using the definition of the adjoint operator, the left-side of Eq. (80) is transformed as follows, after integrating by parts over the independent variable:
- 3. The two integral terms on the right-side of Eq. (81) are now required to represent the “indirect-effect” term defined in Eq. (79), which is achieved by imposing the following requirements on the components of the 2nd-level adjoint sensitivity function:
Since the right-side of Eq. (83) also depends on the response index, a distinct 2nd-level adjoint sensitivity function will correspond to each distinct value of this index. This fact has been explicitly indicated by including the index in the list of arguments of the components of the 2nd-level adjoint sensitivity function.
- 4. The definition of the 2nd-level adjoint sensitivity function is now completed by requiring that it satisfy the following boundary conditions, which eliminate the respective unknown terms on the right-side of Eq. (81):
The system of equations comprising Eqs. (82)‒(84) constitutes the “2nd-Level Adjoint Sensitivity System (2nd-LASS)” for the 2nd-level adjoint sensitivity function . This 2nd-LASS is independent of parameter variations, is linear in , and can be solved sequentially, by first solving Eq. (83) subject to the initial condition given in Eq. (84) to determine the function , and subsequently using the function in Eq. (82) to solve it subject to the “final-time” condition given in Eq. (84) to obtain the function . The 2nd-LASS is to be solved using the nominal values of all functions and parameters/weights, but the superscript “zero” has been omitted for notational simplicity.
- 5. Using the results obtained in Eqs. (80) and (82)‒(84) in Eq. (81) yields the following alternative expression for the “indirect-effect” term, which no longer involves the 2nd-level variational sensitivity function but involves the 2nd-level adjoint sensitivity function:
- 6. Adding the expression obtained in Eq. (85) to the expression for the direct-effect term provided in Eq. (78) yields the following expression for the total first-order G-differential:
The expression shown in Eq. (86) is to be evaluated at the nominal values of all functions and parameters/weights but the superscript “zero” (which has been used to indicate this fact) has been omitted for notational simplicity.
Identifying in Eq. (86) the expressions that multiply the individual variations yields the following expressions for the second-order sensitivities that stem from the first-order sensitivities considered in this subsection:
The second-order sensitivities of the responses with respect to the primary parameters/weights are obtained analytically by using the results obtained in Eqs. (87)‒(90) in conjunction with the “chain-rule.” In the absence of feature functions, each feature coincides with a single primary parameter (weight).
3.4. Second-Order Sensitivities Stemming from the First-Order Sensitivities in Eq. (35)
The second-order sensitivities stemming from the first-order sensitivities are obtained from the first-order G-differential of Eq. (35), for each admissible pair of indices, as follows:
where the direct-effect term comprises the parameter variations and is defined as follows:
and where the indirect-effect term comprises the variation of the 2nd-level variational function and is defined as follows:
The expressions shown in Eqs. (92) and (93) are to be evaluated at the nominal values of the respective functions and parameters, but the respective indication (namely: the superscript “zero”) has been omitted in order to simplify the notation.
The direct-effect term can be evaluated at this time for all parameter variations, but the indirect-effect term can be evaluated only after having determined the 2nd-level variational function, which is the solution of the 2nd-LVSS defined by Eqs. (48) and (50). The need for solving this 2nd-LVSS can be avoided if the indirect-effect term defined in Eq. (93) could be expressed in terms of a “right-hand side” that does not involve this variational function. This goal can be achieved by applying the same concepts and steps as in Section 3.3, which will not be shown in detail here in order to minimize repetitive derivations.
The final expressions of the second-order sensitivities stemming from the first-order sensitivities considered in this subsection are as follows:
The 2nd-level adjoint sensitivity function which appears in Eqs. (94)‒(97) is the solution of the following 2nd-Level Adjoint Sensitivity System (2nd-LASS):
3.5. Discussion: Double-Computation of the Mixed Second-Order Sensitivities
It is evident from the derivations presented in Subsections 3.1 through 3.4 that the unmixed second-order sensitivities are computed just once, whereas each mixed second-order sensitivity is computed twice, using two distinct expressions that involve distinct 2nd-level adjoint sensitivity functions. The specific distinct expressions for the respective mixed second-order sensitivities are as follows:
The mixed second-order sensitivities are obtained in Eq. (60) in terms of one set of 2nd-level adjoint sensitivity functions. On the other hand, the same mixed second-order sensitivities are obtained in Eq. (94) in terms of distinct 2nd-level adjoint sensitivity functions. Due to the symmetry property of the mixed second-order sensitivities, the numerical results obtained by computing them using Eq. (60), on the one hand, and using Eq. (94), on the other hand, provide a verification mechanism for assessing the accuracy of the computation of the 2nd-level adjoint functions involved in the respective computations.
The mixed second-order sensitivities are obtained in Eq. (61) in terms of one set of 2nd-level adjoint sensitivity functions. On the other hand, the same mixed second-order sensitivities are obtained in Eq. (87) in terms of distinct 2nd-level adjoint sensitivity functions. Due to the symmetry property of the mixed second-order sensitivities, the numerical results obtained by computing them using Eq. (61), on the one hand, and using Eq. (87), on the other hand, provide a verification mechanism for assessing the accuracy of the computation of the 2nd-level adjoint functions involved in the respective computations.
The mixed second-order sensitivities are obtained in Eq. (62) in terms of one set of 2nd-level adjoint sensitivity functions. On the other hand, the same mixed second-order sensitivities are obtained in Eq. (73) in terms of distinct 2nd-level adjoint sensitivity functions. Due to the symmetry property of the mixed second-order sensitivities, the numerical results obtained by computing them using Eq. (62), on the one hand, and using Eq. (73), on the other hand, provide a verification mechanism for assessing the accuracy of the computation of the 2nd-level adjoint functions involved in the respective computations.
The mixed second-order sensitivities are obtained in Eq. (88) in terms of one set of 2nd-level adjoint sensitivity functions. On the other hand, the same mixed second-order sensitivities are obtained in Eq. (96) in terms of distinct 2nd-level adjoint sensitivity functions. Due to the symmetry property of the mixed second-order sensitivities, the numerical results obtained by computing them using Eq. (88), on the one hand, and using Eq. (96), on the other hand, provide a verification mechanism for assessing the accuracy of the computation of the 2nd-level adjoint functions involved in the respective computations.
The mixed second-order sensitivities are obtained in Eq. (74) in terms of one set of 2nd-level adjoint sensitivity functions. On the other hand, the same mixed second-order sensitivities are obtained in Eq. (97) in terms of distinct 2nd-level adjoint sensitivity functions. Due to the symmetry property of the mixed second-order sensitivities, the numerical results obtained by computing them using Eq. (74), on the one hand, and using Eq. (97), on the other hand, provide a verification mechanism for assessing the accuracy of the computation of the 2nd-level adjoint functions involved in the respective computations.
The mixed second-order sensitivities are obtained in Eq. (75) in terms of one set of 2nd-level adjoint sensitivity functions. On the other hand, the same mixed second-order sensitivities are obtained in Eq. (90) in terms of distinct 2nd-level adjoint sensitivity functions. Due to the symmetry property of the mixed second-order sensitivities, the numerical results obtained by computing them using Eq. (75), on the one hand, and using Eq. (90), on the other hand, provide a verification mechanism for assessing the accuracy of the computation of the 2nd-level adjoint functions involved in the respective computations.
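In practice, this verification mechanism can be automated by comparing the two independently computed blocks of mixed second-order sensitivities. The following minimal sketch assumes that the two blocks have already been assembled into matrices; the names hessian_a and hessian_b are hypothetical, and the tolerance would be set to match the accuracy of the ODE solvers used for the 2nd-LASS:

import numpy as np

def verify_mixed_sensitivities(hessian_a: np.ndarray,
                               hessian_b: np.ndarray,
                               rtol: float = 1e-6) -> bool:
    """Return True if the mixed second-order sensitivities computed from
    two distinct 2nd-LASS solutions agree: hessian_a[i, j] should equal
    hessian_b[j, i] up to the relative tolerance rtol."""
    return bool(np.allclose(hessian_a, hessian_b.T, rtol=rtol))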
4. Discussion and Conclusions
As has been discussed in this work, after the NODE-net is optimized to reproduce the underlying physical system as closely as possible, the subsequent NODE-responses of interest are various functionals of the NODE’s “decoder” output rather than some “loss function.” The physical system modeled by the NODE-net comprises parameters that stem from measurements and/or computations. Such parameters are not perfectly well known but are subject to uncertainties that stem from the respective experiments and/or computations. These uncertainties will induce uncertainties in the NODE decoder’s output/response, which can only be quantified if the sensitivities of the NODE decoder’s response with respect to the optimized NODE weights/parameters are available.
It is well known that equations modeling real-life phenomena comprise not only scalar-valued model parameters but also functions of such scalar model parameters. For example, conservation equations modeling thermal-hydraulics phenomena depend on correlations, Reynolds numbers, Nusselt numbers, etc. Similarly, neutron transport and/or diffusion equations in nuclear reactor engineering (e.g., reactor physics and shielding) involve group-averaged macroscopic cross sections, which are scalar-valued functions of various sums of scalar-valued isotopic number densities and microscopic cross sections. Such scalar-valued functions are conveniently called “features of primary model parameters.” This work has introduced the mathematical framework of the “First-Order Features Adjoint Sensitivity Analysis Methodology for Neural Ordinary Differential Equations (1st-FASAM-NODE),” which derives and computes most efficiently the exact expressions of all of the first-order sensitivities of NODE-decoder responses with respect to the features of parameters (weights, initial conditions, etc.) that characterize the NODE’s decoder, hidden layers, and encoder. It has been shown that a single large-scale computation, to solve the 1st-Level Adjoint Sensitivity System (1st-LASS), suffices to obtain the first-order sensitivities of a decoder-response with respect to all of the feature functions of parameters. Subsequently, the sensitivities of the responses with respect to the primary model parameters are determined, analytically and trivially, by applying the “chain-rule” to the expressions of the sensitivities with respect to the model’s “features/functions of parameters.”
This work has also presented the “Second-Order Features Adjoint Sensitivity Analysis Methodology for Neural Ordinary Differential Equations (2nd-FASAM-NODE),” which builds upon the 1st-FASAM-NODE methodology to derive and compute most efficiently the exact expressions of all of the second-order sensitivities of NODE-decoder responses with respect to the features of parameters (weights, initial conditions, etc.) that characterize the NODE’s decoder, hidden layers, and encoder. It has been shown that, for each decoder response, the computation of all of the second-order sensitivities requires as many large-scale computations (to solve the 2nd-Level Adjoint Sensitivity System) as there are first-order sensitivities of the decoder-response with respect to the feature functions, initial conditions, and decoder-weights. The 2nd-FASAM-NODE methodology computes the mixed second-order sensitivities twice, involving distinct 2nd-level adjoint sensitivity functions in distinct expressions.
The concepts underlying the 1st-FASAM-NODE and 2nd-FASAM-NODE methodologies will be illustrated in the accompanying “Part II” [18] by considering the energy and heat transfer processes described by the well-known Nordheim–Fuchs phenomenological model [19,20] of reactor safety.