First-Order Comprehensive Adjoint Sensitivity Analysis Methodology for Neural Ordinary Differential Equations: Mathematical Framework and Illustrative Application to the Nordheim-Fuchs Reactor Safety Model

Dan Gabriel Cacuci

doi:10.20944/preprints202407.2613.v1

Preprint

Article

This version is not peer-reviewed.

Submitted: 31 July 2024

Posted: 01 August 2024

A peer-reviewed article of this preprint also exists.

Abstract

This work introduces the mathematical framework of the novel “First-Order Comprehensive Adjoint Sensitivity Analysis Methodology for Neural Ordinary Differential Equations (1st-CASAM-NODE)” which yields exact expressions for the first-order sensitivities of NODE decoder-responses to the NODE parameters, including encoder initial conditions, while enabling the most efficient computation of these sensitivities. The application of the 1st-CASAM-NODE is illustrated by using the Nordheim-Fuchs reactor dynamics/safety phenomenological model, which is representative of physical systems that would be modeled by NODE while admitting exact analytical solutions for all quantities of interest (hidden states, decoder outputs, sensitivities with respect to all parameters and initial conditions, etc.). This work also lays the foundation for the ongoing work on conceiving the “Second-Order Comprehensive Adjoint Sensitivity Analysis Methodology for Neural Ordinary Differential Equations (2nd-CASAM-NODE)” which aims at yielding exact expressions for the second-order sensitivities of NODE decoder-responses to the NODE parameters and initial conditions while enabling the most efficient computation of these sensitivities.

Keywords: neural ordinary differential equations (NODE); comprehensive adjoint sensitivity analysis methodology for NODE (1st-CASAM-NODE); Nordheim-Fuchs reactor safety model; sensitivity analysis for model features (1st-FASAM-N); exact sensitivities

Subject: Computer Science and Mathematics - Artificial Intelligence and Machine Learning

1. Introduction

Concepts from dynamical systems theory have frequently been used to improve neural network performance [1,2,3], but Neural Ordinary Differential Equations (NODE) appear to have been formally introduced by Chen et al. [4]. NODE provide an explicit connection between deep feed-forward neural networks and dynamical systems, and are thus considered to bridge modern deep learning and classical mathematical/numerical modelling. NODE also provide a flexible trade-off between efficiency, memory costs and accuracy. The approximation capabilities [5,6] of NODE are particularly useful for time-series modelling [4,7,8], generative models for continuous normalizing flows [4,9], and modeling/controlling physical environments (see, e.g., [10]).

Neural ODEs are trained by minimizing a least-squares quadratic scalar-valued “loss function,” computing its gradients with respect to the weights to be optimized and using a first-order optimizer such as “stochastic gradient descent” [11,12]. Since ODE solvers (e.g., Runge–Kutta solvers) perform differentiable algebraic operations, the gradients of the loss function can be calculated by the so-called “direct method,” which directly backpropagates through the operations performed by the ODE solver. However, when the dynamics are complex, adaptive solvers may require an arbitrarily large number of function evaluations, and the “direct method” must store all of the intermediate activations produced during the “solving” process, so it becomes prohibitively memory intensive. A less memory-intensive NODE-training method is the so-called “adjoint method” [13,14,15], which solves an ODE (related to the original NODE) backwards in time. The direct method is faster than the adjoint method but more memory intensive. The one-dimensional definite integrals which appear when computing gradients via the “adjoint method” are traditionally evaluated by solving them as differential equations, which considerably slows down the training process. Evaluating these one-dimensional definite integrals by using Gauss–Legendre quadrature (rather than solving them as ODEs) has been shown [16] to be faster than ODE-based methods while retaining memory efficiency, thus speeding up the training of NODE.

The gradients of the loss function are often called “sensitivities” in the literature on neural nets, and aspects of the optimization/training procedure are occasionally called “sensitivity analysis.” However, the “loss function” is of interest only for the “training” phase of the NODE, and the “sensitivities of the loss function” are driven towards the ideal zero-values by the minimization process while optimizing the NODE weights/parameters. Furthermore, after the NODE is optimized to reproduce the underlying physical system as closely as possible, the responses of interest for the NODE-modeled system are no longer a “loss function” but are various functions of the NODE’s “decoder”-output. Since the physical system modeled by the NODE itself comprises parameters that stem from measurements or computations, these parameters are not perfectly well known but are afflicted by uncertainties that stem from the respective experiments and/or computations. Hence, it is important to quantify the uncertainties induced in the NODE decoder-output by the uncertainties that afflict the parameters/weights underlying the physical system modeled by the NODE. The quantification of the uncertainties in the NODE-decoder and derived results (i.e., “NODE-responses”) of interest requires the computation of the sensitivities of the NODE-decoder with respect to the optimized NODE-weights/parameters. However, a “NODE sensitivity analysis” methodology for efficiently computing exact expressions of the decoder-sensitivities with respect to the optimized parameters/weights, including with respect to the initial conditions/encoder, does not seem to be available in the literature.

The scope of this work is to present a novel methodology for computing all of the first-order sensitivities, exactly and exhaustively, of the responses of the post-training optimized NODE-decoder with respect to the optimized/trained weights involved in the NODE’s decoder, hidden layers, and encoder. The general mathematical representation of the NODE-network considered in this work is presented in Section 2. As a specific illustrative paradigm application, Section 3 presents the NODE conceptual representation of the Nordheim-Fuchs phenomenological reactor dynamics/safety model [17,18]. This paradigm illustrative model has been chosen because it is representative of typical NODE-applications while admitting closed-form analytical solutions for the quantities of interest, including the functions describing the hidden layers, encoder, decoder, and sensitivities of decoder responses. Section 4 presents the Mathematical Framework of the novel “First-Order Comprehensive Adjoint Sensitivity Analysis Methodology for Neural Ordinary Differential Equations (1^st-CASAM-NODE).” Section 5 illustrates the application of the 1^st-CASAM-NODE methodology to compute all of the first-order sensitivities of Nordheim-Fuchs model responses with respect to the underlying parameters. Specifically, Subsections 5.1 through 5.4, respectively, illustrate the application of the 1^st-CASAM-NODE methodology for computing the first-order sensitivities with respect to the underlying model parameters and initial conditions of the following responses: (i) the reactor’s flux; (ii) the reactor’s energy release; (iii) the reactor’s temperature; and (iv) the reactor’s thermal conductivity. 
Using the “energy-released” response as a paradigm, Subsection 5.5 illustrates an alternative path for computing first-order sensitivities by applying the “First-Order Feature Adjoint Sensitivity Analysis Methodology for Nonlinear Systems (1^st-FASAM-N)” [19], which is the most efficient procedure for computing first-order sensitivities, but which may require the construction of a dedicated neural net for this purpose.

2. Neural Ordinary Differential Equations (NODE): Basic Properties and Uses

A general mathematical representation of a NODE-network is provided by the following system of so-called “augmented” equations:

\frac{d h (t)}{d t} = f [h (t); θ; t], t > 0

(1)

h (t_{0}) = h_{e} (x, w), at t = t_{0}

(2)

r (t_{f}) = h_{d} [h (t_{f}); φ], at t = t_{f}

(3)

where:

(i): The quantity $t$ is a time-like independent variable which parameterizes the dynamics of the hidden/latent neuron units; the initial value is denoted as $t_{0}$ (which can be considered to be an initial measurement time) while the stopping value is denoted as $t_{f}$ (which can be considered to be the next measurement time).
(ii): The $T H$ -dimensional vector-valued function $h (t) ≜ {[h_{1} (t), ..., h_{T H} (t)]}^{†}$ represents the hidden/latent neural networks. In this work, all vectors are considered to be column vectors and the dagger “ $†$ ” symbol will be used to denote “transposition.” The symbol “ $≜$ ” signifies “is defined as” or, equivalently, “is by definition equal to.”
(iii): The $T H$ -dimensional vector-valued nonlinear function $f [h (t); θ; t] ≜ {[f_{1} (h; θ; t), ..., f_{T H} (h; θ; t)]}^{†}$ models the dynamics of the latent neurons with learnable scalar adjustable weights represented by the components of the vector $θ ≜ {[θ_{1}, ..., θ_{T W}]}^{†}$ , where $T W$ denotes the total number of adjustable weights in all of the latent neural nets.
(iv): The $T H$ -dimensional vector-valued function $h_{e} (x, w) ≜ {\{h_{1}^{e} (x, w), ..., h_{T H}^{e} (x, w)\}}^{†}$ represents the “encoder” which is characterized by “inputs” $x ≜ {[x_{1}, ..., x_{T I}]}^{†}$ and “learnable” scalar adjustable weights $w ≜ {[w_{1}, ..., w_{T E W}]}^{†}$ , where $T I$ denotes the total number of “inputs” and $T E W$ denotes the total number of “learnable encoder weights” that define the “encoder.”
(v): The $T R$ -dimensional vector-valued function $r (t_{f}) ≜ {\{r_{1} [h (t_{f}); φ], ..., r_{T R} [h (t_{f}); φ]\}}^{†} = h_{d} [h (t_{f}); φ]$ represents the vector of “system responses.” The vector-valued function $h_{d} [h (t_{f}); φ] ≜ {\{h_{1}^{d} [h (t_{f}); φ], ..., h_{T R}^{d} [h (t_{f}); φ]\}}^{†}$ represents the “decoder” with learnable scalar adjustable weights, which are represented by the components of the vector $φ ≜ {[φ_{1}, ..., φ_{T D}]}^{†}$ , where $T D$ denotes the total number of adjustable weights that characterize the “decoder.” Each component $r_{n} [h (t_{f}); φ]$ can be represented in integral form as follows:

r_{n} (h; φ) = \int_{t_{0}}^{t_{f}} h_{n}^{d} [h (t); φ] δ (t - t_{f}) d t; n = 1, ..., T R .

(4)
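The encoder/latent-dynamics/decoder structure of Eqs. (1)‒(3) can be sketched in a few lines of plain Python. This is a minimal illustration only: the scalar linear dynamics, the elementwise encoder and decoder, the fixed-step Runge–Kutta integrator, and all names (`rk4_solve`, `node_forward`) and numerical values are assumptions made for the example and do not come from this work.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]

def rk4_solve(f: Callable[[Vector, float], Vector], h0: Sequence[float],
              t0: float, tf: float, n_steps: int) -> Vector:
    """Integrate dh/dt = f(h, t) from t0 to tf with the classical RK4 scheme."""
    h = list(h0)
    dt = (tf - t0) / n_steps
    t = t0
    for _ in range(n_steps):
        k1 = f(h, t)
        k2 = f([hi + 0.5 * dt * ki for hi, ki in zip(h, k1)], t + 0.5 * dt)
        k3 = f([hi + 0.5 * dt * ki for hi, ki in zip(h, k2)], t + 0.5 * dt)
        k4 = f([hi + dt * ki for hi, ki in zip(h, k3)], t + dt)
        h = [hi + dt / 6.0 * (a + 2 * b + 2 * c + d)
             for hi, a, b, c, d in zip(h, k1, k2, k3, k4)]
        t += dt
    return h

def node_forward(x: Sequence[float], w: Sequence[float], theta: Sequence[float],
                 phi: Sequence[float], t0: float = 0.0, tf: float = 1.0,
                 n_steps: int = 200) -> Vector:
    """Toy NODE: encoder (Eq. 2) -> latent ODE (Eq. 1) -> decoder (Eq. 3)."""
    h0 = [wi * xi for wi, xi in zip(w, x)]            # assumed encoder h_e(x, w) = w * x
    f = lambda h, t: [-theta[0] * hi for hi in h]     # assumed dynamics f = -theta * h
    h_tf = rk4_solve(f, h0, t0, tf, n_steps)
    return [phi[0] * hi for hi in h_tf]               # assumed decoder h_d = phi * h(t_f)

r = node_forward(x=[2.0], w=[0.5], theta=[0.3], phi=[4.0])
exact = 4.0 * 0.5 * 2.0 * math.exp(-0.3)              # closed form of this toy problem
print(r[0], exact)
```

For this toy problem the response has the closed form $φ_{1} w_{1} x_{1} \exp [- θ_{1} (t_{f} - t_{0})]$ , which gives a simple check on the numerical integration.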

The weights of the NODE are adjusted/calibrated by “training” the NODE, using gradients of a scalar loss functional, denoted as $L [h (t); θ; t]$, which is designed to represent the deviations/discrepancies between the responses/outputs of the NODE and the “true” values obtained from measurements (or by other means, independently of the NODE). There are several methods for accomplishing this “training,” all of which require that the functions underlying the NODE, i.e., $h (t)$, $f [h (t); θ; t]$, $h_{e} (x, w)$, and $h_{d} [h (t); φ]$, be differentiable with respect to their arguments. For complex systems, involving many parameters, the so-called “adjoint method” [13,14,15] offers an optimal compromise between memory requirements and computational intensity. This method computes the required gradients of the loss function by evaluating the following integral:

\frac{\partial L}{\partial θ} = \int_{t_{0}}^{t_{f}} {[a (t)]}^{†} \frac{\partial f [h (t); θ; t]}{\partial θ} d t

(5)

where the so-called “adjoint function” $a (t) ≜ {[a_{1} (t), ..., a_{T H} (t)]}^{†}$ satisfies the following “adjoint equation,” which is solved backwards in time:

\frac{d a (t)}{d t} = - {[a (t)]}^{†} \frac{\partial f [h (t); θ; t]}{\partial h}, t > 0

(6)

a (t) = \frac{\partial L [h (t); θ; t]}{\partial h}, at t = t_{f}

(7)
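The roles of Eqs. (5)‒(7) can be illustrated on a scalar toy problem, sketched below under stated assumptions: the dynamics $f = - θ h$, the loss $L = [h (t_{f}) - y]^{2} / 2$, the closed-form forward solution, the fixed-step Runge–Kutta and trapezoidal schemes, and all names and numbers are hypothetical choices made for this example. The adjoint-based gradient is compared with a central finite difference of the loss.

```python
import math

def loss(theta: float, h0: float = 2.0, y: float = 1.0, tf: float = 1.0) -> float:
    """Toy loss L = 0.5 * (h(tf) - y)^2 for dh/dt = -theta * h, h(0) = h0."""
    h_tf = h0 * math.exp(-theta * tf)
    return 0.5 * (h_tf - y) ** 2

def adjoint_gradient(theta: float, h0: float = 2.0, y: float = 1.0,
                     tf: float = 1.0, n_steps: int = 400) -> float:
    """dL/dtheta from Eq. (5), with a(t) obtained by solving Eqs. (6)-(7) backwards."""
    dt = tf / n_steps
    h = lambda t: h0 * math.exp(-theta * t)   # forward solution (closed form here)
    rhs = lambda a: theta * a                 # Eq. (6): da/dt = -a * df/dh = theta * a
    a = h(tf) - y                             # Eq. (7): a(tf) = dL/dh at t = tf
    t = tf
    grad = 0.0
    prev = a * (-h(t))                        # integrand of Eq. (5): a * df/dtheta
    for _ in range(n_steps):
        # One RK4 step of size -dt (integration backwards in time).
        k1 = rhs(a)
        k2 = rhs(a - 0.5 * dt * k1)
        k3 = rhs(a - 0.5 * dt * k2)
        k4 = rhs(a - dt * k3)
        a -= dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        t -= dt
        cur = a * (-h(t))
        grad += 0.5 * dt * (prev + cur)       # trapezoidal rule for the integral
        prev = cur
    return grad

theta0 = 0.3
g_adj = adjoint_gradient(theta0)
eps = 1e-6
g_fd = (loss(theta0 + eps) - loss(theta0 - eps)) / (2 * eps)
print(g_adj, g_fd)
```

In this sketch the adjoint function is integrated once, backwards in time, and the gradient follows from a single quadrature, which is the structure that the adjoint method exploits when the number of weights is large.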

After the “training” of the NODE has been accomplished, the various “weights” will have been assigned “optimal” values which will have minimized the chosen loss functional $L [h (t); θ; t]$. These “optimal” values will be denoted using a superscript “zero” as follows: $θ^{0} ≜ {[θ_{1}^{0}, ..., θ_{T W}^{0}]}^{†}$ and $w^{0} ≜ {[w_{1}^{0}, ..., w_{T E W}^{0}]}^{†}$. These optimal values are used to compute the optimal values for the system responses, which will be denoted as $r_{n}^{0} [h (t_{f}); φ]$. However, since the physical parameters and the initial conditions underlying the actual physical system (which is represented by the optimized NODE) are not known exactly (because they are actually subject to uncertainties), it follows that the optimal values obtained for the weights are actually just nominal values that are used to compute the nominal/optimal response-values $r_{n}^{0} [h (t_{f}); φ]$. The uncertainties in the various weights and initial conditions will induce uncertainties in the system responses, which can be computed deterministically by using the well-known “propagation of errors” methodology, originally proposed by Tukey [20] and subsequently extended to sixth-order by Cacuci [21].
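To indicate how first-order sensitivities enter the “propagation of errors” methodology, the sketch below evaluates only the leading (first-order) contribution to the variance of a response, i.e., the quadratic form of the sensitivity vector with the parameter covariance matrix. The higher-order terms treated in [21] are not included, and the function name and all numerical values are hypothetical.

```python
from typing import Sequence

def first_order_variance(sens: Sequence[float],
                         cov: Sequence[Sequence[float]]) -> float:
    """Leading-order response variance: sum_ij S_i * C_ij * S_j."""
    n = len(sens)
    return sum(sens[i] * cov[i][j] * sens[j] for i in range(n) for j in range(n))

# Hypothetical sensitivities of one response to two uncorrelated parameters
# having standard deviations 0.1 and 0.2, respectively.
sens = [2.0, -1.0]
cov = [[0.1 ** 2, 0.0],
       [0.0, 0.2 ** 2]]
var_r = first_order_variance(sens, cov)
print(var_r)
```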

3. Illustrative Paradigm Application: NODE Conceptual Modeling of the Nordheim-Fuchs Phenomenological Reactor Dynamics/Safety Model

The Nordheim-Fuchs phenomenological model [17,18] describes a short-time self-limiting power transient in a nuclear reactor system having a negative temperature coefficient in which a large amount of reactivity is suddenly inserted, either intentionally or by accident. The response of such a reactor system can be estimated by considering that the reactivity insertion is sufficiently large and the time-span of the transient phenomena under consideration is of the order of the life-time of prompt-neutrons. For such short times, the effects of delayed neutrons and the local spatial variations of the neutron distribution in the reactor can be neglected, and the heat generated during the transient remains within the reactor. Using the notation of Lamarsh [17], the Nordheim-Fuchs paradigm model describing such a self-limiting power transient comprises the following balance equations:

1.: The time-dependent neutron balance (point kinetics) equation for the neutron flux $φ (t)$ :

\frac{d φ (t)}{d t} = \frac{k (t) - 1}{l_{p}} φ (t), t > 0

(8)

φ (0) = φ_{0}, t = 0

(9)

where $l_{p}$ denotes the prompt-neutron lifetime, $k (t)$ denotes the reactor’s multiplication factor, and $φ_{0}$ denotes the initial (i.e., extant) flux prior to initiating the transient at time $t = 0$.

2.: The energy production equation:

E (t) = γ Σ_{f} \int_{0}^{t} φ (x) d x

(10)

where $γ$ denotes the recoverable energy per fission; $Σ_{f} ≜ σ_{f} N_{f}$ denotes the reactor’s effective macroscopic fission cross section, where $σ_{f}$ denotes the reactor’s equivalent microscopic fission cross section while $N_{f}$ denotes the reactor’s equivalent atomic number density.

3.: The energy conservation equation:

c_{p} [T (t) - T_{0}] = E (t)

(11)

where $E (t)$ denotes the total energy released (per cm³) at time $t$ in the reactor since the onset of reactivity change; $c_{p}$ denotes the specific heat (per cm³) of the reactor.

4.: The reactivity-temperature feedback equation: $k (t) = k_{0} - α_{T} k_{0} [T (t) - T_{0}]$ , where $k_{0} ≜ k (0) \geq 1$ denotes the changed multiplication factor following the reactivity insertion at $t = 0$ , $α_{T}$ denotes the magnitude of the negative temperature coefficient, $T (t)$ denotes the reactor’s temperature, and $T_{0}$ denotes the reactor’s initial temperature at time $t = 0$ . For illustrating the application of the 1st-FASAM methodology, it suffices to consider the special case of a “prompt critical transient”, when the reactor becomes prompt critical after the reactivity insertion, i.e., when $k_{0} = 1$ , so that the reactivity-temperature feedback equation takes on the following particular form:

k (t) = 1 - α_{T} [T (t) - T_{0}]

(12)

Equations (8)‒(12) can be transformed into the following system of nonlinear differential equations:

\frac{d φ (t)}{d t} = - \frac{α_{T}}{l_{p} c_{p}} E (t) φ (t), t > 0; φ (0) = φ_{0}, t = 0

(13)

\frac{d E (t)}{d t} = γ σ_{f} N_{f} φ (t), E (0) = 0

(14)

\frac{d T (t)}{d t} = \frac{γ σ_{f} N_{f}}{c_{p}} φ (t); T (0) = T_{0}

(15)

The Nordheim-Fuchs model described by Eqs. (13)‒(15) can be solved analytically to obtain exact closed-form expressions for the state functions $φ (t)$, $E (t)$, and $T (t)$, as follows:

(i): Using Eq. (14) to eliminate the function $φ (t)$ from the right-side of Eq. (13) yields the relation $d φ / d t = - [α_{T} / (l_{p} c_{p} γ σ_{f} N_{f})] E (d E / d t)$ , which can be integrated directly, using the initial conditions $φ (0) = φ_{0}$ and $E (0) = 0$ , to obtain the following relation:

φ (t) = - \frac{α_{T}}{2 l_{p} c_{p} γ σ_{f} N_{f}} E^{2} (t) + φ_{0}

(16)

(ii): Using Eq. (16) in Eq. (14) yields the following nonlinear equation for the released energy $E (t)$ :

\frac{d E (t)}{d t} = - \frac{α_{T}}{2 l_{p} c_{p}} E^{2} (t) + φ_{0} γ σ_{f} N_{f}, E (0) = 0

(17)

The closed-form solution of Eq. (17) has the following form:

E (t) = K_{1} (α) \tanh [t K_{2} (α)]

(18)

where:

K_{1} (α) ≜ {[\frac{2 φ_{0} γ σ_{f} N_{f} l_{p} c_{p}}{α_{T}}]}^{1 / 2}; K_{2} (α) ≜ {[\frac{α_{T} φ_{0} γ σ_{f} N_{f}}{2 l_{p} c_{p}}]}^{1 / 2} .

(19)

(iii): Substituting Eq. (18) into Eq. (16) yields the following closed-form expression for $φ (t)$ :

φ (t) = φ_{0} \{1 - \tanh^{2} [t K_{2} (α)]\} = \frac{φ_{0}}{\cosh^{2} [t K_{2} (α)]}

(20)

(iv): Substituting Eq. (18) into Eq. (11) yields the following closed-form expression for $T (t)$ :

T (t) = T_{0} + \frac{K_{1} (α)}{c_{p}} \tanh [t K_{2} (α)]

(21)
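The closed-form expressions in Eqs. (18)‒(21) can be checked numerically against the differential equations (13)‒(15). The sketch below does so by evaluating the residuals of the three equations with central finite differences at one time instance. The nondimensional parameter values are hypothetical and were chosen only for illustration; they are not data from this work.

```python
import math

# Hypothetical nondimensional parameter values (illustration only).
alpha_T, l_p, c_p = 0.5, 0.2, 2.0
gamma, sigma_f, N_f = 1.5, 0.8, 1.25
phi0, T0 = 1.0, 3.0

G = gamma * sigma_f * N_f
K1 = math.sqrt(2.0 * phi0 * G * l_p * c_p / alpha_T)     # Eq. (19)
K2 = math.sqrt(alpha_T * phi0 * G / (2.0 * l_p * c_p))   # Eq. (19)

def E(t: float) -> float:
    return K1 * math.tanh(K2 * t)                        # Eq. (18)

def phi(t: float) -> float:
    return phi0 / math.cosh(K2 * t) ** 2                 # Eq. (20)

def T(t: float) -> float:
    return T0 + E(t) / c_p                               # Eq. (21)

def ddt(fn, t: float, eps: float = 1e-5) -> float:
    """Central finite-difference approximation of d fn / dt."""
    return (fn(t + eps) - fn(t - eps)) / (2.0 * eps)

t = 0.7
res_phi = ddt(phi, t) + alpha_T / (l_p * c_p) * E(t) * phi(t)   # residual of Eq. (13)
res_E = ddt(E, t) - G * phi(t)                                  # residual of Eq. (14)
res_T = ddt(T, t) - (G / c_p) * phi(t)                          # residual of Eq. (15)
print(res_phi, res_E, res_T)
```

All three residuals should vanish up to the truncation and round-off error of the finite-difference approximation.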

The typical results of interest (called “model responses”) for the Nordheim-Fuchs model are as follows:

(i): The neutron flux $φ (τ)$ in the reactor at a “final time” instance denoted as $t = τ$ , after the initiation at $t = 0$ of the prompt-critical power transient, which can be defined mathematically as follows:

φ (τ) = \int_{0}^{τ} φ (t) δ (t - τ) d t

(22)

(ii): The total energy per cm³, $E (τ)$ , released at a user-chosen “final time” instance denoted as $t = τ$ , after the initiation at $t = 0$ of the prompt-critical power transient, which can be defined mathematically as follows:

E (τ) = \int_{0}^{τ} E (t) δ (t - τ) d t

(23)

where $δ (t - τ)$ denotes the Dirac-delta functional.

(iii): The reactor’s temperature $T (τ)$ at a “final time” instance denoted as $t = τ$ after the initiation at $t = 0$ of the prompt-critical power transient, which can be defined mathematically as follows:

T (τ) = \int_{0}^{τ} T (t) δ (t - τ) d t

(24)

Comparing the structure of the Nordheim-Fuchs model, cf. Eqs. (13)‒(15), to the generic structure of a NODE, cf. Eqs. (1) and (2), indicates the following correspondences:

h (t) ≜ {[h_{1} (t), ..., h_{T H} (t)]}^{†} ≜ {[φ (t), E (t), T (t)]}^{†}; T H = 3;

(25)

θ ≜ {[θ_{1}, ..., θ_{T W}]}^{†} ≜ {(α_{T}, l_{p}, c_{p}, γ, σ_{f}, N_{f})}^{†}; x ≜ {[x_{1}, x_{2}]}^{†} ≜ {(φ_{0}, T_{0})}^{†}; T W = 6, T I = 2

(26)

f_{1} (h; θ; t) ≜ - \frac{α_{T}}{l_{p} c_{p}} E (t) φ (t) ≜ - \frac{θ_{1}}{θ_{2} θ_{3}} h_{1} (t) h_{2} (t)

(27)

f_{2} (h; θ; t) ≜ γ σ_{f} N_{f} φ (t) ≜ θ_{4} θ_{5} θ_{6} h_{1} (t)

(28)

f_{3} (h; θ; t) ≜ \frac{γ σ_{f} N_{f}}{c_{p}} φ (t) ≜ \frac{θ_{4} θ_{5} θ_{6}}{θ_{3}} h_{1} (t)

(29)
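The correspondences in Eqs. (25)‒(29) can be made concrete by coding the vector field $f (h; θ; t)$ in terms of the generic weights $θ_{1}, ..., θ_{6}$ and integrating it numerically. In the sketch below, $f_{2}$ is taken as $θ_{4} θ_{5} θ_{6} h_{1}$ , in agreement with Eq. (14); the fixed-step Runge–Kutta integrator, the function names, and the nondimensional values of the weights and initial conditions are assumptions made for illustration. The result at the final time is compared with the closed-form expressions of Eqs. (18), (20) and (21).

```python
import math
from typing import List, Sequence

def f(h: Sequence[float], theta: Sequence[float]) -> List[float]:
    """NODE vector field of Eqs. (27)-(29); it has no explicit time dependence."""
    th1, th2, th3, th4, th5, th6 = theta
    h1, h2, _h3 = h
    return [-(th1 / (th2 * th3)) * h1 * h2,
            th4 * th5 * th6 * h1,
            (th4 * th5 * th6 / th3) * h1]

def integrate(h0: Sequence[float], theta: Sequence[float], tf: float,
              n_steps: int = 1000) -> List[float]:
    """Classical RK4 integration of dh/dt = f(h; theta) from t = 0 to t = tf."""
    h = list(h0)
    dt = tf / n_steps
    for _ in range(n_steps):
        k1 = f(h, theta)
        k2 = f([a + 0.5 * dt * b for a, b in zip(h, k1)], theta)
        k3 = f([a + 0.5 * dt * b for a, b in zip(h, k2)], theta)
        k4 = f([a + dt * b for a, b in zip(h, k3)], theta)
        h = [a + dt / 6.0 * (p + 2 * q + 2 * r + s)
             for a, p, q, r, s in zip(h, k1, k2, k3, k4)]
    return h

# Hypothetical nominal values: theta = (alpha_T, l_p, c_p, gamma, sigma_f, N_f).
theta0 = [0.5, 0.2, 2.0, 1.5, 0.8, 1.25]
x0 = [1.0, 3.0]                      # (phi_0, T_0)
h0 = [x0[0], 0.0, x0[1]]             # initial state [phi(0), E(0) = 0, T(0)]
tf = 1.0
h_tf = integrate(h0, theta0, tf)

# Closed-form values from Eqs. (18)-(21) for comparison.
g = theta0[3] * theta0[4] * theta0[5]
k1c = math.sqrt(2.0 * x0[0] * g * theta0[1] * theta0[2] / theta0[0])
k2c = math.sqrt(theta0[0] * x0[0] * g / (2.0 * theta0[1] * theta0[2]))
exact = [x0[0] / math.cosh(k2c * tf) ** 2,
         k1c * math.tanh(k2c * tf),
         x0[1] + k1c * math.tanh(k2c * tf) / theta0[2]]
max_err = max(abs(a - b) for a, b in zip(h_tf, exact))
print(h_tf, max_err)
```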

The actual values of the components of the vectors $θ$ and $x$ are unknown even after having trained the NODE, since the actual values of the parameters underlying the Nordheim-Fuchs model are experimentally measured and are thus subject to uncertainties. However, the nominal values of these parameters are considered to be known, and are considered to be exactly reproducible by the “trained” NODE; these nominal values will be denoted using a superscript “zero,” as follows:

θ^{0} ≜ {[θ_{1}^{0}, ..., θ_{6}^{0}]}^{†} ≜ {(α_{T}^{0}, l_{p}^{0}, c_{p}^{0}, γ^{0}, σ_{f}^{0}, N_{f}^{0})}^{†}; x^{0} ≜ {[x_{1}^{0}, x_{2}^{0}, x_{3}^{0}]}^{†} ≜ {[φ_{0}^{0}, 0; T_{0}^{0}]}^{†}

(30)

Consequently, the exact values of the functions $h (t) ≜ {[h_{1} (t), h_{2} (t), h_{3} (t)]}^{†} ≜ {[φ (t), E (t), T (t)]}^{†}$ are unknown but their nominal values $h^{0} (t) ≜ {[h_{1}^{0} (t), h_{2}^{0} (t), h_{3}^{0} (t)]}^{†} ≜ {[φ^{0} (t), E^{0} (t), T^{0} (t)]}^{†}$ are known after having solved Eqs. (13)‒(15) at the nominal values $(θ^{0}, x^{0})$.

The NODE-representations, cf. Eq. (4), of the responses considered in Eqs. (22)‒(24) have the following expressions, respectively:

r_{1} (h) = \int_{t_{0} = 0}^{t_{f}} h_{1} (t) δ (t - t_{f}) d t = φ (t_{f})

(31)

r_{2} (h) = \int_{t_{0} = 0}^{t_{f}} h_{2} (t) δ (t - t_{f}) d t = E (t_{f})

(32)

r_{3} (h) = \int_{t_{0} = 0}^{t_{f}} h_{3} (t) δ (t - t_{f}) d t = T (t_{f})

(33)

To illustrate the efficient computation of responses involving decoders having their own parameters/weights, the thermal conductivity of the conceptual material of the Nordheim-Fuchs reactor model will be considered to be a “decoder” response having the following expression:

\begin{array}{l} r_{4} (h; φ) = \int_{t_{0}}^{t_{f}} h_{4}^{d} [h (t); φ] δ (t - t_{f}) d t; \\ h_{4}^{d} [h (t); φ] ≜ k (T) = φ_{1} + φ_{2} h_{3} (t) + φ_{3} h_{3}^{2} (t) = φ_{1} + φ_{2} T (t) + φ_{3} T^{2} (t) . \end{array}

(34)

4. First-Order Comprehensive Adjoint Sensitivity Analysis Methodology for Neural Ordinary Differential Equations (1^st-CASAM-NODE): Mathematical Framework

At the optimal/nominal parameter values, the optimal/nominal solution $h^{0} (t)$ will satisfy the following forms of Eqs. (1) and (2):

\frac{d h^{0} (t)}{d t} = f [h^{0} (t); θ^{0}; t], t > 0

(35)

h^{0} (t_{0}) = h_{e} (x^{0}, w^{0}), at t = t_{0}

(36)

Furthermore, the vector of optimal/nominal responses will have components that are obtained by using the nominal values for the respective functions and parameters, i.e.:

r_{n}^{0} (h^{0}; φ^{0}) = \int_{t_{0}}^{t_{f}} h_{n}^{d} [h^{0} (t); φ^{0}] δ (t - t_{f}) d t; n = 1, ..., T R .

(37)

The known nominal values $x^{0}$ of the initial conditions will differ from the true but unknown values $x$ of the initial conditions by variations denoted as $δ x ≜ x - x^{0}$. Furthermore, the known nominal values $w^{0}$ of the weights characterizing the encoder will differ from the true but unknown values $w$ of the respective weights by variations denoted as $δ w ≜ w - w^{0}$. Similarly, the nominal values $θ^{0}$ and $φ^{0}$ will differ by variations $δ θ ≜ θ - θ^{0}$ and $δ φ ≜ φ - φ^{0}$, respectively, from the corresponding true but unknown values $θ$ and $φ$. Since the forward state functions $h (t)$ are related to the weights and initial conditions through Eqs. (1) and (2), it follows that the variations in these weights and initial conditions will induce corresponding variations $v^{(1)} (t) ≜ {[δ h_{1} (t), \dots, δ h_{T H} (t)]}^{†}$ around the nominal solution $h^{0} (t)$. In turn, the variations $δ φ$ and $v^{(1)} (t)$ will induce variations $δ r_{n} (h^{0}; φ^{0}; v^{(1)}; δ φ)$ in the system’s response.

The 1^st-CASAM-NODE methodology for computing the first-order sensitivities of the response with respect to the model’s weights and initial conditions will be established by following the same principles as those underlying the 1^st-CASAM-N methodology [22]. These principles commence by noting that Cacuci [23] has shown that the most general definition of the sensitivity of an operator-valued model response $R (e)$ with respect to variations $δ e$ in the model parameters and state functions, in a neighborhood around the nominal functions and parameter values $e^{0}$, is given by the 1^st-order Gateaux- (G-) variation, which will be denoted as $δ R (e^{0}; δ e)$ and is defined as follows:

δ R (e^{0}; δ e) ≜ {\{\frac{d}{d ε} [R (e^{0} + ε δ e)]\}}_{ε = 0} ≜ \lim_{ε \to 0} \frac{R (e^{0} + ε δ e) - R (e^{0})}{ε}

(38)

for a scalar $ε$ and for all (i.e., arbitrary) vectors $δ e$ in a neighborhood $(e^{0} + ε δ e)$ around $e^{0}$. The G-variation $δ R (e^{0}; δ e)$ is an operator defined on the same domain as $R (e)$ and has the same range as $R (e)$. The G-variation $δ R (e^{0}; δ e)$ satisfies the relation:

R (e^{0} + ε δ e) - R (e^{0}) = δ R (e^{0}; δ e) + Δ (δ e),

with $\lim_{ε \to 0} [Δ (ε δ e)] / ε = 0$. When the G-variation $δ R (e^{0}; δ e)$ is linear in the variation $δ e$, it can be written in the form $δ R (e^{0}; δ e) = {\{\partial R / \partial e\}}_{e^{0}} δ e$, where ${\{\partial R / \partial e\}}_{e^{0}}$ denotes the first-order G-derivative of $R (e)$ with respect to $e$ evaluated at $e^{0}$.
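The definition in Eq. (38) can be illustrated numerically on a simple scalar-valued response. In the sketch below, the response $R (e) = e_{1}^{2} e_{2}$ , the nominal point, and the variation $δ e$ are hypothetical choices; the G-variation is approximated by a central difference in $ε$ and compared with the linear form built from the partial derivatives evaluated at $e^{0}$.

```python
from typing import Callable, Sequence

def g_variation(R: Callable[[Sequence[float]], float], e0: Sequence[float],
                de: Sequence[float], eps: float = 1e-4) -> float:
    """Approximate {d/d(eps) R(e0 + eps*de)} at eps = 0 by a central difference."""
    plus = [a + eps * b for a, b in zip(e0, de)]
    minus = [a - eps * b for a, b in zip(e0, de)]
    return (R(plus) - R(minus)) / (2.0 * eps)

# Hypothetical response R(e) = e1^2 * e2 with nominal point e0 and variation de.
R = lambda e: e[0] ** 2 * e[1]
e0 = [1.0, 2.0]
de = [0.1, -0.3]

gvar_num = g_variation(R, e0, de)

# R is differentiable, so the G-variation is linear in de:
# (dR/de1) * de1 + (dR/de2) * de2, with the derivatives evaluated at e0.
grad = [2.0 * e0[0] * e0[1], e0[0] ** 2]
gvar_lin = sum(g * d for g, d in zip(grad, de))
print(gvar_num, gvar_lin)
```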

Applying the definition provided in Eq. (38) to Eq. (4) yields the following expression for the first-order G-variation $δ r_{n} (h^{0}; φ^{0}; v^{(1)}; δ φ)$ of the response $r_{n} (h; φ)$:

\begin{array}{l} δ r_{n} (h^{0}; φ^{0}; v^{(1)}; δ φ) = {\{\frac{d}{d ε} \int_{t_{0}}^{t_{f}} h_{n}^{d} [h^{0} (t) + ε v^{(1)} (t); φ^{0} + ε δ φ] δ (t - t_{f}) d t\}}_{ε = 0} \\ = {\{δ r_{n} (h^{0}; φ^{0}; δ φ)\}}_{d i r} + {\{δ r_{n} (h^{0}; φ^{0}; v^{(1)})\}}_{i n d}; n = 1, ..., T R . \end{array}

(39)

where $v^{(1)} ≜ {[v_{1}^{(1)} (t), ..., v_{T H}^{(1)} (t)]}^{†}$ and:

{\{δ r_{n} (h^{0}; φ^{0}; δ φ)\}}_{d i r} ≜ \int_{t_{0}}^{t_{f}} δ (t - t_{f}) {\{\frac{\partial h_{n}^{d} [h (t); φ]}{\partial φ}\}}_{(h^{0}; φ^{0})} δ φ d t

(40)

{\{δ r_{n} (h^{0}; φ^{0}; v^{(1)})\}}_{i n d} ≜ \int_{t_{0}}^{t_{f}} δ (t - t_{f}) {\{\frac{\partial h_{n}^{d} [h (t); φ]}{\partial h (t)}\}}_{(h^{0}; φ^{0})} v^{(1)} (t) d t

(41)

Thus, the quantity ${\{\partial h_{n}^{d} [h (t); φ] / \partial φ\}}_{(h^{0}; φ^{0})}$ in Eq. (40) denotes the partial G-derivatives of the response $h_{n}^{d} [h (t); φ]$ with respect to the decoder weights $φ ≜ {[φ_{1}, ..., φ_{T D}]}^{†}$, evaluated at the nominal values $(h^{0}; φ^{0})$. The quantity ${\{δ r_{n} (h^{0}; φ^{0}; δ φ)\}}_{d i r}$ is called the “direct-effect term” because it arises directly from parameter variations $δ φ$ and can be computed directly using the nominal values $(h^{0}; φ^{0})$. The quantity ${\{δ r_{n} (h^{0}; φ^{0}; v^{(1)})\}}_{i n d}$ is called the “indirect-effect term” because it arises indirectly, through the variations $v^{(1)} (t)$ in the hidden state functions $h (t)$. The indirect-effect term can be quantified only after having determined the variations $v^{(1)} (t)$, which are caused by the variations $δ x$, $δ w$, and $δ θ$.

The first-order relationships between the variations $v^{(1)} (t)$, $δ x$, $δ w$, and $δ θ$ are obtained by computing the first-order G-variations of Eqs. (1) and (2), which are obtained, by definition, as follows:

{\{\frac{d}{d ε} [\frac{d}{d t} (h^{0} + ε v^{(1)})]\}}_{ε = 0} = {\{\frac{d}{d ε} f [h^{0} + ε v^{(1)}; θ^{0} + ε δ θ; t]\}}_{ε = 0}

(42)

{\{\frac{d}{d ε} [h^{0} (t_{0}) + ε v^{(1)} (t_{0})]\}}_{ε = 0} = {\{\frac{d}{d ε} [h_{e} (x^{0} + ε δ x, w^{0} + ε δ w)]\}}_{ε = 0}

(43)

Carrying out the operations indicated in Eqs. (42) and (43) yields the following system of equations:

\frac{d v^{(1)} (t)}{d t} - {\{\frac{\partial f (h; θ)}{\partial h}\}}_{(h^{0}, θ^{0})} v^{(1)} (t) = {\{\frac{\partial f (h; θ)}{\partial θ}\}}_{(h^{0}, θ^{0})} δ θ

(44)

v^{(1)} (t_{0}) = {\{\frac{\partial h_{e} (x, w)}{\partial x}\}}_{(x^{0}, w^{0})} δ x + {\{\frac{\partial h_{e} (x, w)}{\partial w}\}}_{(x^{0}, w^{0})} δ w

(45)

where:

\frac{\partial f (h; θ)}{\partial h} ≜ {(\begin{matrix} \partial f_{1} / \partial h_{1} & \cdot & \partial f_{1} / \partial h_{T H} \\ \cdot & \cdot & \cdot \\ \partial f_{T H} / \partial h_{1} & \cdot & \partial f_{T H} / \partial h_{T H} \end{matrix})}_{T H \times T H}

(46)

\frac{\partial f (h; θ)}{\partial θ} ≜ {(\begin{matrix} \partial f_{1} / \partial θ_{1} & \cdot & \partial f_{1} / \partial θ_{T W} \\ \cdot & \cdot & \cdot \\ \partial f_{T H} / \partial θ_{1} & \cdot & \partial f_{T H} / \partial θ_{T W} \end{matrix})}_{T H \times T W}

(47)

\frac{\partial h_{e} (x, w)}{\partial x} ≜ {(\begin{matrix} \partial h_{1}^{e} / \partial x_{1} & \cdot & \partial h_{1}^{e} / \partial x_{T I} \\ \cdot & \cdot & \cdot \\ \partial h_{T H}^{e} / \partial x_{1} & \cdot & \partial h_{T H}^{e} / \partial x_{T I} \end{matrix})}_{T H \times T I}

(48)

\frac{\partial h_{e} (x, w)}{\partial w} ≜ {(\begin{matrix} \partial h_{1}^{e} / \partial w_{1} & \cdot & \partial h_{1}^{e} / \partial w_{T E W} \\ \cdot & \cdot & \cdot \\ \partial h_{T H}^{e} / \partial w_{1} & \cdot & \partial h_{T H}^{e} / \partial w_{T E W} \end{matrix})}_{T H \times T E W}

(49)
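As a worked illustration, the matrices defined in Eqs. (46) and (47) can be written out for the Nordheim-Fuchs correspondences of Eqs. (25)‒(29), with $f_{2}$ taken as $θ_{4} θ_{5} θ_{6} h_{1}$ in agreement with Eq. (14). The encoder Jacobian of Eq. (48) is shown under the assumption, not stated explicitly in the text, that the encoder simply places the initial conditions into the hidden state, $h_{e} (x) = {[x_{1}, 0, x_{2}]}^{†}$ , with no encoder weights.

```latex
\frac{\partial f (h; θ)}{\partial h} =
\begin{pmatrix}
- \frac{θ_{1}}{θ_{2} θ_{3}} h_{2} & - \frac{θ_{1}}{θ_{2} θ_{3}} h_{1} & 0 \\
θ_{4} θ_{5} θ_{6} & 0 & 0 \\
\frac{θ_{4} θ_{5} θ_{6}}{θ_{3}} & 0 & 0
\end{pmatrix},
\qquad
\frac{\partial h_{e} (x)}{\partial x} =
\begin{pmatrix}
1 & 0 \\ 0 & 0 \\ 0 & 1
\end{pmatrix},

\frac{\partial f (h; θ)}{\partial θ} =
\begin{pmatrix}
- \frac{h_{1} h_{2}}{θ_{2} θ_{3}} & \frac{θ_{1} h_{1} h_{2}}{θ_{2}^{2} θ_{3}} & \frac{θ_{1} h_{1} h_{2}}{θ_{2} θ_{3}^{2}} & 0 & 0 & 0 \\
0 & 0 & 0 & θ_{5} θ_{6} h_{1} & θ_{4} θ_{6} h_{1} & θ_{4} θ_{5} h_{1} \\
0 & 0 & - \frac{θ_{4} θ_{5} θ_{6} h_{1}}{θ_{3}^{2}} & \frac{θ_{5} θ_{6} h_{1}}{θ_{3}} & \frac{θ_{4} θ_{6} h_{1}}{θ_{3}} & \frac{θ_{4} θ_{5} h_{1}}{θ_{3}}
\end{pmatrix}.
```

All entries are to be evaluated at the nominal values $(h^{0}, θ^{0})$ when used in Eqs. (44) and (45).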

The system comprising Eqs. (44) and (45) is called the “1^st-Level Variational Sensitivity System” (1^st-LVSS), and its solution, $v^{(1)} (t)$, is called the “1^st-level variational sensitivity function.” Note that the 1^st-LVSS would need to be solved anew for each component of the variations $δ x$, $δ w$, and $δ θ$, which would be prohibitively expensive computationally.
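The cost argument can be made concrete with a small sketch for the Nordheim-Fuchs model: the forward equations (13)‒(15) are integrated together with Eq. (44) for a single unit variation in $α_{T}$ (all other variations being zero, so that $v^{(1)} (t_{0}) = 0$ by Eq. (45)). The component $v_{2}^{(1)} (t_{f})$ then equals the sensitivity of $E (t_{f})$ to $α_{T}$ , which is compared with a finite difference of the closed-form solution in Eq. (18). The nondimensional parameter values, the fixed-step Runge–Kutta scheme, and all names are assumptions made for illustration; a separate solve of this kind would be needed for every other parameter and initial condition.

```python
import math
from typing import List, Sequence

# Hypothetical nondimensional nominal values (illustration only).
ALPHA_T, L_P, C_P = 0.5, 0.2, 2.0
GAMMA, SIGMA_F, N_F = 1.5, 0.8, 1.25
PHI0, T0, TF = 1.0, 3.0, 1.0

def augmented_rhs(y: Sequence[float]) -> List[float]:
    """Eqs. (13)-(15) stacked with Eq. (44) for a unit variation in alpha_T only."""
    h1, h2, _h3, v1, v2, _v3 = y
    c = ALPHA_T / (L_P * C_P)
    g = GAMMA * SIGMA_F * N_F
    f = [-c * h1 * h2, g * h1, (g / C_P) * h1]
    # dv/dt = (df/dh) v + (df/d alpha_T) * 1
    dv1 = -c * (h2 * v1 + h1 * v2) - h1 * h2 / (L_P * C_P)
    dv2 = g * v1
    dv3 = (g / C_P) * v1
    return f + [dv1, dv2, dv3]

def rk4(y0: Sequence[float], tf: float, n_steps: int = 2000) -> List[float]:
    """Classical RK4 integration of the augmented system from t = 0 to t = tf."""
    y = list(y0)
    dt = tf / n_steps
    for _ in range(n_steps):
        k1 = augmented_rhs(y)
        k2 = augmented_rhs([a + 0.5 * dt * b for a, b in zip(y, k1)])
        k3 = augmented_rhs([a + 0.5 * dt * b for a, b in zip(y, k2)])
        k4 = augmented_rhs([a + dt * b for a, b in zip(y, k3)])
        y = [a + dt / 6.0 * (p + 2 * q + 2 * r + s)
             for a, p, q, r, s in zip(y, k1, k2, k3, k4)]
    return y

def energy_closed_form(alpha_t: float, t: float) -> float:
    """E(t) from Eqs. (18)-(19) as a function of alpha_T."""
    g = GAMMA * SIGMA_F * N_F
    k1 = math.sqrt(2.0 * PHI0 * g * L_P * C_P / alpha_t)
    k2 = math.sqrt(alpha_t * PHI0 * g / (2.0 * L_P * C_P))
    return k1 * math.tanh(k2 * t)

y_tf = rk4([PHI0, 0.0, T0, 0.0, 0.0, 0.0], TF)
dE_dalpha_lvss = y_tf[4]                       # v_2(t_f)
eps = 1e-6
dE_dalpha_fd = (energy_closed_form(ALPHA_T + eps, TF)
                - energy_closed_form(ALPHA_T - eps, TF)) / (2.0 * eps)
print(dE_dalpha_lvss, dE_dalpha_fd)
```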

The need for solving the 1^st-LVSS can be avoided if the indirect-effect term defined in Eq. (41) can be expressed in terms of a “right-hand side” that does not involve the function $v^{(1)} (t)$. This goal can be achieved by expressing the right-side of Eq. (41) in terms of the solutions of the “1^st-Level Adjoint Sensitivity System (1^st-LASS),” the construction of which requires the introduction of adjoint operators. Adjoint operators can be defined in Banach spaces but are most useful in Hilbert spaces. For the NODE considered in this work, the appropriate Hilbert space is defined on the domain $Ω_{t} ≜ [t_{0}, t_{f}]$ and will be denoted as $H_{1} (Ω_{t})$, so that $v^{(1)} (t) \in H_{1} (Ω_{t})$. In $H_{1} (Ω_{t})$, the inner product of two vectors $u^{(a)} (t) \in H_{1} (Ω_{t})$ and $u^{(b)} (t) \in H_{1} (Ω_{t})$ will be denoted as ${〈u^{(a)}, u^{(b)}〉}_{1}$, and is defined as follows:

{〈u^{(a)}, u^{(b)}〉}_{1} ≜ {\{\int_{t_{0}}^{t_{f}} u^{(a)} (t) \cdot u^{(b)} (t) d t\}}_{(x^{0}; θ^{0}; w^{0}; φ^{0})}

(50)

where the “dot” indicates the “scalar product of two vectors” defined as follows:

u^{(a)} (t) \cdot u^{(b)} (t) ≜ {[u^{(a)} (t)]}^{†} u^{(b)} (t) ≜ \sum_{i = 1}^{T H} u_{i}^{(a)} (t) u_{i}^{(b)} (t) = {[u^{(b)} (t)]}^{†} u^{(a)} (t)

The next step is to form the inner product of Eq. (44) with a vector $a^{(1)} (t) ≜ {[a_{1}^{(1)} (t), \dots, a_{T H}^{(1)} (t)]}^{†} \in H_{1} (Ω_{t})$, where the superscript “(1)” indicates “1^st-Level”, to obtain the following relationship:

{〈a^{(1)} (t), \frac{d v^{(1)} (t)}{d t} - {[\frac{\partial f (h; θ)}{\partial h}]}_{(h^{0}, θ^{0})} v^{(1)} (t)〉}_{1} = {〈a^{(1)} (t), {[\frac{\partial f (h; θ)}{\partial θ}]}_{(h^{0}, θ^{0})} δ θ〉}_{1} .

(51)

Using the definition of the adjoint operator in $H_{1} (Ω_{t})$, the left-side of Eq. (51) is transformed as follows, after integrating by parts over the independent variable $t$:

\begin{array}{l} \int_{t_{0}}^{t_{f}} a^{(1)} (t) \cdot \frac{d v^{(1)} (t)}{d t} d t - \int_{t_{0}}^{t_{f}} a^{(1)} (t) \cdot {[\frac{\partial f (h; θ)}{\partial h}]}_{(h^{0}, θ^{0})} v^{(1)} (t) d t = a^{(1)} (t_{f}) \cdot v^{(1)} (t_{f}) \\ - a^{(1)} (t_{0}) \cdot v^{(1)} (t_{0}) + \int_{t_{0}}^{t_{f}} v^{(1)} (t) \cdot \{- \frac{d a^{(1)} (t)}{d t} - {[\frac{\partial f (h; θ)}{\partial h}]}_{(h^{0}, θ^{0})}^{†} a^{(1)} (t)\} d t . \end{array}

(52)

The last term on the right-side of Eq. (52) is now required to represent the “indirect-effect” term defined in Eq. (41), which is achieved by requiring that the 1^st-level adjoint function $a^{(1)} (t)$ satisfy the following relation:

- \frac{d a^{(1)} (t)}{d t} - {[\frac{\partial f (h; θ)}{\partial h}]}_{(h^{0}, θ^{0})}^{†} a^{(1)} (t) = {\{\frac{\partial h_{n}^{d} [h (t); φ]}{\partial h (t)}\}}_{(h^{0}; φ^{0})} δ (t - t_{f})

(53)

The definition of the 1^st-level adjoint sensitivity function

a^{(1)} (t)

is now completed by requiring it to satisfy (adjoint) “boundary conditions at the final time”

t = t_{f}

so as to eliminate the term containing the unknown values

v^{(1)} (t_{f})

in Eq. (52). This aim is achieved by requiring that

a^{(1)} (t_{f}) = 0 .

(54)

The system of equations comprising Eqs. (53) and (54) constitutes the “1^st-Level Adjoint Sensitivity System (1^st-LASS)” for the 1^st-level adjoint function

a^{(1)} (t)

. Evidently, the 1^st-LASS is independent of parameter variations and needs to be solved just once to obtain the 1^st-level adjoint function

a^{(1)} (t)

. Notably, the 1^st-LASS has the same form as the “adjoint equations” used for training the NODE, cf. Eqs. (6) and (7), but with the “response”

\partial h_{n}^{d} [h (t); φ] / \partial h (t) δ (t - t_{f})

being the “source” for the 1^st-LASS, whereas the “source” in the “training” of the NODE was the “loss functional”

\partial L [h (t); θ; t] / \partial h δ (t - t_{f})

. Evidently, the 1^st-level adjoint sensitivity function

a^{(1)} (t)

is the counterpart of the “adjoint function”

a (t)

in the “training” of the NODE.

Using the results represented by Eqs. (53), (54), (51), and (41) in Eq. (52) yields the following alternative expression for the “indirect-effect” term, which does not involve the 1^st-level variational sensitivity function

v^{(1)} (t)

but involves the 1^st-level adjoint function

a^{(1)} (t)

{\{δ r_{n} (h^{0}; φ^{0}; v^{(1)})\}}_{i n d} = {〈a^{(1)} (t), {[\frac{\partial f (h; θ)}{\partial θ}]}_{(h^{0}, θ^{0})} δ θ〉}_{1} + a^{(1)} (t_{0}) \cdot v^{(1)} (t_{0})

(55)

Using in Eq. (55) the expression provided for

v^{(1)} (t_{0})

in Eq. (45) yields the following expression for the “indirect-effect” term:

\begin{array}{l} {\{δ r_{n} (h^{0}; φ^{0}; v^{(1)})\}}_{i n d} = {〈a^{(1)} (t), {[\frac{\partial f (h; θ)}{\partial θ}]}_{(h^{0}, θ^{0})} δ θ〉}_{1} \\ + a^{(1)} (t_{0}) \cdot {\{\frac{\partial h_{e} (x, w)}{\partial x}\}}_{(x^{0}, w^{0})} δ x + a^{(1)} (t_{0}) \cdot {\{\frac{\partial h_{e} (x, w)}{\partial w}\}}_{(x^{0}, w^{0})} δ w \end{array}

(56)

Inserting the expression obtained in Eq. (56) for the “indirect-effect term”, together with the expression of the direct-effect term provided by Eq. (40), into Eq. (39) yields the following expression for the first-order G-variation

δ r_{n} (h^{0}; φ^{0}; v^{(1)}; δ φ)

of the response

r_{n} (h; φ)

\begin{array}{l} δ r_{n} (h^{0}; φ^{0}; v^{(1)}; δ φ) = {\{\frac{\partial h_{n}^{d} [h (t_{f}); φ]}{\partial φ}\}}_{(h^{0}; φ^{0})} δ φ + a^{(1)} (t_{0}) \cdot {\{\frac{\partial h_{e} (x, w)}{\partial w}\}}_{(x^{0}, w^{0})} δ w \\ + a^{(1)} (t_{0}) \cdot {\{\frac{\partial h_{e} (x, w)}{\partial x}\}}_{(x^{0}, w^{0})} δ x + \int_{t_{0}}^{t_{f}} a^{(1)} (t) \cdot \{{[\frac{\partial f (h; θ)}{\partial θ}]}_{(h^{0}, θ^{0})} δ θ\} d t; n = 1, ..., T R . \end{array}

(57)

As indicated by the right-side of Eq. (57), the (partial) sensitivities of the response

r_{n} (h; φ)

are provided by the following expressions, all of which are to be evaluated at the nominal values of all functions and parameters/weights:

\frac{\partial r_{n}}{\partial φ_{i}} = \frac{\partial h_{n}^{d} [h (t_{f}); φ]}{\partial φ_{i}}; i = 1, ..., T D; ​ ​ ​ ​ n = 1, ..., T R;

(58)

\frac{\partial r_{n}}{\partial w_{i}} = a^{(1)} (t_{0}) \cdot \frac{\partial h_{e} (x, w)}{\partial w_{i}}; i = 1, ..., T E W; n = 1, ..., T R;

(59)

\frac{\partial r_{n}}{\partial x_{i}} = a^{(1)} (t_{0}) \cdot \frac{\partial h_{e} (x, w)}{\partial x_{i}}; i = 1, ..., T I; n = 1, ..., T R;

(60)

\frac{\partial r_{n}}{\partial θ_{i}} = \int_{t_{0}}^{t_{f}} a^{(1)} (t) \cdot \frac{\partial f (h; θ)}{\partial θ_{i}} d t; i = 1, ..., T W; n = 1, ..., T R .

(61)
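As a minimal sanity check of Eqs. (53), (54), (60) and (61), the sketch below considers a hypothetical scalar linear NODE, $d h / d t = θ h$, with the identity encoder $h (t_{0}) = x$ and the decoder response $r = h (t_{f})$. This example, its numerical values, and the helper names (`h`, `a`, `simpson`) are assumptions introduced purely for illustration. The Dirac-delta source at $t_{f}$ is interpreted here as a unit jump, so that the adjoint function equals 1 just below $t_{f}$.

```python
import math

# Hypothetical scalar example: f(h; theta) = theta*h, encoder h(t0) = x, response r = h(tf).
theta, x, t0, tf = 0.8, 1.5, 0.0, 2.0

def h(t):
    """Forward (nominal) state of dh/dt = theta*h with h(t0) = x."""
    return x * math.exp(theta * (t - t0))

def a(t):
    """Adjoint function for t < tf: solves -da/dt - theta*a = 0 with a(tf^-) = 1,
    which is how the delta-function source at tf is interpreted in this sketch."""
    return math.exp(theta * (tf - t))

def simpson(g, lo, hi, n=200):
    """Composite Simpson rule with an even number n of sub-intervals."""
    step = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * step)
    return total * step / 3

# Eq. (61): dr/dtheta = integral of a(t) * df/dtheta, where df/dtheta = h(t).
dr_dtheta = simpson(lambda t: a(t) * h(t), t0, tf)
# Eq. (60): dr/dx = a(t0) * dh_e/dx, where dh_e/dx = 1 for the identity encoder.
dr_dx = a(t0)

# Reference values from differentiating r = x*exp(theta*(tf - t0)) directly.
exact_dtheta = x * (tf - t0) * math.exp(theta * (tf - t0))
exact_dx = math.exp(theta * (tf - t0))
```

In this example a single adjoint function delivers both the parameter sensitivity and the initial-condition sensitivity, without solving any variational equation.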

5. Illustrative Application of the 1^st-CASAM-NODE Methodology to Compute First-Order Sensitivities of Nordheim-Fuchs Model Responses with respect to the Underlying Parameters

The application of the 1^st-CASAM-NODE methodology to compute the first-order sensitivities of the responses

r_{1} (h)

,

r_{2} (h)

,

r_{3} (h)

,

and

r_{4} (h)

with respect to the Nordheim-Fuchs model’s parameters and initial conditions will be presented below in Subsections 5.1 through 5.4, respectively. Using the “energy-released” response

r_{2} (h) = E (t_{f})

as a paradigm, Subsection 5.5 will illustrate an alternative path for computing first-order sensitivities by applying the “First-Order Feature Adjoint Sensitivity Analysis Methodology for Nonlinear Systems (1^st-FASAM-N)” [24], which is the most efficient procedure for computing sensitivities, but which may require the construction of a dedicated neural net for this purpose.

5.1. First-Order Sensitivities of the Flux Response $r_{1} (h) = φ (t_{f})$

The first-order sensitivity of the response

r_{1} (h) = φ (t_{f})

is provided by the first-order G-differential of the expression in Eq. (31), which is, by definition, obtained as follows:

δ r_{1} (h; δ h) = {\{\frac{d}{d ε} [\int_{0}^{t_{f}} [φ^{0} (t) + ε δ φ (t)] δ (t - t_{f}) d t]\}}_{ε = 0} = \int_{0}^{t_{f}} δ φ (t) δ (t - t_{f}) d t

(62)

The variation

δ φ (t)

is the solution of the “First-Level Variational Sensitivity System (1^st-LVSS)”, obtained by G-differentiating Eqs. (13)‒(15), which yields the following expressions:

{\{\frac{d}{d ε} [\frac{d}{d t} (φ^{0} + ε δ φ)]\}}_{ε = 0} = - {\{\frac{d}{d ε} [\frac{α_{T}^{0} + ε δ α_{T}}{(l_{p}^{0} + ε δ l_{p}) (c_{p}^{0} + ε δ c_{p})} (E^{0} + ε δ E) (φ^{0} + ε δ φ)]\}}_{ε = 0},

(63)

{\{\frac{d}{d ε} [\frac{d}{d t} (E^{0} + ε δ E)]\}}_{ε = 0} = {\{\frac{d}{d ε} [(γ^{0} + ε δ γ) (σ_{f}^{0} + ε δ σ_{f}) (N_{f}^{0} + ε δ N_{f}) (φ^{0} + ε δ φ)]\}}_{ε = 0},

(64)

{\{\frac{d}{d ε} [\frac{d}{d t} (T^{0} + ε δ T)]\}}_{ε = 0} = {\{\frac{d}{d ε} [\frac{(γ^{0} + ε δ γ) (σ_{f}^{0} + ε δ σ_{f}) (N_{f}^{0} + ε δ N_{f})}{(c_{p}^{0} + ε δ c_{p})} (φ^{0} + ε δ φ)]\}}_{ε = 0},

(65)

{\{\frac{d}{d ε} {[φ^{0} (t) + ε δ φ (t)]}_{t = 0}\}}_{ε = 0} = {\{\frac{d}{d ε} (φ_{0}^{0} + ε δ φ_{0})\}}_{ε = 0}

(66)

{\{\frac{d}{d ε} {[E^{0} (t) + ε δ E (t)]}_{t = 0}\}}_{ε = 0} = 0

(67)

{\{\frac{d}{d ε} {[T^{0} (t) + ε δ T (t)]}_{t = 0}\}}_{ε = 0} = {\{\frac{d}{d ε} (T_{0}^{0} + ε δ T_{0})\}}_{ε = 0}

(68)

Performing the operations involving the scalar

ε

in Eqs. (63)‒(68) yields the following expression for the 1^st-LVSS:

\begin{array}{l} \frac{d}{d t} δ φ (t) + \frac{α_{T}^{0}}{l_{p}^{0} c_{p}^{0}} E^{0} (t) δ φ (t) + \frac{α_{T}^{0}}{l_{p}^{0} c_{p}^{0}} φ^{0} (t) δ E (t) \\ = [- \frac{δ α_{T}}{l_{p}^{0} c_{p}^{0}} + \frac{α_{T}^{0}}{{(l_{p}^{0})}^{2} c_{p}^{0}} δ l_{p} + \frac{α_{T}^{0}}{l_{p}^{0} {(c_{p}^{0})}^{2}} δ c_{p}] E^{0} (t) φ^{0} (t), \end{array}

(69)

\frac{d}{d t} δ E (t) - γ^{0} σ_{f}^{0} N_{f}^{0} δ φ (t) = [(σ_{f}^{0} N_{f}^{0}) δ γ + (γ^{0} N_{f}^{0}) δ σ_{f} + (γ^{0} σ_{f}^{0}) δ N_{f}] φ^{0} (t),

(70)

\begin{array}{l} \frac{d}{d t} δ T (t) - \frac{γ^{0} σ_{f}^{0} N_{f}^{0}}{c_{p}^{0}} δ φ (t) \\ = [(σ_{f}^{0} N_{f}^{0}) δ γ + (γ^{0} N_{f}^{0}) δ σ_{f} + (γ^{0} σ_{f}^{0}) δ N_{f} - \frac{γ^{0} σ_{f}^{0} N_{f}^{0}}{c_{p}^{0}} δ c_{p}] \frac{φ^{0} (t)}{c_{p}^{0}}, \end{array}

(71)

{[δ φ (t)]}_{t = 0} = δ φ_{0}

(72)

{[δ E (t)]}_{t = 0} = 0

(73)

{[δ T (t)]}_{t = 0} = δ T_{0}

(74)
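The following sketch integrates the nominal Nordheim-Fuchs equations together with Eqs. (69)‒(74) for one chosen variation and compares the result with a central finite difference. All numerical values are hypothetical, dimensionless placeholders (not taken from this work), and the names `rk4`, `solve` and `finite_difference` are illustrative helpers rather than part of any established code.

```python
# Hypothetical, dimensionless nominal values (illustrative only, not from the paper).
P = {"alpha": 2.0, "lp": 0.5, "cp": 1.5, "gamma": 1.2, "sigf": 0.8, "Nf": 1.1}
X = {"phi0": 0.7, "T0": 0.3}

def rk4(rhs, z, t_end, n):
    """Classical fixed-step RK4 from t = 0 to t_end for an autonomous system."""
    dt = t_end / n
    for _ in range(n):
        k1 = rhs(z)
        k2 = rhs([zi + 0.5 * dt * ki for zi, ki in zip(z, k1)])
        k3 = rhs([zi + 0.5 * dt * ki for zi, ki in zip(z, k2)])
        k4 = rhs([zi + dt * ki for zi, ki in zip(z, k3)])
        z = [zi + dt / 6 * (a + 2 * b + 2 * c + d)
             for zi, a, b, c, d in zip(z, k1, k2, k3, k4)]
    return z

def solve(p, x, dp=None, dx=None, tf=1.0, n=2000):
    """Integrate (phi, E, T) together with the 1st-LVSS, Eqs. (69)-(74), for ONE
    set of variations dp, dx; returns [phi, E, T, d_phi, d_E, d_T] at tf."""
    dp = {**dict.fromkeys(p, 0.0), **(dp or {})}
    dx = {**dict.fromkeys(x, 0.0), **(dx or {})}
    al, lp, cp = p["alpha"], p["lp"], p["cp"]
    g, s, N = p["gamma"], p["sigf"], p["Nf"]
    k = g * s * N
    # Bracketed parameter-variation factors on the right sides of Eqs. (69)-(71).
    q1 = -dp["alpha"] / (lp * cp) + al * dp["lp"] / (lp**2 * cp) + al * dp["cp"] / (lp * cp**2)
    q2 = s * N * dp["gamma"] + g * N * dp["sigf"] + g * s * dp["Nf"]
    q3 = q2 - k / cp * dp["cp"]

    def rhs(z):
        phi, E, T, dphi, dE, dT = z
        return [-al / (lp * cp) * E * phi,
                k * phi,
                k / cp * phi,
                -al / (lp * cp) * (E * dphi + phi * dE) + q1 * E * phi,
                k * dphi + q2 * phi,
                k / cp * dphi + q3 * phi / cp]

    return rk4(rhs, [x["phi0"], 0.0, x["T0"], dx["phi0"], 0.0, dx["T0"]], tf, n)

def finite_difference(name, rel=1e-5):
    """Central difference of (phi, E, T) at tf with respect to one model parameter."""
    step = rel * P[name]
    up = solve({**P, name: P[name] + step}, X)[:3]
    dn = solve({**P, name: P[name] - step}, X)[:3]
    return [(u - d) / (2 * step) for u, d in zip(up, dn)]

# One LVSS solve per parameter variation: here a unit variation of alpha_T only.
v_tf = solve(P, X, dp={"alpha": 1.0})[3:]
```

Each additional parameter or initial-condition variation requires another call to `solve`, which is the cost that the adjoint formulation is designed to avoid.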

The 1^st-LVSS comprising Eqs. (69)‒(74) represents the specific form taken on by the general NODE-representation of the 1^st-LVSS provided by Eqs. (44) and (45) for the Nordheim-Fuchs model. Comparing Eqs. (69)‒(74) to Eqs. (44) and (45) indicates the following correspondences:

\frac{\partial f (h; θ)}{\partial h} ≜ (\begin{matrix} - \frac{α_{T}^{0} E^{0} (t)}{l_{p}^{0} c_{p}^{0}} & - \frac{α_{T}^{0} φ^{0} (t)}{l_{p}^{0} c_{p}^{0}} & 0 \\ γ^{0} σ_{f}^{0} N_{f}^{0} & 0 & 0 \\ \frac{γ^{0} σ_{f}^{0} N_{f}^{0}}{c_{p}^{0}} & 0 & 0 \end{matrix}); v^{(1)} (t) ≜ (\begin{matrix} δ φ (t) \\ δ E (t) \\ δ T (t) \end{matrix});

(75)

\frac{\partial f (h; θ)}{\partial θ} ≜ (\begin{matrix} \frac{\partial f_{1}}{\partial θ_{1}} & \frac{\partial f_{1}}{\partial θ_{2}} & \frac{\partial f_{1}}{\partial θ_{3}} & 0 & 0 & 0 \\ 0 & 0 & 0 & \frac{\partial f_{2}}{\partial θ_{4}} & \frac{\partial f_{2}}{\partial θ_{5}} & \frac{\partial f_{2}}{\partial θ_{6}} \\ 0 & 0 & \frac{\partial f_{3}}{\partial θ_{3}} & \frac{\partial f_{3}}{\partial θ_{4}} & \frac{\partial f_{3}}{\partial θ_{5}} & \frac{\partial f_{3}}{\partial θ_{6}} \end{matrix}); δ θ ≜ (\begin{matrix} δ α_{T} \\ δ l_{p} \\ δ c_{p} \\ δ γ \\ δ σ_{f} \\ δ N_{f} \end{matrix});

(76)

\begin{array}{l} \frac{\partial f_{1}}{\partial θ_{1}} ≜ - \frac{E^{0} (t) φ^{0} (t)}{l_{p}^{0} c_{p}^{0}}; \frac{\partial f_{1}}{\partial θ_{2}} ≜ \frac{α_{T}^{0} E^{0} (t) φ^{0} (t)}{{(l_{p}^{0})}^{2} c_{p}^{0}}; \frac{\partial f_{1}}{\partial θ_{3}} ≜ \frac{α_{T}^{0} E^{0} (t) φ^{0} (t)}{l_{p}^{0} {(c_{p}^{0})}^{2}}; \\ \frac{\partial f_{2}}{\partial θ_{4}} ≜ σ_{f}^{0} N_{f}^{0} φ^{0} (t); \frac{\partial f_{2}}{\partial θ_{5}} ≜ γ^{0} N_{f}^{0} φ^{0} (t); \frac{\partial f_{2}}{\partial θ_{6}} ≜ γ^{0} σ_{f}^{0} φ^{0} (t); \\ \frac{\partial f_{3}}{\partial θ_{3}} ≜ - \frac{γ^{0} σ_{f}^{0} N_{f}^{0} φ^{0} (t)}{{(c_{p}^{0})}^{2}}; \frac{\partial f_{3}}{\partial θ_{4}} ≜ \frac{σ_{f}^{0} N_{f}^{0} φ^{0} (t)}{c_{p}^{0}}; \frac{\partial f_{3}}{\partial θ_{5}} ≜ \frac{γ^{0} N_{f}^{0} φ^{0} (t)}{c_{p}^{0}}; \frac{\partial f_{3}}{\partial θ_{6}} ≜ \frac{γ^{0} σ_{f}^{0} φ^{0} (t)}{c_{p}^{0}} . \end{array}

(77)

\frac{\partial h_{e} (x, w)}{\partial x} ≜ (\begin{matrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{matrix}); δ x ≜ (\begin{matrix} δ φ_{0} \\ 0 \\ δ T_{0} \end{matrix}); \frac{\partial h_{e} (x, w)}{\partial w} ≜ (\begin{matrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{matrix}) .

(78)

It is evident that the 1^st-LVSS would need to be solved repeatedly in order to compute the 1^st-level variational function

v^{(1)} (t) ≜ {[δ φ (t), δ E (t), δ T (t)]}^{†}

for every possible variation

δ θ

in the model parameters and variations

δ x

in the initial conditions (“encoder”). This computationally expensive path can be avoided by applying the concepts of the 1^st-CASAM-NODE previously outlined in Subsection 4.1, as follows:

1.: Consider that the 1^st-level variational function $v^{(1)} (t) ≜ {[δ φ (t), δ E (t), δ T (t)]}^{†} \in H_{1} (Ω_{t})$ is an element of a Hilbert space denoted as $H_{1} (Ω_{t})$ , $Ω_{t} ≜ (0, t_{f})$ , which comprises elements of the form $u^{(a)} (t) ≜ {[u_{1}^{(a)} (t), u_{2}^{(a)} (t), u_{3}^{(a)} (t)]}^{†}$ , $u^{(b)} (t) ≜ {[u_{1}^{(b)} (t), u_{2}^{(b)} (t), u_{3}^{(b)} (t)]}^{†}$ , and which is endowed with the inner product ${〈u^{(a)}, u^{(b)}〉}_{1}$ introduced in Eq. (50), which takes on the following particular form for the Nordheim-Fuchs model:

{〈u^{(a)}, u^{(b)}〉}_{1} ≜ \int_{0}^{t_{f}} u^{(a)} (t) \cdot u^{(b)} (t) d t = \sum_{i = 1}^{3} \int_{0}^{t_{f}} u_{i}^{(a)} (t) u_{i}^{(b)} (t) d t

(79)

2.: Use Eq. (79) to form the inner product of Eqs. (69)‒(71) with a yet undefined function $a^{(1)} (t) ≜ {[a_{1}^{(1)} (t), a_{2}^{(1)} (t), a_{3}^{(1)} (t)]}^{†} \in H_{1} (Ω_{t})$ , to obtain the following relation, which is the particular form taken on by Eq. (51) for the Nordheim-Fuchs model:

\begin{array}{l} \int_{0}^{t_{f}} a_{1}^{(1)} (t) [\frac{d}{d t} δ φ (t) + \frac{α_{T}^{0}}{l_{p}^{0} c_{p}^{0}} E^{0} (t) δ φ (t) + \frac{α_{T}^{0}}{l_{p}^{0} c_{p}^{0}} φ^{0} (t) δ E (t)] d t \\ + \int_{0}^{t_{f}} a_{2}^{(1)} (t) [\frac{d}{d t} δ E (t) - γ^{0} σ_{f}^{0} N_{f}^{0} δ φ (t)] d t \\ + \int_{0}^{t_{f}} a_{3}^{(1)} (t) [\frac{d}{d t} δ T (t) - \frac{γ^{0} σ_{f}^{0} N_{f}^{0}}{c_{p}^{0}} δ φ (t)] d t \\ = \int_{0}^{t_{f}} a_{1}^{(1)} (t) [- \frac{δ α_{T}}{l_{p}^{0} c_{p}^{0}} + \frac{α_{T}^{0}}{{(l_{p}^{0})}^{2} c_{p}^{0}} δ l_{p} + \frac{α_{T}^{0}}{l_{p}^{0} {(c_{p}^{0})}^{2}} δ c_{p}] E^{0} (t) φ^{0} (t) d t \\ + \int_{0}^{t_{f}} a_{2}^{(1)} (t) [(σ_{f}^{0} N_{f}^{0}) δ γ + (γ^{0} N_{f}^{0}) δ σ_{f} + (γ^{0} σ_{f}^{0}) δ N_{f}] φ^{0} (t) d t \\ + \int_{0}^{t_{f}} a_{3}^{(1)} (t) [(σ_{f}^{0} N_{f}^{0}) δ γ + (γ^{0} N_{f}^{0}) δ σ_{f} + (γ^{0} σ_{f}^{0}) δ N_{f} - \frac{γ^{0} σ_{f}^{0} N_{f}^{0}}{c_{p}^{0}} δ c_{p}] \frac{φ^{0} (t)}{c_{p}^{0}} d t . \end{array}

(80)

3.: Integrating by parts the terms on the left-side of Eq. (80) yields the following relation

\begin{array}{l} \int_{0}^{t_{f}} a_{1}^{(1)} (t) [\frac{d}{d t} δ φ (t) + \frac{α_{T}^{0}}{l_{p}^{0} c_{p}^{0}} E^{0} (t) δ φ (t) + \frac{α_{T}^{0}}{l_{p}^{0} c_{p}^{0}} φ^{0} (t) δ E (t)] d t \\ + \int_{0}^{t_{f}} a_{2}^{(1)} (t) [\frac{d}{d t} δ E (t) - γ^{0} σ_{f}^{0} N_{f}^{0} δ φ (t)] d t \\ + \int_{0}^{t_{f}} a_{3}^{(1)} (t) [\frac{d}{d t} δ T (t) - \frac{γ^{0} σ_{f}^{0} N_{f}^{0}}{c_{p}^{0}} δ φ (t)] d t = a_{1}^{(1)} (t_{f}) δ φ (t_{f}) - a_{1}^{(1)} (0) δ φ (0) \\ + a_{2}^{(1)} (t_{f}) δ E (t_{f}) - a_{2}^{(1)} (0) δ E (0) + a_{3}^{(1)} (t_{f}) δ T (t_{f}) - a_{3}^{(1)} (0) δ T (0) \\ + \int_{0}^{t_{f}} v^{(1)} (t) \cdot {\{A^{(1)} (h; θ) a^{(1)} (t)\}}_{(h^{0}; θ^{0})} d t, \end{array}

(81)

where:

A^{(1)} (h; θ) a^{(1)} (t) ≜ - \frac{d a^{(1)} (t)}{d t} - {[\frac{\partial f (h; θ)}{\partial h}]}_{(h^{0}, θ^{0})}^{†} a^{(1)} (t)

(82)

with

{[\frac{\partial f (h; θ)}{\partial h}]}_{(h^{0}, θ^{0})}^{†} ≜ (\begin{matrix} - α_{T}^{0} E^{0} (t) / (l_{p}^{0} c_{p}^{0}) & γ^{0} σ_{f}^{0} N_{f}^{0} & γ^{0} σ_{f}^{0} N_{f}^{0} / c_{p}^{0} \\ - α_{T}^{0} φ^{0} (t) / (l_{p}^{0} c_{p}^{0}) & 0 & 0 \\ 0 & 0 & 0 \end{matrix}) .

(83)

The relation obtained in Eq. (81) is the particular form taken on by Eq. (52) for the Nordheim-Fuchs model.

4.: The definition of the function $a^{(1)} (t)$ is now completed by requiring that: (i) the integral term on the right-side of Eq. (81) represent the G-differential $δ r_{1} (h; δ h)$ defined in Eq. (62), and (ii) the unknown values of the components of $v^{(1)} (t_{f})$ be eliminated from Eq. (81). These requirements will be satisfied if the function $a^{(1)} (t) ≜ {[a_{1}^{(1)} (t), a_{2}^{(1)} (t), a_{3}^{(1)} (t)]}^{†} \in H_{1} (Ω_{t})$ is the solution of the following “1^st-Level Adjoint Sensitivity System (1^st-LASS)”:

A^{(1)} (h; θ) a^{(1)} (t) ≜ - \frac{d a^{(1)} (t)}{d t} - {[\frac{\partial f (h; θ)}{\partial h}]}_{(h^{0}, θ^{0})}^{†} a^{(1)} (t) = {[δ (t - t_{f}), 0, 0]}^{†}

(84)

a^{(1)} (t_{f}) ≜ {[a_{1}^{(1)} (t_{f}), a_{2}^{(1)} (t_{f}), a_{3}^{(1)} (t_{f})]}^{†} = {[0, 0, 0]}^{†}

(85)

It is important to note that if the vector-valued function

f (h; θ)

is linear in

h (t)

(in which case the NODE would be linear), then the 1^st-level adjoint sensitivity function

a^{(1)} (t)

would not depend on

h (t)

, so the “forward solution path” would not need to be stored in order to compute

a^{(1)} (t)

. Otherwise, however, the “forward solution path”

h (t)

would need to be stored in order to compute

a^{(1)} (t)

5.: Using Eqs. (84), (85), (80), (62), (72), (73) and (74) in Eq. (81) yields the following expression for the first G-differential $δ r_{1} (h; δ h)$ of the response under consideration:

\begin{array}{l} δ r_{1} (h; δ h) = δ φ (t_{f}) = \int_{0}^{t_{f}} a_{1}^{(1)} (t) [- \frac{δ α_{T}}{l_{p}^{0} c_{p}^{0}} + \frac{α_{T}^{0}}{{(l_{p}^{0})}^{2} c_{p}^{0}} δ l_{p} + \frac{α_{T}^{0}}{l_{p}^{0} {(c_{p}^{0})}^{2}} δ c_{p}] E^{0} (t) φ^{0} (t) d t \\ + \int_{0}^{t_{f}} a_{2}^{(1)} (t) [(σ_{f}^{0} N_{f}^{0}) δ γ + (γ^{0} N_{f}^{0}) δ σ_{f} + (γ^{0} σ_{f}^{0}) δ N_{f}] φ^{0} (t) d t \\ + \int_{0}^{t_{f}} a_{3}^{(1)} (t) [(σ_{f}^{0} N_{f}^{0}) δ γ + (γ^{0} N_{f}^{0}) δ σ_{f} + (γ^{0} σ_{f}^{0}) δ N_{f} - \frac{γ^{0} σ_{f}^{0} N_{f}^{0}}{c_{p}^{0}} δ c_{p}] \frac{φ^{0} (t)}{c_{p}^{0}} d t \\ + a_{1}^{(1)} (0) δ φ_{0} + a_{3}^{(1)} (0) δ T_{0} . \end{array}

(86)

It follows from Eq. (86) that the first-order sensitivities of the response

φ (t_{f})

with respect to the parameters and initial conditions underlying the Nordheim-Fuchs model have the following expressions, all of which are to be evaluated at the nominal values of the respective parameters and functions (but the superscript “zero” is omitted to simplify the notation):

\frac{\partial φ (t_{f})}{\partial α_{T}} = - \frac{1}{l_{p} c_{p}} \int_{0}^{t_{f}} a_{1}^{(1)} (t) E (t) φ (t) d t

(87)

\frac{\partial φ (t_{f})}{\partial l_{p}} = \frac{α_{T}}{{(l_{p})}^{2} c_{p}} \int_{0}^{t_{f}} a_{1}^{(1)} (t) E (t) φ (t) d t

(88)

\frac{\partial φ (t_{f})}{\partial c_{p}} = \frac{α_{T}}{l_{p} {(c_{p})}^{2}} \int_{0}^{t_{f}} a_{1}^{(1)} (t) E (t) φ (t) d t - \frac{γ σ_{f} N_{f}}{{(c_{p})}^{2}} \int_{0}^{t_{f}} a_{3}^{(1)} (t) φ (t) d t

(89)

\frac{\partial φ (t_{f})}{\partial γ} = σ_{f} N_{f} \int_{0}^{t_{f}} [a_{2}^{(1)} (t) + \frac{1}{c_{p}} a_{3}^{(1)} (t)] φ (t) d t

(90)

\frac{\partial φ (t_{f})}{\partial σ_{f}} = γ N_{f} \int_{0}^{t_{f}} [a_{2}^{(1)} (t) + \frac{1}{c_{p}} a_{3}^{(1)} (t)] φ (t) d t

(91)

\frac{\partial φ (t_{f})}{\partial N_{f}} = γ σ_{f} \int_{0}^{t_{f}} [a_{2}^{(1)} (t) + \frac{1}{c_{p}} a_{3}^{(1)} (t)] φ (t) d t

(92)

\frac{\partial φ (t_{f})}{\partial φ_{0}} = a_{1}^{(1)} (0); \frac{\partial φ (t_{f})}{\partial E (0)} = 0; ​ ​ ​ \frac{\partial φ (t_{f})}{\partial T_{0}} = a_{3}^{(1)} (0)

(93)
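Steps 1 through 5 above can be sketched numerically as follows. The sketch performs one forward solve and one backward solve, accumulates the integrals in Eqs. (87)‒(92) during the backward sweep, and reads off Eq. (93) at $t = 0$. Several choices are assumptions made here for illustration: the parameter values are hypothetical and dimensionless; the Dirac-delta source in Eq. (84) is implemented as a unit value of $a_{1}^{(1)}$ just below $t_{f}$; and the nominal state is re-integrated backward alongside the adjoint function instead of being stored and interpolated.

```python
# Hypothetical, dimensionless nominal values (illustrative only, not from the paper).
P = {"alpha": 2.0, "lp": 0.5, "cp": 1.5, "gamma": 1.2, "sigf": 0.8, "Nf": 1.1}
X = {"phi0": 0.7, "T0": 0.3}
TF, N_STEPS = 1.0, 2000

def rk4(rhs, z, t_start, t_end, n=N_STEPS):
    """Classical RK4 for an autonomous system; t_end < t_start integrates backward."""
    dt = (t_end - t_start) / n
    for _ in range(n):
        k1 = rhs(z)
        k2 = rhs([zi + 0.5 * dt * ki for zi, ki in zip(z, k1)])
        k3 = rhs([zi + 0.5 * dt * ki for zi, ki in zip(z, k2)])
        k4 = rhs([zi + dt * ki for zi, ki in zip(z, k3)])
        z = [zi + dt / 6 * (a + 2 * b + 2 * c + d)
             for zi, a, b, c, d in zip(z, k1, k2, k3, k4)]
    return z

def state_rhs(p, h):
    """Right-hand sides of the Nordheim-Fuchs equations for h = (phi, E, T)."""
    phi, E, T = h
    k = p["gamma"] * p["sigf"] * p["Nf"]
    return [-p["alpha"] / (p["lp"] * p["cp"]) * E * phi, k * phi, k / p["cp"] * phi]

def forward(p, x):
    """Nominal state (phi, E, T) at tf."""
    return rk4(lambda h: state_rhs(p, h), [x["phi0"], 0.0, x["T0"]], 0.0, TF)

def adjoint_sensitivities(p, x):
    """All first-order sensitivities of phi(tf) from a single backward solve."""
    al, lp, cp = p["alpha"], p["lp"], p["cp"]
    g, s, N = p["gamma"], p["sigf"], p["Nf"]
    k = g * s * N

    def rhs(z):
        phi, E, T, a1, a2, a3 = z[:6]
        # da/dt = -[df/dh]^T a, using the transposed Jacobian of Eq. (83).
        da = [al / (lp * cp) * E * a1 - k * a2 - k / cp * a3,
              al / (lp * cp) * phi * a1,
              0.0]
        w = (a2 + a3 / cp) * phi
        # Integrands of Eqs. (87)-(92); I(t) = integral from t to tf, so dI/dt = -integrand.
        integrands = [-a1 * E * phi / (lp * cp),
                      al * a1 * E * phi / (lp**2 * cp),
                      al * a1 * E * phi / (lp * cp**2) - k * a3 * phi / cp**2,
                      s * N * w, g * N * w, g * s * w]
        return state_rhs(p, z[:3]) + da + [-gi for gi in integrands]

    # Start at tf: state h(tf), adjoint just below tf (delta source), zero integrals.
    z = rk4(rhs, forward(p, x) + [1.0, 0.0, 0.0] + [0.0] * 6, TF, 0.0)
    sens = dict(zip(["alpha", "lp", "cp", "gamma", "sigf", "Nf"], z[6:]))
    sens["phi0"], sens["T0"] = z[3], z[5]   # Eq. (93): a_1(0) and a_3(0)
    return sens

def finite_difference(name, rel=1e-5):
    """Central difference of phi(tf) with respect to one parameter or initial condition."""
    base = {**P, **X}
    step = rel * base[name]
    up = {**base, name: base[name] + step}
    dn = {**base, name: base[name] - step}
    return (forward(up, up)[0] - forward(dn, dn)[0]) / (2 * step)

sens = adjoint_sensitivities(P, X)
```

Under these assumptions, the eight sensitivities are obtained from two ODE solves, whereas the finite-difference check used for verification needs two forward solves per parameter.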

5.2. First-Order Sensitivities of the Energy Released Response $r_{2} (h) = E (t_{f})$

The first-order G-differential of the response

r_{2} (h) = E (t_{f})

defined in Eq. (32) is obtained as follows:

δ r_{2} (h; δ h) = {\{\frac{d}{d ε} [\int_{0}^{t_{f}} [E^{0} (t) + ε δ E (t)] δ (t - t_{f}) d t]\}}_{ε = 0} = \int_{0}^{t_{f}} δ E (t) δ (t - t_{f}) d t

(94)

where the variation

δ E (t)

is the solution of the “First-Level Variational Sensitivity System (1^st-LVSS)” defined by Eqs. (69)‒(74).

The sensitivities of the response

r_{2} (h) = E (t_{f})

are determined by following the same procedure as outlined in Section 5.1, using an adjoint function denoted as

χ^{(1)} (t) ≜ {[χ_{1}^{(1)} (t), χ_{2}^{(1)} (t), χ_{3}^{(1)} (t)]}^{†} \in H_{1} (Ω_{t})

. Following the same steps as in Section 5.1 (which are omitted here to avoid undue repetition) leads to the following 1^st-LASS for the 1^st-level adjoint sensitivity function

χ^{(1)} (t)

A^{(1)} (h; θ) χ^{(1)} (t) ≜ - \frac{d χ^{(1)} (t)}{d t} - {[\frac{\partial f (h; θ)}{\partial h}]}_{(h^{0}, θ^{0})}^{†} χ^{(1)} (t) = {[0, δ (t - t_{f}), 0]}^{†}

(95)

χ^{(1)} (t_{f}) ≜ {[χ_{1}^{(1)} (t_{f}), χ_{2}^{(1)} (t_{f}), χ_{3}^{(1)} (t_{f})]}^{†} = {[0, 0, 0]}^{†}

(96)

The sensitivities of

E (t_{f})

with respect to the model parameters and initial conditions have the same formal expressions as shown in Eqs. (87)‒(93), but with the components of the 1^st-level adjoint sensitivity function

χ^{(1)} (t)

replacing the components of

a^{(1)} (t) ≜ {[a_{1}^{(1)} (t), a_{2}^{(1)} (t), a_{3}^{(1)} (t)]}^{†} .

5.3. First-Order Sensitivities of the Temperature Response $r_{3} (h) = T (t_{f})$

The first-order G-differential of the response

r_{3} (h) = T (t_{f})

defined in Eq. (33) is obtained as follows:

δ r_{3} (h; δ h) = {\{\frac{d}{d ε} [\int_{0}^{t_{f}} [T^{0} (t) + ε δ T (t)] δ (t - t_{f}) d t]\}}_{ε = 0} = \int_{0}^{t_{f}} δ T (t) δ (t - t_{f}) d t

(97)

where the variation

δ T (t)

is the solution of the “First-Level Variational Sensitivity System (1^st-LVSS)” defined by Eqs. (69)‒(74).

The sensitivities of the response

r_{3} (h) = T (t_{f})

are determined by following the same procedure as has been outlined in Section 5.1, using an adjoint function denoted as

ξ^{(1)} (t) ≜ {[ξ_{1}^{(1)} (t), ξ_{2}^{(1)} (t), ξ_{3}^{(1)} (t)]}^{†} \in H_{1} (Ω_{t})

. Following the same steps as in Section 5.1 (which are omitted here to avoid undue repetition) leads to the following 1^st-LASS for the 1^st-level adjoint sensitivity function

ξ^{(1)} (t)

A^{(1)} (h; θ) ξ^{(1)} (t) ≜ - \frac{d ξ^{(1)} (t)}{d t} - {[\frac{\partial f (h; θ)}{\partial h}]}_{(h^{0}, θ^{0})}^{†} ξ^{(1)} (t) = {[0, 0, δ (t - t_{f})]}^{†}

(98)

ξ^{(1)} (t_{f}) ≜ {[ξ_{1}^{(1)} (t_{f}), ξ_{2}^{(1)} (t_{f}), ξ_{3}^{(1)} (t_{f})]}^{†} = {[0, 0, 0]}^{†}

(99)

The sensitivities of

T (t_{f})

with respect to the model parameters and initial conditions have the same formal expressions as shown in Eqs. (87)‒(93), but with the components of the 1^st-level adjoint sensitivity function

ξ^{(1)} (t) ≜ {[ξ_{1}^{(1)} (t), ξ_{2}^{(1)} (t), ξ_{3}^{(1)} (t)]}^{†}

replacing the components of

a^{(1)} (t) ≜ {[a_{1}^{(1)} (t), a_{2}^{(1)} (t), a_{3}^{(1)} (t)]}^{†} .

5.4. First-Order Sensitivities of the Thermal Conductivity Response $r_{4} (h; φ) = k (T_{f}; φ)$

The first-order G-differential of the response

r_{4} (h; φ) = k (T_{f}; φ)

defined in Eq. (34) is obtained as follows:

\begin{array}{l} δ r_{4} (h; φ; δ h; δ φ) = δ k (T; φ; δ T; δ φ) = {\{\frac{d}{d ε} \int_{0}^{t_{f}} [φ_{1}^{0} + ε δ φ_{1}] δ (t - t_{f}) d t\}}_{ε = 0} \\ + {\{\frac{d}{d ε} \int_{0}^{t_{f}} [(φ_{2}^{0} + ε δ φ_{2}) (T^{0} + ε δ T) + (φ_{3}^{0} + ε δ φ_{3}) {(T^{0} + ε δ T)}^{2}] δ (t - t_{f}) d t\}}_{ε = 0} \\ = {\{δ k (T; φ; δ φ)\}}_{d i r} + {\{δ k (T; φ; δ T)\}}_{i n d}, \end{array}

(100)

where the direct-effect and the indirect-effect terms, respectively, are defined as follows:

{\{δ k (T; φ; δ φ)\}}_{d i r} ≜ δ φ_{1} + δ φ_{2} \int_{0}^{t_{f}} T^{0} (t) δ (t - t_{f}) d t + δ φ_{3} \int_{0}^{t_{f}} {[T^{0} (t)]}^{2} δ (t - t_{f}) d t

(101)

{\{δ k (T; φ; δ T)\}}_{i n d} ≜ \int_{0}^{t_{f}} [φ_{2}^{0} + 2 φ_{3}^{0} T^{0} (t)] δ T (t) δ (t - t_{f}) d t

(102)

The direct-effect term yields the following sensitivities which can be evaluated immediately:

\frac{\partial k (T_{f})}{\partial φ_{1}} = 1; \frac{\partial k (T_{f})}{\partial φ_{2}} = T^{0} (t_{f}); \frac{\partial k (T_{f})}{\partial φ_{3}} = {[T^{0} (t_{f})]}^{2}

(103)
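The split into direct-effect and indirect-effect contributions can be illustrated with a few lines of code. The decoder weights and the final temperature below are hypothetical values chosen only for this sketch, and the quadratic form of `k` is the one implied by the integrand in Eq. (100).

```python
# Hypothetical decoder weights and final temperature (illustrative values only).
phi = [0.5, 2.0e-2, -3.0e-4]        # (phi_1, phi_2, phi_3)
T_f = 120.0                         # nominal value of T(t_f)

def k(T, w):
    """Quadratic decoder response k = phi_1 + phi_2*T + phi_3*T**2."""
    return w[0] + w[1] * T + w[2] * T**2

# Direct-effect sensitivities, Eq. (103): available without any adjoint solve.
direct = [1.0, T_f, T_f**2]

# Factor multiplying delta T(t_f) in the indirect-effect term, Eq. (102).
indirect_weight = phi[1] + 2 * phi[2] * T_f

def total_variation(d_phi, d_T):
    """First-order variation of k: direct-effect term plus indirect-effect term."""
    return sum(s * d for s, d in zip(direct, d_phi)) + indirect_weight * d_T
```

Only the quantity `d_T`, i.e., the variation $δ T (t_{f})$, depends on the model parameters and initial conditions, which is why the adjoint function is needed solely for the indirect-effect term.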

The indirect-effect term can be evaluated only after determining the variational function

δ T (t)

, which is the solution of the 1^st-LVSS defined by Eqs. (69)‒(74). The need for solving (repeatedly) the 1^st-LVSS can be circumvented by applying the principles of the 1^st-CASAM-NODE, as previously outlined. Thus, following the same procedure as detailed in Section 5.1 leads to the following 1^st-LASS for the 1^st-level adjoint sensitivity function, denoted as

ψ^{(1)} (t) ≜ {[ψ_{1}^{(1)} (t), ψ_{2}^{(1)} (t), ψ_{3}^{(1)} (t)]}^{†} \in H_{1} (Ω_{t})

, for computing the sensitivities stemming from the indirect-effect term

{\{δ k (T; φ; δ T)\}}_{i n d}

A^{(1)} (h; θ) ψ^{(1)} (t) = {[0, 0, [φ_{2}^{0} + 2 φ_{3}^{0} T^{0} (t)] δ (t - t_{f})]}^{†}

(104)

ψ^{(1)} (t_{f}) ≜ {[ψ_{1}^{(1)} (t_{f}), ψ_{2}^{(1)} (t_{f}), ψ_{3}^{(1)} (t_{f})]}^{†} = {[0, 0, 0]}^{†}

(105)

It is important to note that the following four 1^st-Level Adjoint Sensitivity Systems, enumerated in items (i) through (iv) below, share the same structure:

(i) the 1^st-LASS defined by Eqs. (84) and (85), which is solved to obtain the 1^st-level adjoint sensitivity function needed for computing the sensitivities of the component $h_{1} (t) ≜ φ (t)$ of the state function $h (t)$ ;

(ii) the 1^st-LASS defined by Eqs. (95) and (96), which is solved to obtain the 1^st-level adjoint sensitivity function needed for computing the sensitivities of the component $h_{2} (t) ≜ E (t)$ of the state function $h (t)$ ;

(iii) the 1^st-LASS defined by Eqs. (98) and (99), which is solved to obtain the 1^st-level adjoint sensitivity function needed for computing the sensitivities of the component $h_{3} (t) ≜ T (t)$ of the state function $h (t)$ ; and

(iv) the 1^st-LASS defined by Eqs. (104) and (105), which is solved to obtain the 1^st-level adjoint sensitivity function needed for computing the sensitivities stemming from the indirect-effect term ${\{δ k (T; φ; δ T)\}}_{i n d}$ .

All four systems have the same operators on their left sides, and the respective adjoint sensitivity functions all satisfy the same final-time conditions; only the source terms on the right sides of the respective 1^st-LASS differ from each other. Consequently, the same numerical procedures and/or neural nets can be used for computing the respective 1^st-level adjoint sensitivity functions.

Since the NODE is a first-order ODE, the corresponding 1^st-LASS is solved “backwards” in time, starting at the final time-step

t = t_{f}

, as indicated by the general 1^st-CASAM-NODE methodology presented in Section 4. If the NODE is linear in the state function (dependent variable)

h (t)

, then the 1^st-LASS will be independent of

h (t)

, so the “forward solution path” would not need to be stored in order to compute the 1^st-level adjoint sensitivity functions. In contradistinction, if the NODE is nonlinear in the state function (dependent variable)

h (t)

, then the 1^st-LASS will depend on

h (t)

, so the “forward solution path” would need to be stored in order to compute the respective 1^st-level adjoint sensitivity functions.
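The reuse of one adjoint solver for several responses can be sketched as follows. A single routine, here called `solve_adjoint` (a hypothetical name), integrates the common left-side operator backward in time; only the adjoint value imposed just below $t_{f}$, which stands in for the delta-function source, changes from one response to the next. Parameter values are hypothetical and dimensionless, and the nominal state is re-integrated backward together with the adjoint function as a simple substitute for storing the forward solution path.

```python
# Hypothetical, dimensionless nominal values (illustrative only, not from the paper).
P = {"alpha": 2.0, "lp": 0.5, "cp": 1.5, "gamma": 1.2, "sigf": 0.8, "Nf": 1.1}
X = {"phi0": 0.7, "T0": 0.3}
TF, N_STEPS = 1.0, 2000
A = P["alpha"] / (P["lp"] * P["cp"])        # alpha_T / (l_p c_p)
K = P["gamma"] * P["sigf"] * P["Nf"]        # gamma sigma_f N_f
CP = P["cp"]

def rk4(rhs, z, t_start, t_end, n=N_STEPS):
    """Classical RK4 for an autonomous system; t_end < t_start integrates backward."""
    dt = (t_end - t_start) / n
    for _ in range(n):
        k1 = rhs(z)
        k2 = rhs([zi + 0.5 * dt * ki for zi, ki in zip(z, k1)])
        k3 = rhs([zi + 0.5 * dt * ki for zi, ki in zip(z, k2)])
        k4 = rhs([zi + dt * ki for zi, ki in zip(z, k3)])
        z = [zi + dt / 6 * (a + 2 * b + 2 * c + d)
             for zi, a, b, c, d in zip(z, k1, k2, k3, k4)]
    return z

def state_rhs(h):
    """Right-hand sides of the Nordheim-Fuchs equations for h = (phi, E, T)."""
    phi, E, T = h
    return [-A * E * phi, K * phi, K / CP * phi]

def forward(phi0, T0):
    """Nominal state (phi, E, T) at tf."""
    return rk4(state_rhs, [phi0, 0.0, T0], 0.0, TF)

def solve_adjoint(final_amplitude):
    """Backward solve with the common operator; the nonlinear model makes the
    adjoint equations depend on the state, which is carried along in z[:3]."""
    def rhs(z):
        phi, E, T, a1, a2, a3 = z
        return state_rhs(z[:3]) + [A * E * a1 - K * a2 - K / CP * a3, A * phi * a1, 0.0]
    z = rk4(rhs, forward(X["phi0"], X["T0"]) + list(final_amplitude), TF, 0.0)
    return z[3:]                              # adjoint function at t = 0

a0 = solve_adjoint([1.0, 0.0, 0.0])           # source of Eq. (84): response phi(tf)
chi0 = solve_adjoint([0.0, 1.0, 0.0])         # source of Eq. (95): response E(tf)
xi0 = solve_adjoint([0.0, 0.0, 1.0])          # source of Eq. (98): response T(tf)
```

The first and third components of each returned vector are the sensitivities of the corresponding response to $φ_{0}$ and $T_{0}$, respectively, in the sense of Eq. (93).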

Furthermore, the same formal expressions are obtained for the sensitivities of the responses considered. Thus, the respective 1^st-level adjoint sensitivity functions differ from each other according to the response considered, but the quadrature-schemes needed to evaluate the integrals defining the respective sensitivities are the same. Therefore, the same numerical procedures and/or neural nets can be used for computing the respective integrals that define the 1^st-order sensitivities, while using the appropriate/corresponding 1^st-level adjoint sensitivity functions. If the decoder-response depends on parameters/weights, additional sensitivities arise from the respective non-vanishing “direct-effect term.”

If simple relations can be obtained among the responses of interest, such as Eqs. (11) and (16) for the illustrative paradigm example, then the sensitivities of the various responses can be obtained by using these relationships, but this is seldom the case in practice.

5.5. Most Efficient Computation of First-Order Sensitivities: Application of the 1^st-FASAM-N

In most, if not all, practical situations, the equations modeling the physical system under consideration can be recast to suit the computation of the response under consideration and, consequently, the computation of the response sensitivities with respect to the underlying model parameters. For example, the response

r_{2} (h) = E (t_{f})

involves just the function

E (t)

; hence, this response would be ideally computed, together with its sensitivities to parameters, by using an equation containing as few as possible dependent variables other than the ones [e.g.,

E (t)

] needed for computing the response. Such an equation was obtained in Eq. (17), which contains just the dependent variable

E (t)

, so it would be more advantageous to use it for the sensitivity analysis of

r_{2} (h) = E (t_{f})

rather than use the entire system of equations underlying the Nordheim-Fuchs model, as was done, for illustrative purposes, in Section 5.1. Furthermore, the form of Eq. (17) indicates that the “features” (i.e., functions) of model parameters characterizing this balance equation can be chosen as follows:

F_{1} (p) ≜ \frac{α_{T}}{2 l_{p} c_{p}}; F_{2} (p) ≜ φ_{0} γ σ_{f} N_{f}; F (p) ≜ {[F_{1} (p), F_{2} (p)]}^{†}

(106)

where the vector of primary model parameters is defined as follows:

p ≜ {[p_{1}, ..., p_{7}]}^{†} ≜ {[α_{T}, l_{p}, c_{p}, γ, σ_{f}, N_{f}, φ_{0}]}^{†}

(107)

Note that the vector

p

includes the initial condition

φ_{0}

In terms of the “feature function”

F (p) ≜ {[F_{1} (p), F_{2} (p)]}^{†}

, Eq. (17) can alternatively be written as follows:

\frac{d E (t)}{d t} = - F_{1} (p) E^{2} (t) + F_{2} (p), E (0) = 0

(108)

In terms of the feature function

F (p) ≜ {[F_{1} (p), F_{2} (p)]}^{†}

, the solution of Eq. (108) has the following form:

E (t) = {[\frac{F_{2} (p)}{F_{1} (p)}]}^{1 / 2} \tanh [t G (p)]; G (p) ≜ \sqrt{F_{1} (p) F_{2} (p)}

(109)
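The closed-form solution in Eq. (109) can be checked against a direct numerical integration of Eq. (108). The feature values below are hypothetical, and `E_exact` and `E_numeric` are illustrative helper names.

```python
import math

F1, F2, TF = 1.3, 0.75, 1.0      # hypothetical feature values and final time

def E_exact(t, f1=F1, f2=F2):
    """Closed-form solution, Eq. (109)."""
    G = math.sqrt(f1 * f2)
    return math.sqrt(f2 / f1) * math.tanh(t * G)

def E_numeric(t_end, f1=F1, f2=F2, n=1000):
    """Classical RK4 integration of Eq. (108), dE/dt = -F1*E**2 + F2, E(0) = 0."""
    rhs = lambda e: -f1 * e * e + f2
    dt, E = t_end / n, 0.0
    for _ in range(n):
        k1 = rhs(E)
        k2 = rhs(E + 0.5 * dt * k1)
        k3 = rhs(E + 0.5 * dt * k2)
        k4 = rhs(E + dt * k3)
        E += dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return E

difference = abs(E_exact(TF) - E_numeric(TF))
```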

Of course, a specific NODE would need to be constructed to model Eq. (108).

The form of Eq. (108) is suitable for applying the “n^th-Order Features Adjoint Sensitivity Analysis Methodology for Nonlinear Systems (n^th-FASAM-N)” [24], which is the most efficient methodology for computing sensitivities, particularly for sensitivities of second- and higher-order. This methodology considers the specific “features” of model parameters, such as the function

F (p) ≜ {[F_{1} (p), F_{2} (p)]}^{†}

, to compute sensitivities with respect to model parameters more efficiently than by considering directly the respective primary parameters.

For the computation of 1^st-order sensitivities, the 1^st-FASAM-N commences by constructing the 1^st-Level Variational Sensitivity System (1^st-LVSS) for the variational function

δ E (t)

by applying the definition of the first-order G-differential to Eq. (108), which yields:

{\{\frac{d}{d ε} \{\frac{d [E^{0} (t) + ε δ E (t)]}{d t} + [F_{1}^{0} + ε δ F_{1}] {[E^{0} (t) + ε δ E (t)]}^{2} - [F_{2}^{0} + ε δ F_{2}]\}\}}_{ε = 0} = 0

(110)

\frac{d}{d ε} {\{{[E^{0} (t) + ε δ E (t)]}_{t = 0}\}}_{ε = 0} = 0

(111)

Performing the operations indicated in Eqs. (110) and (111) yields the following expression for the 1^st-LVSS satisfied by the variational function

δ E (t)

[\frac{d}{d t} + 2 F_{1} E (t)] δ E (t) = - δ F_{1} E^{2} (t) + δ F_{2}, t > 0

(112)

δ E (0) = 0, t = 0.

(113)

The 1^st-LVSS represented by Eqs. (112) and (113) is to be solved at the nominal values for the parameters and the state function

E (t)

but the superscript “0” (which indicates “nominal values”) has been omitted to simplify the notation.

Numerically, the 1^st-LVSS would need to be solved anew for the various variations

δ F_{1}

and

δ F_{2}

in the components of the feature function

F (p)

. This need for repeatedly solving the 1^st-LVSS can be avoided by constructing the corresponding 1^st-Level Adjoint Sensitivity System (1^st-LASS). The Hilbert space appropriate for the construction of the 1^st-LASS corresponding to Eq. (112) is endowed with the following particular form of Eq. (79):

{〈u^{(a)} (t), u^{(b)} (t)〉}_{1} ≜ \int_{0}^{t_{f}} u^{(a)} (t) u^{(b)} (t) d t

(114)

Using Eq. (114) to form the inner product of Eq. (112) with a yet undefined function

ω^{(1)} (t)

yields the following relation:

\int_{0}^{t_{f}} ω^{(1)} (t) [\frac{d}{d t} + 2 F_{1} E (t)] δ E (t) d t = - (δ F_{1}) \int_{0}^{t_{f}} ω^{(1)} (t) E^{2} (t) d t + (δ F_{2}) \int_{0}^{t_{f}} ω^{(1)} (t) d t

(115)

Integrating by parts the left side of Eq. (115) yields the following relation:

\begin{array}{l} \int_{0}^{t_{f}} ω^{(1)} (t) [\frac{d}{d t} + 2 F_{1} E (t)] δ E (t) d t = ω^{(1)} (t_{f}) δ E (t_{f}) - ω^{(1)} (0) δ E (0) \\ + \int_{0}^{t_{f}} δ E (t) [- \frac{d ω^{(1)} (t)}{d t} + 2 F_{1} E (t) ω^{(1)} (t)] d t . \end{array}

(116)

Identifying the integral on the right-side of Eq. (116) with the G-differential

δ E (t_{f})

of the response

E (t_{f})

defined in Eq. (32) and eliminating the unknown value

δ E (t_{f})

from the right-side of Eq. (116) by setting

ω^{(1)} (t_{f}) = 0

yields the following 1^st-Level Adjoint Sensitivity System (1^st-LASS) for the 1^st-level adjoint sensitivity function

ω^{(1)} (t)

[- \frac{d}{d t} + 2 F_{1} E (t)] ω^{(1)} (t) = δ (t - t_{f}), ​ ​ ​ t > 0

(117)

ω^{(1)} (t_{f}) = 0, t = t_{f}

(118)

The 1^st-LASS represented by Eqs. (117) and (118) is independent of variations in the feature functions (and/or parameters) so it would need to be solved only once, numerically. In the present case, the 1^st-LASS can be solved analytically to obtain the following closed-form expression for the 1^st-level adjoint sensitivity function

ω^{(1)} (t)

ω^{(1)} (t) = H (t_{f} - t) {\{\frac{\cosh [t G (p)]}{\cosh [t_{f} G (p)]}\}}^{2}

(119)

where

H (t_{f} - t)

denotes the Heaviside functional.

Using Eqs. (116)‒(118) in Eq. (115) yields the following expression for the first-order total G-differential

δ E (t_{f})

of the response

E (t_{f})

in terms of the 1^st-level adjoint function

ω^{(1)} (t)

δ E (t_{f}) = - (δ F_{1}) \int_{0}^{t_{f}} ω^{(1)} (t) E^{2} (t) d t + (δ F_{2}) \int_{0}^{t_{f}} ω^{(1)} (t) d t

(120)

It follows from Eqs. (120), (119) and (109) that the two sensitivities of the response

E (t_{f})

with respect to the two components of the feature function

F ≜ {(F_{1}, F_{2})}^{†}

have the following expressions:

\frac{\partial E (t_{f})}{\partial F_{1}} = - \int_{0}^{t_{f}} ω^{(1)} (t) E^{2} (t) d t = \frac{F_{2} (p)}{2 F_{1} (p)} \{\frac{t_{f}}{\cosh^{2} [t_{f} G (p)]} - \frac{\tanh [t_{f} G (p)]}{G (p)}\};

(121)

\frac{\partial E (t_{f})}{\partial F_{2}} = \int_{0}^{t_{f}} ω^{(1)} (t) d t = \frac{1}{2 G (p)} \tanh [t_{f} G (p)] + \frac{t_{f}}{2 \cosh^{2} [t_{f} G (p)]}

(122)

The above expressions are to be evaluated at the nominal parameter values but the superscript “zero” has been omitted, for simplicity. The expressions obtained in Eqs. (121) and (122) can be verified by differentiating the expression provided in Eq. (109), evaluated at a user-chosen time

t = t_{f}

within the interval

0 < t_{f} < \infty .
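The verification mentioned above can be sketched numerically as follows. The sketch evaluates the two adjoint quadratures appearing in Eqs. (121) and (122) with a composite Simpson rule and compares them with central finite differences of the closed-form response. It assumes that Eq. (109) has the form E(t) = (F_1^{-1} F_2)^{1/2} tanh[t G], with G = (F_1 F_2)^{1/2}, which is the form consistent with Eq. (119). The numerical values, the quadrature rule, the finite-difference step and all function names are illustrative choices, not part of the methodology itself.

```python
import math


def energy(t, F1, F2):
    """Assumed form of Eq. (109): E(t) = sqrt(F2/F1) * tanh(t*G), G = sqrt(F1*F2)."""
    G = math.sqrt(F1 * F2)
    return math.sqrt(F2 / F1) * math.tanh(t * G)


def adjoint(t, t_f, F1, F2):
    """Eq. (119) on 0 <= t <= t_f, where the Heaviside factor is taken as 1."""
    G = math.sqrt(F1 * F2)
    return (math.cosh(t * G) / math.cosh(t_f * G)) ** 2


def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * f(a + k * h)
    return total * h / 3.0


def adjoint_sensitivities(t_f, F1, F2):
    """Quadratures of Eqs. (121) and (122): one adjoint function, two integrals."""
    dE_dF1 = -simpson(
        lambda t: adjoint(t, t_f, F1, F2) * energy(t, F1, F2) ** 2, 0.0, t_f)
    dE_dF2 = simpson(lambda t: adjoint(t, t_f, F1, F2), 0.0, t_f)
    return dE_dF1, dE_dF2


def finite_difference_sensitivities(t_f, F1, F2, rel_step=1e-6):
    """Central differences of the assumed closed-form E(t_f) in F1 and F2."""
    h1, h2 = rel_step * F1, rel_step * F2
    dE_dF1 = (energy(t_f, F1 + h1, F2) - energy(t_f, F1 - h1, F2)) / (2.0 * h1)
    dE_dF2 = (energy(t_f, F1, F2 + h2) - energy(t_f, F1, F2 - h2)) / (2.0 * h2)
    return dE_dF1, dE_dF2


# Illustrative (non-physical) feature values and final time.
F1, F2, t_f = 0.5, 2.0, 1.5
print("adjoint quadrature :", adjoint_sensitivities(t_f, F1, F2))
print("finite differences :", finite_difference_sensitivities(t_f, F1, F2))
```

The finite-difference route requires two additional evaluations of the response per feature, whereas the adjoint route reuses the same adjoint function for both quadratures.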

The sensitivities of the response

E (t_{f})

with respect to the model parameters and initial condition are obtained by using the following “chain-rule” relationship:

\frac{\partial E (t_{f}; F_{1}; F_{2})}{\partial p_{i}} = \frac{\partial E (t_{f})}{\partial F_{1}} \frac{\partial F_{1} (p)}{\partial p_{i}} + \frac{\partial E (t_{f})}{\partial F_{2}} \frac{\partial F_{2} (p)}{\partial p_{i}}; i = 1, ..., 7.

(123)

The explicit expressions for the specific sensitivities of the response

E (t_{f})

with respect to the parameters underlying the feature functions are obtained using Eq. (123) in conjunction with Eqs. (121) and (122) while recalling the definitions of the feature functions

F_{1} (p)

and

F_{2} (p)

defined in Eq. (106). The detailed expressions of these sensitivities are as follows:

\frac{\partial E (t_{f})}{\partial α_{T}} = \frac{\partial E (t_{f})}{\partial F_{1}} \frac{\partial F_{1}}{\partial α_{T}} + \frac{\partial E (t_{f})}{\partial F_{2}} \frac{\partial F_{2}}{\partial α_{T}} = \frac{1}{2 l_{p} c_{p}} \frac{\partial E (t_{f})}{\partial F_{1}}

(124)

\frac{\partial E (t_{f})}{\partial l_{p}} = \frac{\partial E (t_{f})}{\partial F_{1}} \frac{\partial F_{1}}{\partial l_{p}} + \frac{\partial E (t_{f})}{\partial F_{2}} \frac{\partial F_{2}}{\partial l_{p}} = - \frac{α_{T}}{2 {(l_{p})}^{2} c_{p}} \frac{\partial E (t_{f})}{\partial F_{1}}

(125)

\frac{\partial E (t_{f})}{\partial c_{p}} = \frac{\partial E (t_{f})}{\partial F_{1}} \frac{\partial F_{1}}{\partial c_{p}} + \frac{\partial E (t_{f})}{\partial F_{2}} \frac{\partial F_{2}}{\partial c_{p}} = - \frac{α_{T}}{2 {(c_{p})}^{2} l_{p}} \frac{\partial E (t_{f})}{\partial F_{1}}

(126)

\frac{\partial E (t_{f})}{\partial γ} = \frac{\partial E (t_{f})}{\partial F_{1}} \frac{\partial F_{1}}{\partial γ} + \frac{\partial E (t_{f})}{\partial F_{2}} \frac{\partial F_{2}}{\partial γ} = φ_{0} σ_{f} N_{f} \frac{\partial E (t_{f})}{\partial F_{2}}

(127)

\frac{\partial E (t_{f})}{\partial σ_{f}} = \frac{\partial E (t_{f})}{\partial F_{1}} \frac{\partial F_{1}}{\partial σ_{f}} + \frac{\partial E (t_{f})}{\partial F_{2}} \frac{\partial F_{2}}{\partial σ_{f}} = φ_{0} γ N_{f} \frac{\partial E (t_{f})}{\partial F_{2}}

(128)

\frac{\partial E (t_{f})}{\partial N_{f}} = \frac{\partial E (t_{f})}{\partial F_{1}} \frac{\partial F_{1}}{\partial N_{f}} + \frac{\partial E (t_{f})}{\partial F_{2}} \frac{\partial F_{2}}{\partial N_{f}} = φ_{0} γ σ_{f} \frac{\partial E (t_{f})}{\partial F_{2}}

(129)

\frac{\partial E (t_{f})}{\partial φ_{0}} = \frac{\partial E (t_{f})}{\partial F_{1}} \frac{\partial F_{1}}{\partial φ_{0}} + \frac{\partial E (t_{f})}{\partial F_{2}} \frac{\partial F_{2}}{\partial φ_{0}} = γ σ_{f} N_{f} \frac{\partial E (t_{f})}{\partial F_{2}}

(130)
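The chain-rule step of Eqs. (123)‒(130) can be sketched in a few lines. The sketch assumes that the feature functions of Eq. (106) have the forms F_1 = α_T/(2 l_p c_p) and F_2 = γ σ_f N_f φ_0, which are the forms implied by the partial derivatives displayed in Eqs. (124)‒(130). The dictionary keys, the function names and the numerical values (including the two feature sensitivities passed as inputs) are hypothetical and serve only to illustrate the bookkeeping.

```python
def features(p):
    """Feature functions implied by Eqs. (124)-(130) (assumed form of Eq. (106))."""
    F1 = p["alpha_T"] / (2.0 * p["l_p"] * p["c_p"])
    F2 = p["gamma"] * p["sigma_f"] * p["N_f"] * p["phi_0"]
    return F1, F2


def parameter_sensitivities(p, dE_dF1, dE_dF2):
    """Chain rule of Eq. (123) applied to each of the seven primary parameters."""
    a, l, c = p["alpha_T"], p["l_p"], p["c_p"]
    g, s, n, phi = p["gamma"], p["sigma_f"], p["N_f"], p["phi_0"]
    return {
        # F1 depends only on alpha_T, l_p and c_p, cf. Eqs. (124)-(126).
        "alpha_T": dE_dF1 / (2.0 * l * c),
        "l_p": -dE_dF1 * a / (2.0 * l ** 2 * c),
        "c_p": -dE_dF1 * a / (2.0 * c ** 2 * l),
        # F2 depends only on gamma, sigma_f, N_f and phi_0, cf. Eqs. (127)-(130).
        "gamma": dE_dF2 * phi * s * n,
        "sigma_f": dE_dF2 * phi * g * n,
        "N_f": dE_dF2 * phi * g * s,
        "phi_0": dE_dF2 * g * s * n,
    }


# Hypothetical, non-physical parameter values and feature sensitivities.
p_demo = {"alpha_T": 1.0, "l_p": 0.5, "c_p": 2.0,
          "gamma": 3.0, "sigma_f": 2.0, "N_f": 5.0, "phi_0": 7.0}
for name, value in parameter_sensitivities(p_demo, dE_dF1=-1.0, dE_dF2=0.5).items():
    print(f"dE/d{name} = {value:+.4f}")
```

No additional differential equations are solved in this step: the seven parameter sensitivities follow from the two feature sensitivities by multiplication with analytically known derivatives.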

Notably, the application of the 1^st-FASAM-N requires a single “large-scale” computation to solve the 1^st-LASS, cf. Eqs. (117) and (118), which comprises a single ODE, to obtain the 1^st-level adjoint function

ω^{(1)} (t)

, which is a scalar-valued function. However, solving the forward model, cf. Eq. (17), and the corresponding 1^st-LASS, comprising Eqs. (117) and (118), would require the construction of a separate (albeit simpler) NODE. The 1^st-level adjoint function

ω^{(1)} (t)

is subsequently used to evaluate two integrals (by quadrature) for obtaining the two sensitivities of the response

E (t_{f})

with respect to the two components

F_{1} (p)

and

F_{2} (p)

of the feature function

F (p) ≜ {(F_{1}, F_{2})}^{†}

. Subsequently, all of the response sensitivities with respect to the model’s primary parameters are obtained analytically by using the chain-rule to differentiate the components of the feature function with respect to the underlying model parameters and initial conditions.

In contradistinction, if one wishes to compute directly the sensitivities of the response with respect to the model parameters and initial conditions, it has been shown in Subsections 5.1‒5.4 that the original NODE can be used to solve (backward in time) the 1^st-LASS, which comprises a system of three coupled ODEs (rather than a single ODE, if the 1^st-FASAM-N is used) for obtaining the 1^st-level adjoint function, which is a vector-valued function comprising three components, cf.

χ^{(1)} (t) ≜ {[χ_{1}^{(1)} (t), χ_{2}^{(1)} (t), χ_{3}^{(1)} (t)]}^{†}

for the response

E (t_{f})

. The respective vector-valued 1^st-level adjoint function is subsequently used to evaluate six (rather than two, if the 1^st-FASAM-N is used) integrals (by quadrature) for obtaining the six sensitivities of the respective response with respect to the six model parameters.

Equations similar to Eq. (17) can be derived for the reactor-flux and reactor-temperature responses, so the 1^st-FASAM-N can be applied in a similar fashion to compute the first-order sensitivities of these responses. The sensitivities of the reactor-temperature response would, in turn, readily provide the first-order sensitivities of the reactor thermal conductivity response. However, a specific NODE would need to be constructed for each of these responses. Of course, each of these specific NODE would have a much simpler structure than the NODE for solving simultaneously the system of coupled ODEs presented in Subsections 5.1 through 5.4.

6. Discussion and Conclusions

This work has introduced the mathematical framework of the novel “First-Order Comprehensive Adjoint Sensitivity Analysis Methodology for Neural Ordinary Differential Equations (1^st-CASAM-NODE)” which yields exact expressions for the first-order sensitivities of NODE decoder-responses to the NODE parameters, including encoder initial conditions, while enabling the most efficient computation of these sensitivities. The application of the 1^st-CASAM-NODE has been illustrated by using the Nordheim-Fuchs reactor dynamics/safety phenomenological model, which is representative of physical systems that would be modeled by NODE while admitting exact analytical solutions for all quantities of interest (hidden states, decoder outputs, sensitivities with respect to all parameters and initial conditions, etc.). It has also been shown that if the equations underlying the physical model can be re-arranged so as to group the parameters/weights into functional “features” of several parameters, then the “First-Order Feature Adjoint Sensitivity Analysis Methodology for Nonlinear Systems (1^st-FASAM-N)” can be advantageously applied to compute the response sensitivities with respect to the feature functions (which are by definition fewer than the number of parameters). The response sensitivities with respect to the primary parameters are subsequently obtained analytically by using the chain-rule to differentiate the components of the feature function with respect to the underlying model parameters and initial conditions. Applying the 1^st-FASAM-N, however, would require the construction of a specific NODE for this purpose.

This work has also laid the foundation for the ongoing work on conceiving the “Second-Order Comprehensive Adjoint Sensitivity Analysis Methodology for Neural Ordinary Differential Equations (2^nd-CASAM-NODE)” which aims at yielding exact expressions for the second-order sensitivities of NODE decoder-responses to the NODE parameters and initial conditions while enabling the most efficient computation of these sensitivities.

Funding

This research received no external funding.

Conflicts of Interest

The author declares no conflict of interest.

References

Haber, E.; Ruthotto, L. Stable architectures for deep neural networks. Inverse Problems, 2017, 014004.
Lu, Y.; Zhong, A.; Li, Q.; Dong, B. Beyond finite layer neural networks: Bridging deep architectures and numerical differential equations. International Conference on Machine Learning, PMLR, 2018, 3276–3285.
Ruthotto, L.; Haber, E. Deep neural networks motivated by partial differential equations. Journal of Mathematical Imaging and Vision, 2018, 352–364.
Chen, R.T.Q.; Rubanova, Y.; Bettencourt, J.; Duvenaud, D.K. Neural ordinary differential equations. In Advances in Neural Information Processing Systems, 31, 2018, pp. 6571–6583. Curran Associates, Inc., 2018. arXiv:1806.07366v5 [cs.LG] 14 Dec 2019.
Dupont, E.; Doucet, A.; Teh, Y.W. Augmented neural ODEs. Advances in Neural Information Processing Systems, 32, 2019, 14–15.
Kidger, P. On Neural Differential Equations. arXiv e-prints, 2022, arXiv:2202.02435.
Kidger, P.; Morrill, J.; Foster, J.; Lyons, T. Neural controlled differential equations for irregular time series. Advances in Neural Information Processing Systems, 2020, 33, 6696–6707.
Morrill, J.; Salvi, C.; Kidger, P.; Foster, J. Neural rough differential equations for long time series. International Conference on Machine Learning, PMLR, 2021, 7829–7838.
Grathwohl, W.; Chen, R.T.Q.; Bettencourt, J.; Sutskever, I.; Duvenaud, D. FFJORD: Free-form continuous dynamics for scalable reversible generative models. International Conference on Learning Representations, 2019.
Zhong, Y.D.; Dey, B.; Chakraborty, A. Symplectic ODE-Net: Learning Hamiltonian dynamics with control. International Conference on Learning Representations, 2020.
Tieleman, T.; Hinton, G. Lecture 6.5—RMSProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning, 2012.
Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. International Conference on Learning Representations, 2015.
Pontryagin, L.S. Mathematical Theory of Optimal Processes. CRC Press, Boca Raton, FL, USA, 1987.
LeCun, Y.; Touretzky, D.; Hinton, G.; Sejnowski, T. A theoretical framework for back-propagation. In Connectionist Models Summer School, 1988.
LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 1998, 2278–2324.
Norcliffe, A.; Deisenroth, M.P. Faster training of neural ODEs using Gauss–Legendre quadrature. Transactions on Machine Learning Research, 08/2023. Code available at: https://github.com/a-norcliffe/torch_gq_adjoint.
Lamarsh, J.R. Introduction to Nuclear Reactor Theory, Addison-Wesley Publishing Co., Reading, MA, USA, 1966; pp. 491–492.
Hetrick, D.L. Dynamics of Nuclear Reactors, American Nuclear Society, Inc., La Grange Park, IL, USA, 1993; pp. 164–174.
Cacuci, D.G. Computation of high-order sensitivities of model responses to model parameters. II: Introducing the Second-Order Adjoint Sensitivity Analysis Methodology for Computing Response Sensitivities to Functions/Features of Parameters. Energies, 16, 2023, 6356.
Tukey, J.W. The Propagation of Errors, Fluctuations and Tolerances; Technical Reports No. 10–12; Princeton University, Princeton, NJ, USA, 1957.
Cacuci, D.G. The nth-Order Comprehensive Adjoint Sensitivity Analysis Methodology (nth-CASAM): Overcoming the Curse of Dimensionality in Sensitivity and Uncertainty Analysis, Volume I: Linear Systems. 2022.
Cacuci, D.G. The Fourth-Order Comprehensive Adjoint Sensitivity Analysis Methodology for Nonlinear Systems (4^th-CASAM-N): I. Mathematical Framework. Journal of Nuclear Eng.
Cacuci, D.G. Sensitivity theory for nonlinear systems: I. Nonlinear functional analysis approach. J. Math. Phys., 1981, 22, pp–2794.
Cacuci, D.G. Introducing the n^th-Order Features Adjoint Sensitivity Analysis Methodology for Nonlinear Systems (n^th-FASAM-N): I. Mathematical Framework. Am. J. Comp. Math, 2024, 14, 11–42.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
