1. Introduction
Neural Ordinary Differential Equations (NODE) provide a bridge between modern deep learning and classical mathematical/numerical modelling, while providing an explicit connection between deep feed-forward neural networks and dynamical systems. Although concepts of dynamical systems theory had been used to improve neural network performance [see, e.g., [1,2]], NODE-nets appear to have been formally introduced by Chen et al. [3]. NODE provide a flexible trade-off between efficiency, memory costs and accuracy. The approximation capabilities [4,5] of NODE are particularly useful for modeling and controlling physical environments [see, e.g., [6]], generative models for continuous normalizing flows [3,7], and time-series modelling [3,8,9].
Neural ODEs are trained by minimizing a scalar-valued least-squares “loss function,” which usually represents the discrepancy between a “reference solution” and the output produced by the NODE-decoder. The minimization procedure requires the computation of the gradients of the loss function with respect to the weights being optimized. These gradients, which are used by first-order optimizers such as “stochastic gradient descent” [10,11], can be computed by employing either the so-called “direct method” or the “adjoint method” [12,13,14]. The one-dimensional definite integrals that appear when computing gradients via the “adjoint method” can be efficiently evaluated using Gauss–Legendre quadrature, which has been shown [15] to be faster than ODE-based methods while retaining memory efficiency.
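For orientation, the following minimal sketch illustrates this training procedure, assuming the third-party torchdiffeq package; the random initial state and the zero-valued “reference solution” are illustrative stand-ins, not the setup used in this work:

import torch
from torchdiffeq import odeint_adjoint  # adjoint-based backward pass

class LatentDynamics(torch.nn.Module):
    """Learnable latent dynamics dh/dt = f(h; weights)."""
    def __init__(self, dim: int = 2):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim, 32), torch.nn.Tanh(), torch.nn.Linear(32, dim))
    def forward(self, t, h):
        return self.net(h)

f = LatentDynamics()
optimizer = torch.optim.SGD(f.parameters(), lr=1e-2)
t_grid = torch.linspace(0.0, 1.0, 20)
h0 = torch.randn(16, 2)          # stand-in for the encoder output
reference = torch.zeros(16, 2)   # stand-in for the "reference solution"

for _ in range(100):
    h = odeint_adjoint(f, h0, t_grid)         # forward ODE solve
    loss = ((h[-1] - reference) ** 2).mean()  # least-squares loss
    optimizer.zero_grad()
    loss.backward()                           # gradients via the adjoint ODE
    optimizer.step()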
In the literature on neural nets, the “gradients of the loss function” are often called “sensitivities,” and various aspects of the optimization/training procedure are occasionally called “sensitivity analysis.” However, this optimization procedure is not a bona fide “sensitivity analysis,” since the “loss function” being minimized is of interest only during the “training” phase of the NODE-net, and the “sensitivities of the loss function” are driven towards the ideal zero-values by the minimization process while optimizing the NODE weights/parameters.
After the NODE-net is optimized to reproduce the underlying physical system as closely as possible, the subsequent responses of interest become various functionals of the NODE’s “decoder” output rather than some “loss function.” The physical system modeled by the NODE-net comprises parameters that stem from measurements and/or computations. Such parameters are not perfectly well known but are subject to uncertainties that stem from the respective experiments and/or computations. Hence, it is important to quantify the uncertainties induced in the NODE decoder output by the uncertainties that afflict the parameters/weights underlying the physical system modeled by the NODE-net. The quantification of the uncertainties in the NODE decoder and derived results (i.e., “NODE responses”) of interest requires the computation of the sensitivities of the NODE decoder with respect to the optimized NODE weights/parameters. Cacuci [16] has recently presented the “First-Order Comprehensive Adjoint Sensitivity Analysis Methodology for Neural Ordinary Differential Equations (1st-CASAM-NODE),” a pioneering sensitivity analysis methodology for computing, exactly and exhaustively, all of the first-order sensitivities of the responses of the post-training optimized NODE decoder with respect to the optimized/trained weights involved in the NODE’s decoder, hidden layers, and encoder.
It is well known that equations modeling real-life phenomena comprise not only scalar-valued model parameters but also functions of such scalar model parameters. For example, conservation equations modeling thermal-hydraulics phenomena depend on correlations, Reynolds numbers, Nusselt numbers, etc., which are functions of various primary scalar parameters. Similarly, the neutron transport and/or diffusion equations in nuclear reactor engineering (e.g., reactor physics and shielding) involve group-averaged macroscopic cross sections, which are scalar-valued functions of various sums of scalar-valued isotopic number densities and microscopic cross sections. It is convenient to refer to such scalar-valued functions as “features of primary model parameters.” Cacuci [17] has recently introduced the “nth-Order Features Adjoint Sensitivity Analysis Methodology for Nonlinear Systems (nth-FASAM-N),” which enables the most efficient computation of the exact expressions of arbitrarily high-order sensitivities of model responses with respect to the model’s “features.” Subsequently, the sensitivities of the responses with respect to the primary model parameters are determined, analytically and trivially, by applying the “chain-rule” to the expressions of the sensitivities with respect to the model’s “features/functions of parameters.”
The material presented in this work extends the 1st-CASAM-NODE [16] methodology by using the concepts underlying the nth-FASAM-N methodology [17] to introduce the newly developed “First-Order Features Adjoint Sensitivity Analysis Methodology for Neural Ordinary Differential Equations (1st-FASAM-NODE)” and “Second-Order Features Adjoint Sensitivity Analysis Methodology for Neural Ordinary Differential Equations (2nd-FASAM-NODE).” The 1st-FASAM-NODE methodology is introduced in Section 2, while the 2nd-FASAM-NODE methodology is introduced in Section 3. The discussion presented in Section 4 concludes this work, noting that the 1st-FASAM-NODE and 2nd-FASAM-NODE methodologies enable the computation of exactly-determined first-order and second-order sensitivities, respectively, of the decoder response with respect to the NODE-parameters with unparalleled efficiency. The efficiency of the 1st-FASAM-NODE and 2nd-FASAM-NODE methodologies will be illustrated in the accompanying “Part II” [18] by means of heat and energy transfer responses in the Nordheim–Fuchs phenomenological model for reactor safety [19,20].
2. First-Order Features Adjoint Sensitivity Analysis Methodology for Neural Ordinary Differential Equations (1st-FASAM-NODE)
The general representation of a NODE-network that comprises “features of primary model parameters/weights” is provided by the following system of “augmented” equations:
where:
- (i) The quantity is a time-like independent variable which parameterizes the dynamics of the hidden/latent neuron units; its initial value can be considered to be an initial measurement time, while its stopping value can be considered to be the next measurement time.
- (ii) The -dimensional vector-valued function represents the hidden/latent neural networks. In this work, all vectors are considered to be column vectors, and the dagger symbol “†” will be used to denote “transposition.” The symbol “≜” will be used to denote “is defined as” or, equivalently, “is by definition equal to.”
- (iii) The -dimensional vector-valued nonlinear function models the dynamics of the latent neurons. The components of the vector represent learnable scalar adjustable weights, where denotes the total number of adjustable weights in all of the latent neural nets. The components of the vector-valued function represent the “feature” functions of the respective weights, which are considered to be “primary parameters;” the quantity denotes the “total number of feature/functions of the primary model parameters” comprised in the NODE. Evidently, the total number of feature functions must necessarily be smaller than the total number of primary parameters (weights). In the extreme case when there are no feature functions, each feature coincides with a single primary parameter (weight).
- (iv) The -dimensional vector-valued function represents the “encoder,” which is characterized by “inputs” and “learnable” scalar adjustable weights, where denotes the total number of “inputs” and denotes the total number of “learnable encoder weights” that define the “encoder.”
- (v) The -dimensional vector-valued function represents the vector of “system responses.” The vector-valued function represents the “decoder,” with learnable scalar adjustable weights that are represented by the components of the vector , where denotes the total number of adjustable weights that characterize the “decoder.” Each component can be represented in integral form as follows:
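For orientation, a plausible generic form of the augmented system of Eqs. (1)‒(3), written with assumed symbols that are consistent with items (i)‒(v) above (h for the latent state, f for the latent dynamics, F(w) for the feature functions of the weights w, E for the encoder, and D_n for the decoder components), is the following sketch:

\begin{align}
% Hypothetical reconstruction; the symbols are assumptions, not the
% original notation:
\frac{d\mathbf{h}(t)}{dt} &= \mathbf{f}\left[\mathbf{h}(t);\,\mathbf{F}(\mathbf{w})\right],
   \qquad t_{0} \le t \le t_{f}, \\
\mathbf{h}(t_{0}) &= \mathbf{E}\left(\mathbf{x};\,\mathbf{w}_{e}\right), \\
r_{n} &= \int_{t_{0}}^{t_{f}} D_{n}\left[\mathbf{h}(t);\,\mathbf{w}_{d}\right] dt .
\end{align}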
After the “training” of the NODE has been accomplished (using the “adjoint” or other methods), the various “weights” will have been assigned “optimal” values which will have minimized the user-chosen loss functional, which usually represents the discrepancy between a “reference solution” and the output produced by the NODE-decoder. These “optimal” values will be denoted using a superscript “zero.” Using these optimal/nominal parameter values to solve the NODE-system will yield the optimal/nominal solution, which will satisfy the following forms of Eqs. (1) and (2):
Furthermore, the optimal/nominal solution will be used to obtain the components of the vector of optimal/nominal decoder-responses; in view of Eq. (3), these components can be written in the following integral form:
However, the parameters and the initial conditions underlying the actual physical system, which is represented by the optimized NODE, are not known exactly, because they are subject to uncertainties. The known nominal values of the weights characterizing the encoder will differ by certain variations from the true but unknown values of the respective weights, while the known nominal values of the initial conditions will differ by corresponding variations from the true but unknown values of the initial conditions. Similarly, the remaining nominal values will differ by respective variations from the corresponding true but unknown values. The forward state functions are related to the weights and initial conditions through Eqs. (1)‒(3); consequently, variations in these weights and initial conditions will induce corresponding variations in the state functions around the nominal solution. In turn, these variations will induce variations in the system’s response.
The 1st-FASAM-NODE methodology for computing the first-order sensitivities of the response with respect to the model’s weights and initial conditions will be established by following the same principles as those underlying the 1st-FASAM-N [17] methodology. The fundamental concept for defining the sensitivity of an operator-valued quantity with respect to variations in a neighborhood around the nominal values has been shown by Cacuci [21] to be provided by the 1st-order Gateaux- (G-) variation, which is defined as follows:
for a scalar and for all (i.e., arbitrary) vectors in a neighborhood around the nominal values. The G-variation is an operator defined on the same domain as the original operator and has the same range. The G-variation satisfies the defining relation whereby the difference between the perturbed and the nominal values of the operator equals the G-variation plus a remainder term that vanishes faster than the perturbation scalar. When the G-variation is linear in the variation, it can be written in terms of the first-order G-derivative evaluated at the nominal values.
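In standard notation (assumed here, since the original symbols are not reproduced: e denotes the concatenation of the state functions and parameters, e^0 their nominal values, and δe their variations), the definition in Eq. (8) takes the familiar form:

\begin{equation}
% Assumed standard form of the first-order G-variation:
\delta R\left(\mathbf{e}^{0};\,\delta\mathbf{e}\right) \;\triangleq\;
\left\{ \frac{d}{d\varepsilon}\, R\left(\mathbf{e}^{0} + \varepsilon\,\delta\mathbf{e}\right) \right\}_{\varepsilon = 0} .
\end{equation}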
Applying the definition provided in Eq. (8) to Eq. (4) yields the following expression for the first-order G-variation of the response, where the following definitions were used:
The quantity in Eq. (10) denotes the partial G-derivatives of the response with respect to the decoder weights , evaluated at the nominal values . The quantity is called the “direct-effect term” because it arises directly from parameter variations and can be computed directly using the nominal values . The quantity is called the “indirect-effect term” because it arises indirectly, through the variations in the hidden state functions . The indirect-effect term can be quantified only after having determined the variations , which are caused by the variations , and .
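Schematically, and in the same assumed notation as above (the symbols below are hypothetical placeholders), the decomposition described by Eqs. (9)‒(11) has the following structure:

\begin{align}
% Hypothetical sketch of the direct/indirect decomposition:
\delta r_{n} &= \left\{\delta r_{n}\right\}_{dir} + \left\{\delta r_{n}\right\}_{ind}, \\
\left\{\delta r_{n}\right\}_{dir} &\triangleq \int_{t_{0}}^{t_{f}}
   \frac{\partial D_{n}}{\partial \mathbf{w}_{d}} \cdot \delta\mathbf{w}_{d}\, dt, \qquad
\left\{\delta r_{n}\right\}_{ind} \triangleq \int_{t_{0}}^{t_{f}}
   \frac{\partial D_{n}}{\partial \mathbf{h}} \cdot \delta\mathbf{h}(t)\, dt .
\end{align}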
The first-order relationships among the variations are obtained from the first-order G-variations of Eqs. (1) and (2), which are obtained, by definition, as follows:
Carrying out the operations indicated in Eqs. (12) and (13) yields the following system of equations:
where the various matrices and vectors are defined as follows:
The system comprising Eqs. (14) and (15) is called the “1st-Level Variational Sensitivity System” (1st-LVSS), and its solution is called the 1st-level variational sensitivity function. The 1st-LVSS is to be satisfied at the nominal values for the respective functions and parameters, but this fact has not been indicated explicitly, in order to simplify the notation. Note that the 1st-LVSS would need to be solved anew for each component of the variations, which would be prohibitively expensive computationally.
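A plausible sketch of the 1st-LVSS of Eqs. (14) and (15), in the same assumed notation as above (all symbols hypothetical), is:

\begin{align}
% Hypothetical sketch of the 1st-LVSS: a linear ODE for the state
% variation, driven by the variations of the features and of the encoder:
\frac{d\left[\delta\mathbf{h}(t)\right]}{dt} &=
   \frac{\partial \mathbf{f}}{\partial \mathbf{h}}\,\delta\mathbf{h}(t)
 + \frac{\partial \mathbf{f}}{\partial \mathbf{F}}\,\delta\mathbf{F},
   \qquad t_{0} \le t \le t_{f}, \\
\delta\mathbf{h}(t_{0}) &=
   \frac{\partial \mathbf{E}}{\partial \mathbf{x}}\,\delta\mathbf{x}
 + \frac{\partial \mathbf{E}}{\partial \mathbf{w}_{e}}\,\delta\mathbf{w}_{e} .
\end{align}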
The need for solving the 1st-LVSS can be avoided if the indirect-effect term defined in Eq. (11) could be expressed in terms of a “right-hand side” that does not involve the variational function. This goal can be achieved by expressing the right-side of Eq. (11) in terms of the solutions of the “1st-Level Adjoint Sensitivity System (1st-LASS),” to be constructed next; this construction requires the introduction of adjoint operators. Adjoint operators can be defined in Banach spaces but are most useful in Hilbert spaces. For the NODE considered in this work, the appropriate Hilbert space will be defined on the time-domain underlying the NODE. The inner product of two vectors in this space is defined as follows:
The scalar product is required to hold in a neighborhood of the nominal values .
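Under the same notational assumptions as above, this inner product is presumably the usual one on the Hilbert space of square-integrable vector-valued functions defined between the initial and final times:

\begin{equation}
% Assumed form of the inner product of Eq. (20); the dagger denotes
% transposition, as defined in item (ii) of Section 2:
\left\langle \boldsymbol{\chi}(t),\, \boldsymbol{\eta}(t) \right\rangle
\;\triangleq\; \int_{t_{0}}^{t_{f}} \boldsymbol{\chi}^{\dagger}(t)\,\boldsymbol{\eta}(t)\, dt .
\end{equation}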
The next step is to form the inner product of Eq. (14) with a vector, where the superscript “(1)” indicates “1st-Level,” to obtain the following relationship:
Using the definition of the adjoint operator, the left-side of Eq. (21) is transformed as follows, after integrating by parts over the independent variable:
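A hypothetical sketch of this integration-by-parts step, in the assumed notation used above (with ψ^(1) denoting the 1st-level adjoint function), is:

\begin{multline}
% Hypothetical sketch of Eq. (22): integration by parts transfers the
% differential operator from the variation onto the adjoint function:
\int_{t_{0}}^{t_{f}} \boldsymbol{\psi}^{(1)\dagger}(t)
  \left[ \frac{d\left[\delta\mathbf{h}(t)\right]}{dt}
       - \frac{\partial \mathbf{f}}{\partial \mathbf{h}}\,\delta\mathbf{h}(t) \right] dt
= \boldsymbol{\psi}^{(1)\dagger}(t_{f})\,\delta\mathbf{h}(t_{f})
 - \boldsymbol{\psi}^{(1)\dagger}(t_{0})\,\delta\mathbf{h}(t_{0}) \\
+ \int_{t_{0}}^{t_{f}} \left[\delta\mathbf{h}(t)\right]^{\dagger}
  \left[ -\frac{d\boldsymbol{\psi}^{(1)}(t)}{dt}
       - \left( \frac{\partial \mathbf{f}}{\partial \mathbf{h}} \right)^{\dagger}
         \boldsymbol{\psi}^{(1)}(t) \right] dt .
\end{multline}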
The last term on the right-side of Eq. (22) is now required to represent the “indirect-effect” term defined in Eq. (11), which is achieved by requiring that the 1st-level adjoint function satisfy the following relation written in NODE-format:
The definition of the 1st-level adjoint sensitivity function is now completed by requiring that it satisfy (adjoint) “boundary conditions at the final time,” so as to eliminate the term containing the unknown values in Eq. (22), which is achieved by requiring that
It is evident from Eqs. (23) and (24) that to each response there will correspond a distinct 1st-level adjoint sensitivity function. Since it is important to highlight this characteristic, the dependence of the 1st-level adjoint sensitivity function on the individual response under consideration will be made explicit by re-writing Eqs. (23) and (24) as follows:
The system of equations comprising Eqs. (25) and (26) constitutes the “1st-Level Adjoint Sensitivity System (1st-LASS)” for the 1st-level adjoint function corresponding to the response under consideration. Evidently, the 1st-LASS is linear in the 1st-level adjoint function and is independent of parameter variations, which implies that it needs to be solved just once per response to obtain the 1st-level adjoint function. Notably, the left-side of the 1st-LASS has the same form as the left-side of the “adjoint equations” used for training the NODE, but the “source” on the right-side of the 1st-LASS stems from the “response” under consideration, while the “source” on the right-side of the “adjoint equations for training the NODE” stems from the “loss functional” used in training the NODE. In component form, the 1st-LASS has the following structure:
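A hypothetical component-form sketch of the 1st-LASS of Eqs. (27) and (28), in the assumed notation used above (the index n labels the response), is:

\begin{align}
% Hypothetical sketch: a terminal-value linear ODE, solved backward in
% time from t_f to t_0, with a response-dependent source:
-\frac{d\psi_{i}^{(1)}(t; n)}{dt}
 - \sum_{j} \frac{\partial f_{j}}{\partial h_{i}}\,\psi_{j}^{(1)}(t; n)
 &= \frac{\partial D_{n}}{\partial h_{i}}, \\
\psi_{i}^{(1)}(t_{f}; n) &= 0 .
\end{align}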
Using the results represented by Eqs. (23), (24), (21), and (11) in Eq. (22) yields the following alternative expression for the “indirect-effect” term, which does not involve the 1st-level variational sensitivity function but involves the 1st-level adjoint function:
Using in Eq. (29) the expression provided for in Eq. (15) yields the following expression for the “indirect-effect” term:
Replacing the expression obtained in Eq. (29) for the “indirect-effect term,” together with the expression of the direct-effect term provided by Eq. (10), into Eq. (9) yields the following expression (which is to be evaluated at the nominal values of all functions and parameters/weights) for the first-order G-variation of the response:
As indicated by the right-side of Eq. (31), the sensitivities of the response are provided by the following expressions, all of which are to be evaluated at the nominal values of all functions and parameters/weights:
The expressions of the first-order sensitivities obtained in Eqs. (32)‒(35) correspond to the nth response. The first-order sensitivities of the response with respect to the primary parameters/weights are obtained analytically by applying the “chain-rule” to the expression in Eq. (32), which yields:
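In the assumed notation employed above (with hypothetical symbols N_F for the number of features and w_j for the primary weights), this chain-rule reads:

\begin{equation}
% Hypothetical sketch of the chain-rule underlying Eq. (36):
\frac{\partial r_{n}}{\partial w_{j}}
= \sum_{i=1}^{N_{F}} \frac{\partial r_{n}}{\partial F_{i}(\mathbf{w})}\,
  \frac{\partial F_{i}(\mathbf{w})}{\partial w_{j}} .
\end{equation}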
When there are no feature functions, each feature coincides with a single primary parameter, so the expression obtained in Eq. (32) directly yields the first-order sensitivities with respect to the primary parameters/weights.
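The complete first-order workflow can be illustrated numerically. The following minimal sketch (the model, the feature, and all names are illustrative assumptions, not the paper’s example) uses a scalar latent ODE dh/dt = -F*h with a single feature F(w) = w1*w2 and the response r defined as the integral of h(t) over [t0, tf]; it solves the forward NODE, solves the 1st-LASS backward in time, evaluates the sensitivity with respect to the feature by quadrature, and applies the chain-rule to recover the sensitivities with respect to the primary weights:

import numpy as np
from scipy.integrate import solve_ivp, trapezoid

w1, w2 = 2.0, 0.5
F = w1 * w2                          # single feature of the primary weights
t0, tf, h0 = 0.0, 1.0, 1.0

# Forward solve of the latent ODE: dh/dt = -F*h, h(t0) = h0.
fwd = solve_ivp(lambda t, h: -F * h, (t0, tf), [h0], dense_output=True)

# 1st-LASS for this model: -dpsi/dt + F*psi = dD/dh = 1, psi(tf) = 0,
# solved backward in time via s = tf - t, so that dpsi/ds = 1 - F*psi.
adj = solve_ivp(lambda s, p: 1.0 - F * p, (0.0, tf - t0), [0.0],
                dense_output=True)

# Sensitivity w.r.t. the feature: dr/dF = -integral of psi(t)*h(t) dt.
ts = np.linspace(t0, tf, 201)
integrand = [adj.sol(tf - t)[0] * fwd.sol(t)[0] for t in ts]
dr_dF = -trapezoid(integrand, ts)

# Chain-rule to the primary weights: dr/dw1 = (dr/dF)*w2, dr/dw2 = (dr/dF)*w1.
print(dr_dF, dr_dF * w2, dr_dF * w1)
# Analytic check: r = (1 - exp(-F))/F, so dr/dF = (exp(-F)*(1 + F) - 1)/F**2.
print((np.exp(-F) * (1 + F) - 1) / F**2)

A single backward (adjoint) solve yields the sensitivity with respect to the feature; the sensitivities with respect to both primary weights then follow at negligible additional cost, which is the computational advantage emphasized above.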
3. Second-Order Features Adjoint Sensitivity Analysis Methodology for Neural Ordinary Differential Equations (2nd-FASAM-NODE)
The second-order sensitivities of the response defined in Eq. (4) are obtained by computing the first-order sensitivities of the expressions obtained in Eqs. (32)‒(35), which represent the first-order sensitivities of the response with respect to the various parameters (weights) and initial conditions (input). In other words, the second-order sensitivities will be computed by conceptually using their basic definitions as the “first-order sensitivities of the first-order sensitivities.”
3.1. Second-Order Sensitivities Stemming from the First-Order Sensitivities in Eq. (32)
The second-order sensitivities stemming from the first-order sensitivities are obtained from the first-order G-differential of Eq. (32), for each admissible pair of indices, as follows:
In Eq. (37), the direct-effect term comprises the variation stemming from the model parameters and is defined as follows:
while the indirect-effect term comprises the variations in the state function and in the 1st-level adjoint sensitivity function, and is defined as follows:
The expressions in Eqs. (38) and (39) are to be evaluated at the nominal values of the respective functions and parameters, but the respective indication (namely: the superscript “zero”) has been omitted in order to simplify the notation.
The direct-effect term can be evaluated at this time for all parameter variations, but the indirect-effect term can be evaluated only after having determined the variations in the state function and in the 1st-level adjoint sensitivity function. The variation in the state function is the solution of the 1st-LVSS defined by Eqs. (14) and (15). On the other hand, the variation in the 1st-level adjoint sensitivity function is the solution of the system of equations obtained by G-differentiating the 1st-LASS. This differentiation is performed most transparently by applying the definition of the G-differential to the 1st-LASS presented in component form in Eqs. (27) and (28), to obtain the following system of equations:
Performing the operations indicated in Eqs. (40) and (41) yields the following relations:
The summations in Eq. (42) are rearranged as follows, for each admissible index:
where the following definitions were used:
In matrix-format, Eq. (44) can be written as follows:
where the following definitions were used:
Concatenating Eqs. (14) and (46) yields the following 2nd-Level Variational Sensitivity System (2nd-LVSS) for the 2nd-level variational function:
where the superscript “2” indicates “second-level” and where the following definitions were used:
The matrix “0” in Eq. (49) denotes a matrix having zero-elements. The initial-time and final-time conditions satisfied by the 2nd-level variational function are provided in Eqs. (15) and (43), which can be written in vector form as follows:
It is impractical to solve the 2nd-LVSS repeatedly in order to compute the 2nd-level variational function for every component of the variations. The need for computing this function repeatedly can be circumvented by applying the principles of the 2nd-FASAM-N methodology [17], which comprises the following sequence of steps:
- 1. Consider that the 2nd-level variational function is an element in a Hilbert space comprising as elements 2-block vectors. This Hilbert space is considered to be endowed with an inner product between two such block-vectors, which is defined as follows:
The scalar product is required to hold in a neighborhood of the nominal values .
- 2. Use the definition of the inner product provided in Eq. (51) to form the inner product of Eq. (48) with a 2-block vector, where the superscript “(2)” indicates “2nd-level,” to obtain the following relationship:
- 3. Using the definition of the adjoint operator, the left-side of Eq. (52) is transformed as follows, after integrating by parts over the independent variable:
- 4. The two integral terms on the right-side of Eq. (53) are now required to represent the “indirect-effect” term defined in Eq. (39), which is achieved by imposing the following requirements:
Note that the right-sides of Eqs. (54) and (55) also depend on the response index, which means that a distinct 2nd-level adjoint sensitivity function will correspond to each value of this index. This fact has been explicitly indicated by including the index in the list of arguments of the components of the 2nd-level adjoint sensitivity function.
- 5. The definition of the 2nd-level adjoint sensitivity function is now completed by requiring it to satisfy the following boundary conditions, which eliminate the respective unknown terms on the right-side of Eq. (53):
The system of equations comprising Eqs. (54)‒(56) constitutes the “2nd-Level Adjoint Sensitivity System (2nd-LASS)” for the 2nd-level adjoint sensitivity function. Evidently, the 2nd-LASS is linear in this adjoint function and is independent of parameter variations. Notably, this system of equations does not need to be solved simultaneously, but can be solved sequentially, by first solving Eq. (55) subject to the initial condition given in Eq. (56), and subsequently using the resulting function as a source in Eq. (54), which is solved subject to the “final-time” condition given in Eq. (56); a structural sketch of this sequential strategy is provided after this list of steps. The 2nd-LASS is to be solved using the nominal values of all functions and parameters/weights, but the superscript “zero” has been omitted for notational simplicity.
- 6. Using the results obtained in Eqs. (52) and (54)‒(56) in Eq. (53) yields the following alternative expression for the “indirect-effect” term, which no longer involves the 2nd-level variational sensitivity function but involves the 2nd-level adjoint sensitivity function:
- 7. Using in Eq. (57) the expression provided for in Eq. (15), and adding the resulting expression for the indirect-effect term to the expression for the direct-effect term provided in Eq. (38), yields the following expression for the total first-order G-differential, for each admissible index:
The expression shown in Eq. (58) is to be evaluated at the nominal values of all functions and parameters/weights but the superscript “zero” (which has been used to indicate this fact) has been omitted for notational simplicity.
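The sequential solution strategy described in step 5 above can be sketched as follows, in the assumed notation used throughout; the source terms q_1 and q_2, which would stem from the 1st-level functions and the response, are hypothetical placeholders:

\begin{align}
% Hypothetical block structure of the 2nd-LASS: an initial-value problem,
% solved forward in time, followed by a final-value problem, solved
% backward in time and sourced by the forward solution:
\frac{d\boldsymbol{\psi}_{2}^{(2)}(t)}{dt}
 - \frac{\partial \mathbf{f}}{\partial \mathbf{h}}\,\boldsymbol{\psi}_{2}^{(2)}(t)
 &= \mathbf{q}_{2}\left[\boldsymbol{\psi}^{(1)}\right],
 \qquad \boldsymbol{\psi}_{2}^{(2)}(t_{0}) = \mathbf{0}, \\
-\frac{d\boldsymbol{\psi}_{1}^{(2)}(t)}{dt}
 - \left( \frac{\partial \mathbf{f}}{\partial \mathbf{h}} \right)^{\dagger}
   \boldsymbol{\psi}_{1}^{(2)}(t)
 &= \mathbf{q}_{1}\left[\boldsymbol{\psi}^{(1)},\,\boldsymbol{\psi}_{2}^{(2)}\right],
 \qquad \boldsymbol{\psi}_{1}^{(2)}(t_{f}) = \mathbf{0} .
\end{align}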
Identifying in Eq. (58) the expressions that multiply the individual variations yields the following expressions for the second-order sensitivities that stem from the first-order sensitivities considered in this subsection:
The second-order sensitivities of the responses with respect to the primary parameters/weights are obtained analytically by using the results obtained in Eqs. (59)‒(62) in conjunction with the “chain-rule”:
When there are no feature functions, the second-order sensitivities are obtained from Eqs. (59)‒(62) by identifying each feature with the corresponding primary parameter.
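In the assumed notation used above (all symbols hypothetical), the second-order chain-rule underlying this step presumably reads:

\begin{equation}
% Hypothetical sketch of the second-order chain-rule:
\frac{\partial^{2} r_{n}}{\partial w_{j}\,\partial w_{k}}
= \sum_{i=1}^{N_{F}} \sum_{m=1}^{N_{F}}
  \frac{\partial^{2} r_{n}}{\partial F_{i}\,\partial F_{m}}\,
  \frac{\partial F_{i}}{\partial w_{j}}\,
  \frac{\partial F_{m}}{\partial w_{k}}
+ \sum_{i=1}^{N_{F}}
  \frac{\partial r_{n}}{\partial F_{i}}\,
  \frac{\partial^{2} F_{i}}{\partial w_{j}\,\partial w_{k}} .
\end{equation}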
3.2. Second-Order Sensitivities Stemming from the First-Order Sensitivities in Eq. (33)
The second-order sensitivities stemming from the first-order sensitivities are obtained from the first-order G-differential of Eq. (33), for each admissible pair of indices, as follows:
where the direct-effect term comprises the parameter variations and is defined as follows:
and where the indirect-effect term comprises the variation in the state function and is defined as follows:
The expressions shown in Eqs. (65) and (66) are to be evaluated at the nominal values of the respective functions and parameters, but the respective indication (namely: the superscript “zero”) has been omitted in order to simplify the notation.
The direct-effect term can be evaluated at this time for all parameter variations, but the indirect-effect term can be evaluated only after having determined the variation in the state function, which is the solution of the 1st-LVSS defined by Eqs. (14) and (15). The need for solving this 1st-LVSS can be avoided if the indirect-effect term defined in Eq. (66) could be expressed in terms of a “right-hand side” that does not involve this variational function. This goal can be achieved by applying the same concepts and steps as in Section 2, as follows:
- 1. Using Eq. (20), form the inner product of Eq. (14) with a vector, where the superscript “(2)” indicates “2nd-Level,” to obtain the following relationship:
- 2. Using the definition of the adjoint operator, the left-side of Eq. (67) is integrated by parts over the independent variable to obtain the following relation:
- 3. The last term on the right-side of Eq. (68) is now required to represent the “indirect-effect” term defined in Eq. (66), which is achieved by requiring that the 2nd-level adjoint sensitivity function satisfy the following relation written in NODE-format:
It is evident from Eq. (69) that to each response there will correspond a distinct 2nd-level adjoint sensitivity function for each pair of indices. This important fact has been highlighted by including the respective indices in the list of arguments of the 2nd-level adjoint sensitivity function.
- 4. The definition of the 2nd-level adjoint sensitivity function is now completed by requiring it to eliminate the term containing the unknown values in Eq. (68), which is accomplished by requiring this function to satisfy the following boundary condition at the final time:
The system of equations comprising Eqs. (69) and (70) constitutes the “2nd-Level Adjoint Sensitivity System (2nd-LASS)” for this 2nd-level adjoint sensitivity function; this 2nd-LASS is linear in the adjoint function and is independent of parameter variations, which implies that it needs to be solved just once per pair of indices to obtain the corresponding 2nd-level adjoint sensitivity function. Notably, the left-side of this 2nd-LASS has the same form as the left-side of the “adjoint equations” used for training the NODE, but the sources on the right-side of this 2nd-LASS stem from each of the “responses” under consideration, while the “source” on the right-side of the “adjoint equations for training the NODE” stems from the “loss functional” used in training the NODE.
- 5. Using the results obtained in Eqs. (69), (70), and (67) in Eq. (68) yields the following alternative expression for the “indirect-effect” term, which does not involve the 1st-level variational sensitivity function but instead involves the 2nd-level adjoint function:
- 6. Adding the expression obtained in Eq. (71) for the “indirect-effect term” to the expression of the direct-effect term provided by Eq. (65) yields the following expression for the G-variation defined in Eq. (64), which is to be evaluated at the nominal values of all functions and parameters/weights:
As indicated by the rightmost-side of Eq. (72), the sensitivities of the response are provided by the following expressions, all of which are to be evaluated at the nominal values of all functions and parameters/weights:
The second-order sensitivities of the response with respect to the primary parameters/weights are obtained analytically by applying the “chain-rule” to the expression in Eq. (73). When there are no feature functions, each feature coincides with a single primary parameter (weight).
3.3. Second-Order Sensitivities Stemming from the First-Order Sensitivities in Eq. (34)
The second-order sensitivities stemming from the first-order sensitivities are obtained from the first-order G-differential of Eq. (34), for each admissible pair of indices, as follows:
where the direct-effect term comprises the parameter variations and is defined as follows:
and where the indirect-effect term comprises the variation of the 2nd-level variational function and is defined as follows:
The expressions shown in Eqs. (78) and (79) are to be evaluated at the nominal values of the respective functions and parameters, but the respective indication (namely: the superscript “zero”) has been omitted in order to simplify the notation.
The direct-effect term can be evaluated at this time for all parameter variations, but the indirect-effect term can be evaluated only after having determined the 2nd-level variational function, which is the solution of the 2nd-LVSS defined by Eqs. (48) and (50). The need for solving this 2nd-LVSS can be avoided if the indirect-effect term defined in Eq. (79) could be expressed in terms of a “right-hand side” that does not involve this variational function. This goal can be achieved by applying the same concepts and steps as in Section 3.1, as follows:
- 1. Use the definition of the inner product provided in Eq. (51) to form the inner product of Eq. (48) with a 2-block vector, where the superscript “(2)” indicates “2nd-level,” to obtain the following relationship:
- 2. Using the definition of the adjoint operator, the left-side of Eq. (80) is transformed as follows, after integrating by parts over the independent variable:
- 3. The two integral terms on the right-side of Eq. (81) are now required to represent the “indirect-effect” term defined in Eq. (79), which is achieved by imposing the following requirements on the components of the 2nd-level adjoint sensitivity function:
Since the right-side of Eq. (83) also depends on the response index, a distinct 2nd-level adjoint sensitivity function will correspond to each distinct value of this index. This fact has been explicitly indicated by including the index in the list of arguments of the components of the 2nd-level adjoint sensitivity function.
- 4. The definition of the 2nd-level adjoint sensitivity function is now completed by requiring that it satisfy the following boundary conditions, which eliminate the respective unknown terms on the right-side of Eq. (81):
The system of equations comprising Eqs. (82)‒(84) constitutes the “2nd-Level Adjoint Sensitivity System (2nd-LASS)” for the 2nd-level adjoint sensitivity function . This 2nd-LASS is independent of parameter variations, is linear in , and can be solved sequentially, by first solving Eq. (83) subject to the initial condition given in Eq. (84) to determine the function , and subsequently using the function in Eq. (82) to solve it subject to the “final-time” condition given in Eq. (84) to obtain the function . The 2nd-LASS is to be solved using the nominal values of all functions and parameters/weights, but the superscript “zero” has been omitted for notational simplicity.
- 5. Using the results obtained in Eqs. (80) and (82)‒(84) in Eq. (81) yields the following alternative expression for the “indirect-effect” term, which no longer involves the 2nd-level variational sensitivity function but involves the 2nd-level adjoint sensitivity function:
- 6. Adding the expression obtained in Eq. (85) to the expression for the direct-effect term provided in Eq. (78) yields the following expression for the total first-order G-differential:
The expression shown in Eq. (86) is to be evaluated at the nominal values of all functions and parameters/weights but the superscript “zero” (which has been used to indicate this fact) has been omitted for notational simplicity.
Identifying in Eq. (86) the expressions that multiply the individual variations yields the following expressions for the second-order sensitivities that stem from the first-order sensitivities considered in this subsection:
The second-order sensitivities of the responses with respect to the primary parameters/weights are obtained analytically by using the results obtained in Eqs. (87)‒(90) in conjunction with the “chain-rule.” In the absence of feature functions, each feature coincides with a single primary parameter (weight).
3.4. Second-Order Sensitivities Stemming from the First-Order Sensitivities in Eq. (35)
The second-order sensitivities stemming from the first-order sensitivities are obtained from the first-order G-differential of Eq. (35), for each admissible pair of indices, as follows:
where the direct-effect term comprises the parameter variations and is defined as follows:
and where the indirect-effect term comprises the variation of the 2nd-level variational function and is defined as follows:
The expressions shown in Eqs. (92) and (93) are to be evaluated at the nominal values of the respective functions and parameters, but the respective indication (namely: the superscript “zero”) has been omitted in order to simplify the notation.
The direct-effect term can be evaluated at this time for all parameter variations, but the indirect-effect term can be evaluated only after having determined the 2nd-level variational function, which is the solution of the 2nd-LVSS defined by Eqs. (48) and (50). The need for solving this 2nd-LVSS can be avoided if the indirect-effect term defined in Eq. (93) could be expressed in terms of a “right-hand side” that does not involve this variational function. This goal can be achieved by applying the same concepts and steps as in Section 3.3, which will not be shown in detail here in order to minimize repetitive derivations.
The final expressions of the second-order sensitivities stemming from the first-order sensitivities considered in this subsection are as follows:
The 2nd-level adjoint sensitivity function which appears in Eqs. (94)‒(97) is the solution of the following 2nd-Level Adjoint Sensitivity System (2nd-LASS):
3.5. Discussion: Double-Computation of the Mixed Second-Order Sensitivities
It is evident from the derivations presented in Subsections 3.1 through 3.4 that the unmixed second-order sensitivities are computed just once, whereas each mixed second-order sensitivity is computed twice, using two distinct expressions that involve distinct 2nd-level adjoint sensitivity functions. The specific distinct expressions for the respective mixed second-order sensitivities are as follows:
The mixed second-order sensitivities are obtained in Eq. (60) in terms of one set of 2nd-level adjoint sensitivity functions. On the other hand, the same mixed second-order sensitivities are obtained in Eq. (94) in terms of distinct 2nd-level adjoint sensitivity functions. Due to the symmetry property of the mixed second-order sensitivities, the numerical results obtained by computing them using Eq. (60), on the one hand, and using Eq. (94), on the other hand, provide a verification mechanism for assessing the accuracy of the computation of the 2nd-level adjoint functions involved in the respective computations.
The mixed second-order sensitivities are obtained in Eq. (61) in terms of one set of 2nd-level adjoint sensitivity functions. On the other hand, the same mixed second-order sensitivities are obtained in Eq. (87) in terms of distinct 2nd-level adjoint sensitivity functions. Due to the symmetry property of the mixed second-order sensitivities, the numerical results obtained by computing them using Eq. (61), on the one hand, and using Eq. (87), on the other hand, provide a verification mechanism for assessing the accuracy of the computation of the 2nd-level adjoint functions involved in the respective computations.
The mixed second-order sensitivities are obtained in Eq. (62) in terms of one set of 2nd-level adjoint sensitivity functions. On the other hand, the same mixed second-order sensitivities are obtained in Eq. (73) in terms of distinct 2nd-level adjoint sensitivity functions. Due to the symmetry property of the mixed second-order sensitivities, the numerical results obtained by computing them using Eq. (62), on the one hand, and using Eq. (73), on the other hand, provide a verification mechanism for assessing the accuracy of the computation of the 2nd-level adjoint functions involved in the respective computations.
The mixed second-order sensitivities are obtained in Eq. (88) in terms of one set of 2nd-level adjoint sensitivity functions. On the other hand, the same mixed second-order sensitivities are obtained in Eq. (96) in terms of distinct 2nd-level adjoint sensitivity functions. Due to the symmetry property of the mixed second-order sensitivities, the numerical results obtained by computing them using Eq. (88), on the one hand, and using Eq. (96), on the other hand, provide a verification mechanism for assessing the accuracy of the computation of the 2nd-level adjoint functions involved in the respective computations.
The mixed second-order sensitivities are obtained in Eq. (74) in terms of one set of 2nd-level adjoint sensitivity functions. On the other hand, the same mixed second-order sensitivities are obtained in Eq. (97) in terms of distinct 2nd-level adjoint sensitivity functions. Due to the symmetry property of the mixed second-order sensitivities, the numerical results obtained by computing them using Eq. (74), on the one hand, and using Eq. (97), on the other hand, provide a verification mechanism for assessing the accuracy of the computation of the 2nd-level adjoint functions involved in the respective computations.
The mixed second-order sensitivities are obtained in Eq. (75) in terms of one set of 2nd-level adjoint sensitivity functions. On the other hand, the same mixed second-order sensitivities are obtained in Eq. (90) in terms of distinct 2nd-level adjoint sensitivity functions. Due to the symmetry property of the mixed second-order sensitivities, the numerical results obtained by computing them using Eq. (75), on the one hand, and using Eq. (90), on the other hand, provide a verification mechanism for assessing the accuracy of the computation of the 2nd-level adjoint functions involved in the respective computations.
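In practice, this verification mechanism can be automated by comparing the two independently computed blocks of mixed second-order sensitivities. The following minimal sketch assumes that the two blocks have already been assembled into matrices; the names hessian_a and hessian_b are hypothetical, and the tolerance would be set to match the accuracy of the ODE solvers used for the 2nd-LASS:

import numpy as np

def verify_mixed_sensitivities(hessian_a: np.ndarray,
                               hessian_b: np.ndarray,
                               rtol: float = 1e-6) -> bool:
    """Return True if the mixed second-order sensitivities computed from
    two distinct 2nd-LASS solutions agree: hessian_a[i, j] should equal
    hessian_b[j, i] up to the relative tolerance rtol."""
    return bool(np.allclose(hessian_a, hessian_b.T, rtol=rtol))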
4. Discussion and Conclusions
As has been discussed in this work, after the NODE-net is optimized to reproduce the underlying physical system as closely as possible, the subsequent NODE-responses of interest are various functionals of the NODE’s “decoder” output rather than some “loss function.” The physical system modeled by the NODE-net comprises parameters that stem from measurements and/or computations. Such parameters are not perfectly well known but are subject to uncertainties that stem from the respective experiments and/or computations. These uncertainties will induce uncertainties in the NODE decoder’s output/response, which can only be quantified if the sensitivities of the NODE decoder’s response with respect to the optimized NODE weights/parameters are available.
It is well known that equations modeling real-life phenomena comprise not only scalar-valued model parameters but also functions of such scalar model parameters. For example, conservation equations modeling thermal-hydraulics phenomena depend on correlations, Reynolds numbers, Nusselt numbers, etc. Similarly, neutron transport and/or diffusion equations in nuclear reactor engineering (e.g., reactor physics and shielding) involve group-averaged macroscopic cross sections, which are scalar-valued functions of various sums of scalar-valued isotopic number densities and microscopic cross sections. Such scalar-valued functions are conveniently called “features of primary model parameters.” This work has introduced the mathematical framework of the “First-Order Features Adjoint Sensitivity Analysis Methodology for Neural Ordinary Differential Equations (1st-FASAM-NODE),” which derives and computes most efficiently the exact expressions of all of the first-order sensitivities of NODE-decoder responses with respect to the features of parameters (weights, initial conditions, etc.) that characterize the NODE’s decoder, hidden layers, and encoder. It has been shown that a single large-scale computation, to solve the 1st-Level Adjoint Sensitivity System (1st-LASS), suffices to obtain the first-order sensitivities of a decoder-response with respect to all of the feature functions of parameters. Subsequently, the sensitivities of the responses with respect to the primary model parameters are determined, analytically and trivially, by applying the “chain-rule” to the expressions of the sensitivities with respect to the model’s “features/functions of parameters.”
This work has also presented the “Second-Order Features Adjoint Sensitivity Analysis Methodology for Neural Ordinary Differential Equations (2nd-FASAM-NODE),” which builds upon the 1st-FASAM-NODE methodology to derive and compute most efficiently the exact expressions of all of the second-order sensitivities of NODE-decoder responses with respect to the features of parameters (weights, initial conditions, etc.) that characterize the NODE’s decoder, hidden layers, and encoder. It has been shown that, for each decoder response, the computation of all of the second-order sensitivities requires as many large-scale computations (to solve the 2nd-Level Adjoint Sensitivity System) as there are first-order sensitivities of the decoder-response with respect to the feature functions, initial conditions, and decoder-weights. The 2nd-FASAM-NODE methodology computes the mixed second-order sensitivities twice, involving distinct 2nd-level adjoint sensitivity functions in distinct expressions.
The concepts underlying the 1st-FASAM-NODE and 2nd-FASAM-NODE methodologies will be illustrated in the accompanying “Part II” [18] by considering the energy and heat transfer processes described by the well-known Nordheim–Fuchs phenomenological model [19,20] of reactor safety.