1. Introduction
Neural Ordinary Differential Equations (NODEs) [1,2,3] have enabled the use of deep learning for modeling discretely sampled dynamical systems. They provide a flexible trade-off between efficiency, memory cost, and accuracy while bridging traditional numerical modeling with modern deep learning, as demonstrated by various applications to time series, dynamics, and control [1,2,3,4,5,6,7,8,9]. However, since each time step is determined locally in time, NODEs are limited to describing systems whose dynamics are instantaneous. Integral equations (IEs), by contrast, model global “long-distance” spatiotemporal relations, and IE solvers often possess stability properties that are superior to those of solvers for ordinary and/or partial differential equations. Differential equations are therefore occasionally recast in integral-equation form so that they can be solved more efficiently by IE solvers, as exemplified by the applications described in [10,11,12].
Owing to this non-local behavior, IE-based models are well suited to modeling complex dynamics, i.e., to learning the operator underlying the system under consideration from data sampled from the respective system. As discussed in [13], the operator-learning problem is customarily formulated on finite grids, using finite-difference methods that approximate the domain of the functions under investigation; by contrast, learning with an IE solver samples the domain of integration continuously. As shown in [14], Neural Integral Equations (NIEs) and Attentional Neural Integral Equations (ANIEs) can be used to generate dynamics and infer the spatiotemporal relations that originally generated the data, thus enabling the continuous learning of non-local dynamics with arbitrary time resolution. ANIE interprets the self-attention mechanism as a Nyström method for approximating integrals [15], which enables efficient integration over higher dimensions, as discussed in [10,11,12,13,14,15] and references therein.
Neural nets are trained by minimizing a “loss functional” chosen by the user to represent the discrepancy between the output produced by the neural net’s decoder and a user-chosen “reference solution.” However, the physical system modeled by a neural net inevitably comprises imperfectly known parameters that stem from measurements and/or computations and are therefore afflicted by the uncertainties inherent in the respective experiments and/or computations. Hence, even if the neural net perfectly reproduces a given state of a physical system, the neural net’s “optimized weights” are subject to the uncertainties that afflict the parameters characterizing the underlying physical system, and these uncertainties inevitably propagate to the decoder’s output response. It is therefore important to quantify the impact of parameter/weight uncertainties on the uncertainties induced in the decoder’s output response. This impact is quantified by the sensitivities of the decoder’s response with respect to the optimized weights/parameters comprised within the neural net.
Neural nets comprise not only scalar-valued weights/parameters but also functions (e.g., correlations) of such scalar model parameters, which can conveniently be called “features of primary model parameters.” Cacuci [16] has developed the “nth-Order Features Adjoint Sensitivity Analysis Methodology for Nonlinear Systems (nth-FASAM-N),” which enables the most efficient computation of the exact expressions of arbitrarily high-order sensitivities of model responses with respect to the model’s “features.” In turn, the sensitivities of the responses with respect to the primary model parameters are determined, analytically and trivially, by applying the “chain rule” to the expressions obtained for the response sensitivities with respect to the model’s features. The nth-FASAM-N [16] has been applied to develop general first- and second-order sensitivity analysis methodologies for NODEs [17] and for Neural Integral Equations of Fredholm type [18], which enable the computation, with unsurpassed efficiency, of the exact expressions of the first- and second-order sensitivities of decoder responses with respect to the underlying neural net’s optimized weights.
This work continues the application of the nth-FASAM-N methodology [16] by developing the “First- and Second-Order Features Adjoint Sensitivity Analysis Methodologies for Neural Integral Equations of Volterra Type” (acronyms: 1st-FASAM-NIE-V and 2nd-FASAM-NIE-V, respectively). The 1st-FASAM-NIE-V methodology, presented in Section 2, enables the most efficient computation of the exact expressions of all first-order sensitivities of NIE decoder responses with respect to all optimal values of the NIE-net’s parameters/weights, after the respective NIE-Volterra net has been optimized to represent the underlying physical system. The efficiency of the 1st-FASAM-NIE-V is illustrated in Section 3 by applying it to perform a comprehensive first-order sensitivity analysis of the well-known model [19,20,21] of neutron slowing down in a homogeneous medium containing fissionable material.
The general mathematical framework of the 2nd-FASAM-NIE-V methodology, presented in Section 4, enables the most efficient computation of the exact expressions of the second-order sensitivities of NIE decoder responses with respect to all optimal values of the NIE-net’s parameters/weights. The efficiency of the 2nd-FASAM-NIE-V is illustrated in Section 5 by applying it to perform a comprehensive second-order sensitivity analysis of the neutron slowing-down model [19,20,21] considered in Section 3. Section 6 concludes this work by presenting a discussion that highlights the unparalleled efficiency of the 2nd-FASAM-NIE-V methodology for performing sensitivity analysis of Volterra-type neural integral equations.
2. First-Order Features Adjoint Sensitivity Analysis Methodology for Neural Integral Equations of Volterra-Type (1st-FASAM-NIE-V)
Following [14], a network of nonlinear “Neural Integral Equations of Volterra-type” (NIE-Volterra) can be represented by the system of coupled equations shown below:
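In the notation defined in items (i)–(iv) below, such a system has the generic Volterra form shown here schematically (the kernel symbol $K_i$ is a notational placeholder rather than a quotation of Eq. (1)):

$$h_i(t) = g_i\big(t;\mathbf{F}(\boldsymbol{\theta})\big) + \int_{t_0}^{t} K_i\big(t,s,\mathbf{h}(s);\mathbf{F}(\boldsymbol{\theta})\big)\,ds, \qquad i = 1,\ldots,N_h, \quad t_0 \le t \le t_f.$$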
The quantities appearing in Eq. (1) are defined as follows:
- (i) The real-valued scalar quantities $t$ and $s$, with $t_0 \le s \le t \le t_f$, are time-like independent variables which parameterize the dynamics of the hidden/latent neuron units. Customarily, $t$ is called the “global time” while $s$ is called the “local time.” The initial time-value is denoted as $t_0$ while the stopping time-value is denoted as $t_f$.
- (ii) The components $\theta_1, \ldots, \theta_{TW}$ of the vector $\boldsymbol{\theta} \equiv (\theta_1, \ldots, \theta_{TW})^\dagger$ represent scalar learnable adjustable weights, where $TW$ denotes the total number of adjustable weights in all of the latent neural nets. The components of the column-vector $\boldsymbol{\theta}$ are considered to be “primary parameters,” while the components of the vector-valued function $\mathbf{F}(\boldsymbol{\theta}) \equiv [F_1(\boldsymbol{\theta}), \ldots, F_{TF}(\boldsymbol{\theta})]^\dagger$ represent the “feature” functions of the respective weights. The quantity $TF$ denotes the total number of feature functions of the primary model parameters comprised in the NIE-Volterra. In general, $\mathbf{F}(\boldsymbol{\theta})$ is a nonlinear function of $\boldsymbol{\theta}$. The total number of feature functions must necessarily be smaller than the total number of primary parameters (weights), i.e., $TF < TW$. In the extreme case when there are no feature functions, it follows that $F_j(\boldsymbol{\theta}) \equiv \theta_j$ for all $j = 1, \ldots, TW$. In this work, all vectors are considered to be column vectors, and the dagger “$\dagger$” symbol is used to denote “transposition.” The symbol “$\equiv$” is used to denote “is defined as” or, equivalently, “is by definition equal to.”
- (iii) The $N_h$-dimensional vector-valued function $\mathbf{h}(t) \equiv [h_1(t), \ldots, h_{N_h}(t)]^\dagger$ represents the hidden/latent neural networks. The quantity $N_h$ denotes the total number of components of $\mathbf{h}(t)$. At the initial time-value $t = t_0$, the functions $h_i(t)$ take on the known values $h_i(t_0)$.
- (iv) The functions $g_i\big(t;\mathbf{F}(\boldsymbol{\theta})\big)$, $i = 1, \ldots, N_h$, model the initial state (“encoder”) of the network. The components of the integral kernel depend nonlinearly on the features $\mathbf{F}(\boldsymbol{\theta})$ and on the latent state $\mathbf{h}(s)$, respectively, and model the dynamics of the latent neurons.
The “training” of the NIE-Volterra net is accomplished by using the “adjoint” or other methods to minimize the user-chosen “loss functional” intended to represent the discrepancy between the output produced by the NIE decoder and a “reference solution” chosen by the user. After the training is completed, the primary parameters (“weights”) will have been assigned “optimal” values, obtained as a result of having minimized the chosen loss functional. These optimal values for the primary parameters (“weights”) will be denoted using a superscript “zero,” i.e., $\boldsymbol{\theta}^0 \equiv (\theta_1^0, \ldots, \theta_{TW}^0)^\dagger$. Using these optimal/nominal parameter values to solve the NIE system will yield the optimal/nominal solution $\mathbf{h}^0(t)$, which will satisfy the following form of Eq. (1):
After the NIE-net is optimized to reproduce the underlying physical system as closely as possible, the subsequent responses of interest are no longer “loss functions” but become specific functionals of the NIE’s “decoder” response/output. Such a decoder response, denoted as $R[\mathbf{h}; \mathbf{F}(\boldsymbol{\theta})]$, can generally be represented as a scalar-valued functional of $\mathbf{h}(t)$ and $\mathbf{F}(\boldsymbol{\theta})$, defined as follows:
The function that models the decoder may contain distributions (e.g., Dirac-delta and/or Heaviside functionals) if the decoder response is to be evaluated at some particular point in time or over a subinterval within the interval $t_0 \le t \le t_f$.
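Schematically, such a response functional can be written as follows (the decoder kernel $D$ is a notational placeholder rather than a quotation of Eq. (3)):

$$R[\mathbf{h};\mathbf{F}(\boldsymbol{\theta})] \equiv \int_{t_0}^{t_f} D\big[\mathbf{h}(t);\mathbf{F}(\boldsymbol{\theta});t\big]\,dt.$$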
The optimal value of the decoder response, denoted as $R[\mathbf{h}^0; \mathbf{F}(\boldsymbol{\theta}^0)]$, is obtained by evaluating Eq. (3) at the optimal/nominal parameter values $\boldsymbol{\theta}^0$ and the optimal/nominal solution $\mathbf{h}^0(t)$, as follows:
The true values of the primary parameters (“weights”) that characterize the physical system modeled by the NIE-V net are afflicted by uncertainties inherent to the experimental and/or computational methodologies employed to model the original physical system. Therefore, the true values of the primary parameters (“weights”) will differ from the known nominal values (which are obtained after training the NIE-net to represent the model of the physical system) by variations denoted as $\delta\boldsymbol{\theta} \equiv (\delta\theta_1, \ldots, \delta\theta_{TW})^\dagger$. The variations $\delta\boldsymbol{\theta}$ will induce corresponding variations $\delta\mathbf{F} \equiv (\delta F_1, \ldots, \delta F_{TF})^\dagger$ in the feature functions, which in turn will induce variations $\delta\mathbf{h}(t) \equiv [\delta h_1(t), \ldots, \delta h_{N_h}(t)]^\dagger$ around the nominal/optimal functions $\mathbf{h}^0(t)$. Subsequently, the variations $\delta\mathbf{F}$ and $\delta\mathbf{h}(t)$ will induce variations in the NIE decoder’s response.
The 1st-FASAM-NIE-V methodology for computing the first-order sensitivities of the decoder’s response with respect to the NIE’s weights will be established by applying the same principles as those underlying the 1st-FASAM-N methodology [16]. These first-order sensitivities are embodied in the first-order G-variation $\delta R$ of the response $R[\mathbf{h}; \mathbf{F}(\boldsymbol{\theta})]$, for variations $\delta\mathbf{F}$ and $\delta\mathbf{h}(t)$ around the nominal values $\mathbf{F}(\boldsymbol{\theta}^0)$ and $\mathbf{h}^0(t)$, which is by definition obtained as follows:
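The G-variation is defined in the standard manner, namely:

$$\delta R\big[\mathbf{h}^0,\mathbf{F}^0;\delta\mathbf{h},\delta\mathbf{F}\big] \equiv \left\{\frac{d}{d\varepsilon}\,R\big[\mathbf{h}^0+\varepsilon\,\delta\mathbf{h};\,\mathbf{F}^0+\varepsilon\,\delta\mathbf{F}\big]\right\}_{\varepsilon=0},$$

where $\varepsilon$ denotes a real scalar; this definition underlies Eq. (5) and the splitting into direct- and indirect-effect terms described next.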
In Eq. (5), the “direct-effect term” arises directly from the variations $\delta\mathbf{F}$ (which in turn stem from the parameter variations $\delta\boldsymbol{\theta}$) and is defined in Eq. (6), while the “indirect-effect term” arises through the variations $\delta\mathbf{h}(t)$ in the hidden-state functions $\mathbf{h}(t)$ and is defined in Eq. (7).
The first-order relationship between the variations $\delta\mathbf{h}(t)$ and $\delta\mathbf{F}$ is obtained from the first-order G-variation of Eq. (1), as shown in Eq. (8). Performing the operations indicated in Eq. (8) yields the following NIE-V net, which will be called the “1st-Level Variational Sensitivity System” (1st-LVSS), for the components $v_i^{(1)}(t) \equiv \delta h_i(t)$, $i = 1, \ldots, N_h$, of the “1st-level variational function” $\mathbf{v}^{(1)}(t) \equiv \delta\mathbf{h}(t)$:
where:
As indicated in Eq. (9), the 1st-LVSS is to be computed at the nominal/optimal values of the respective model parameters. It is important to note that the 1st-LVSS is linear in the variational function $\mathbf{v}^{(1)}(t)$, although it generally remains nonlinear in $\mathbf{h}(t)$.
The 1st-LVSS would need to be solved anew to obtain the function $\mathbf{v}^{(1)}(t)$ that corresponds to each variation $\delta F_j$, $j = 1, \ldots, TF$; this procedure would become prohibitively expensive computationally if $TF$ is a large number. The need for repeatedly solving the 1st-LVSS can be avoided by recasting the indirect-effect term in terms of an expression that does not involve the function $\mathbf{v}^{(1)}(t)$. This goal can be achieved by expressing the indirect-effect term in terms of another function, which will be called the “1st-level adjoint function,” and which will be the solution of the “1st-Level Adjoint Sensitivity System (1st-LASS)” to be constructed next.
The 1st-LASS will be constructed in a Hilbert space, denoted as $H_1$, comprising elements of the same form as $\mathbf{h}(t)$. The inner product of two elements $\boldsymbol{\psi}(t) \in H_1$ and $\boldsymbol{\varphi}(t) \in H_1$ will be denoted as $\langle \boldsymbol{\psi}, \boldsymbol{\varphi} \rangle_1$ and is defined as follows:
The inner product is required to hold in a neighborhood of the nominal values $\mathbf{h}^0(t)$ and $\boldsymbol{\theta}^0$.
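The customary realization of such an inner product for $N_h$-component functions, consistent with the integration-by-parts manipulations used below, is

$$\langle \boldsymbol{\psi}, \boldsymbol{\varphi}\rangle_1 \equiv \sum_{i=1}^{N_h}\int_{t_0}^{t_f}\psi_i(t)\,\varphi_i(t)\,dt.$$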
The next step is to form the inner product of Eq. (9) with a vector $\mathbf{a}^{(1)}(t) \equiv [a_1^{(1)}(t), \ldots, a_{N_h}^{(1)}(t)]^\dagger \in H_1$, where the superscript “(1)” indicates “1st-level,” to obtain the following relationship:
The second term on the left-side of Eq. (12) is transformed using “integration by parts” as follows:
Replacing the result obtained in Eq. (13) into Eq. (12) yields the following expression for the left-side of Eq. (12):
The term on the right-side of Eq. (14) is now required to represent the “indirect-effect” term defined in Eq. (7), which is achieved by requiring that the components of the function $\mathbf{a}^{(1)}(t)$ satisfy the following system of equations, for $i = 1, \ldots, N_h$:
The Volterra-like neural system obtained in Eq. (15) will be called the “1st-Level Adjoint Sensitivity System” and its solution, $\mathbf{a}^{(1)}(t)$, will be called the “1st-level adjoint sensitivity function.” The 1st-LASS is to be solved using the nominal/optimal values for the parameters and for the function $\mathbf{h}(t)$, but this fact has not been explicitly indicated in order to simplify the notation. The 1st-LASS is linear in $\mathbf{a}^{(1)}(t)$ but is, in general, nonlinear in $\mathbf{h}(t)$. Notably, the 1st-LASS is independent of any parameter variations and needs to be solved only once to determine the 1st-level adjoint sensitivity function $\mathbf{a}^{(1)}(t)$. The 1st-LASS is a “final-value problem,” since the computation of the adjoint function will commence at the final time $t = t_f$, with known values there.
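The reversal of the time direction can be seen schematically in the scalar case: if the 1st-LVSS has the generic Volterra form $v(t) = q(t) + \int_{t_0}^{t} K(t,s)\,v(s)\,ds$, then interchanging the order of integration in the inner product $\langle a^{(1)}, v \rangle_1$ shows that the corresponding adjoint equation is the reversed-direction Volterra equation

$$a^{(1)}(s) = d(s) + \int_{s}^{t_f} K(t,s)\,a^{(1)}(t)\,dt,$$

where $d$ denotes the source term stemming from the indirect-effect term; the inner integration now runs from $s$ to $t_f$, which is why the computation of the adjoint function proceeds backwards from $t_f$.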
It follows from Eqs. (12)‒(15) that the indirect-effect term defined in Eq. (7) can be expressed in terms of the 1st-level adjoint sensitivity function $\mathbf{a}^{(1)}(t)$ as follows:
Using the results obtained in Eqs. (16) and (6) in Eq. (5) yields the following expression for the G-variation $\delta R$, which is seen to be linear in the variations $\delta F_j$:
Identifying in Eq. (17) the expressions that multiply the variations $\delta F_j$ yields the following expressions for the sensitivities $\partial R/\partial F_j$ of the response $R$ with respect to the components $F_j(\boldsymbol{\theta})$ of the feature function $\mathbf{F}(\boldsymbol{\theta})$, for $j = 1, \ldots, TF$:
The expression on the right-side of Eq. (18) is to be evaluated at the nominal/optimal values for the respective model parameters, but this fact has not been indicated explicitly in order to simplify the notation.
The sensitivities with respect to the primary model parameters can be obtained by using the result obtained in Eq. (18) together with the “chain rule” of differentiating compound functions, as follows:
The sensitivities $\partial R/\partial F_j$ are obtained from Eq. (18), while the derivatives $\partial F_j/\partial\theta_k$ are obtained analytically, exactly, from the known expressions of the feature functions $F_j(\boldsymbol{\theta})$.
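The mechanics of Eq. (19) can be illustrated by the following minimal sketch, in which the feature functions, their (hypothetical) analytical expressions, and the values of $\partial R/\partial F_j$ are all illustrative placeholders:

```python
import numpy as np

# Hypothetical example with TW = 4 weights and TF = 2 feature functions:
# F1(theta) = theta1*theta2 and F2(theta) = theta3 + theta4**2.
theta = np.array([0.5, 2.0, 1.0, 3.0])

def feature_jacobian(theta):
    # dF_j/dtheta_k, obtained analytically from the known expressions of F.
    return np.array([
        [theta[1], theta[0], 0.0, 0.0],
        [0.0,      0.0,      1.0, 2.0 * theta[3]],
    ])

# Suppose the adjoint computation of Eq. (18) produced the TF = 2 sensitivities:
dR_dF = np.array([1.7, -0.3])  # illustrative values

# Eq. (19): dR/dtheta_k = sum_j (dR/dF_j)(dF_j/dtheta_k); no further
# large-scale computations are needed to obtain all TW sensitivities.
dR_dtheta = feature_jacobian(theta).T @ dR_dF
print(dR_dtheta)
```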
Particular Case: The First-Order Comprehensive Adjoint Sensitivity Analysis Methodology for Neural Integral Equations of Volterra-Type (1st-CASAM-NIE-V)
When no feature functions can be constructed from the model parameters/weights, the feature functions become identical to the parameters, i.e., $F_j(\boldsymbol{\theta}) \equiv \theta_j$ for all $j = 1, \ldots, TW$. In this case, the expression obtained in Eq. (18) directly yields the first-order sensitivities $\partial R/\partial\theta_j$ of the decoder response with respect to the model weights/parameters, for all $j = 1, \ldots, TW$, taking on the following specific form:
Since the 1st-LASS is independent of any parameter variations, the 1st-level adjoint sensitivity function $\mathbf{a}^{(1)}(t)$ which appears in Eq. (20) remains the solution of the 1st-LASS defined by Eq. (15). In this case, however, all of the sensitivities $\partial R/\partial\theta_j$, for all $j = 1, \ldots, TW$, would be obtained by computing $TW$ integrals using quadrature formulas. Thus, when there are no feature functions of parameters, the 1st-FASAM-NIE-V reduces to the “First-Order Comprehensive Adjoint Sensitivity Analysis Methodology [16] applied to Neural Integral Equations of Volterra Type” (1st-CASAM-NIE-V). On the other hand, when features of parameters can be constructed, only $TF < TW$ numerical computations of integrals using quadrature formulas are required, using Eq. (18), to obtain the sensitivities $\partial R/\partial F_j$, $j = 1, \ldots, TF$. Subsequently, the sensitivities with respect to the model’s weights/parameters are obtained analytically using the chain rule provided in Eq. (19).
3. Illustrative Application of the 1st-CASAM-NIE-V and 1st-FASAM-NIE-V Methodologies to Neutron Slowing Down in an Infinite Homogeneous Hydrogenous Medium
The illustrative model considered in this Section is a Volterra-type integral equation that describes the energy distribution of neutrons in a homogeneous hydrogenous medium (such as a water-moderated/cooled reactor system) containing 238U (among other materials), a heavy element that strongly absorbs neutrons. The distribution of collided neutrons in such a medium is described [19,20,21] by the following linear integral equation of Volterra type, customarily called the “neutron slowing-down equation,” for the neutron collision density denoted as $\varphi(E)$:
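A representative form of this equation for a hydrogenous medium, consistent with the definitions in items (i)–(iv) below (the notation, rather than the exact typography of Eq. (21), is assumed here), is

$$\varphi(E) = \frac{S}{E_s}\,\frac{\Sigma_s(E_s)}{\Sigma_t(E_s)} + \int_{E}^{E_s}\frac{\Sigma_s(E')}{\Sigma_t(E')}\,\varphi(E')\,\frac{dE'}{E'}, \qquad E_{min} \le E \le E_s.$$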
The various quantities that appear in Eq. (21) are defined as follows:
- (i) The quantity $S$ denotes the rate at which the source neutrons, considered to be monoenergetic, are emitted at the “source energy” $E_s$. Neutron up-scattering is considered to be negligible; therefore, $E_s$ is the highest energy in the medium.
- (ii) The quantity $E$, $E_{min} \le E \le E_s$, denotes the instantaneous energy of the collided neutrons; $E_{min}$ denotes the lowest neutron energy in the model.
- (iii) The quantity $\Sigma_s(E)$ denotes the medium’s macroscopic scattering cross section, which is defined in Eq. (22) (see the schematic form after this list), where $M$ denotes the number of materials in the medium, $w_i$ denotes the relative weighting of the i-th material in the medium, $N_i$ denotes the number density of the i-th material, while $\sigma_s^{(i)}(E)$ denotes the energy-dependent microscopic scattering cross section of the i-th material.
- (iv) The quantity $\Sigma_t(E)$ denotes the medium’s macroscopic total cross section, which is defined in Eq. (23) (see the schematic form after this list), where $\sigma_t^{(i)}(E)$ denotes the energy-dependent microscopic total cross section of the i-th material. The quantities $w_i$, $N_i$, $\sigma_s^{(i)}(E)$, and $\sigma_t^{(i)}(E)$ are subject to uncertainties since they are determined from experimentally obtained data.
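In this notation, the two macroscopic cross sections defined in Eqs. (22) and (23) take the schematic forms

$$\Sigma_s(E) \equiv \sum_{i=1}^{M} w_i\,N_i\,\sigma_s^{(i)}(E), \qquad \Sigma_t(E) \equiv \sum_{i=1}^{M} w_i\,N_i\,\sigma_t^{(i)}(E).$$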
Notably, the Volterra-type Eq. (21) is a “final-value problem,” since the computation is started at the highest energy value, $E = E_s$, and progresses towards the lowest energy value, $E = E_{min}$. Customarily, the solution of Eq. (21) is written in the form shown in Eq. (24), where $\Sigma_a(E) \equiv \Sigma_t(E) - \Sigma_s(E)$ denotes the medium’s macroscopic absorption cross section. The expression provided in Eq. (24) is amenable to computations of the loss of neutrons due to absorbing materials, particularly in the so-called “resonance” energy region.
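Differentiating the representative Volterra form given above with respect to $E$ and integrating the resulting first-order differential equation yields a solution of this type, in which the exponential factor is the familiar probability of escaping absorption while slowing down:

$$\varphi(E) = \frac{S}{E}\,\frac{\Sigma_s(E_s)}{\Sigma_t(E_s)}\,\exp\left[-\int_{E}^{E_s}\frac{\Sigma_a(E')}{\Sigma_t(E')}\,\frac{dE'}{E'}\right].$$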
A typical “decoder response” for the NIE-Volterra network modeled by Eq. (21) is the energy-averaged collision density, denoted below as $R$, which would be measured by a detector having an interaction cross section $\Sigma_d \equiv N_d\,\sigma_d$. Mathematically, this detector response can be expressed as shown in Eq. (25), where $N_d$ and $\sigma_d$ denote, respectively, the detector material’s atomic number density and the microscopic cross section describing the interaction (e.g., absorption) of neutrons with the detector’s material; $N_d$ and $\sigma_d$ can be considered the “weights” that characterize the neural net’s “decoder.”
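One schematic realization of such a response (the energy-averaging normalization used here is an assumption rather than a quotation of Eq. (25)) is

$$R \equiv \frac{1}{E_s - E_{min}}\int_{E_{min}}^{E_s}\Sigma_d\,\varphi(E)\,dE.$$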
Since the energy dependence of the cross sections does not play a significant role in the sensitivity analysis of the NIE-Volterra modeled by Eq. (21), the respective microscopic cross sections will henceforth be considered energy-independent, in order to simplify the ensuing derivations while illustrating the application of the 1st-FASAM-NIE-V. For energy-independent cross sections, Eqs. (21) and (25) take on the forms of Eqs. (26) and (27), respectively. In Eqs. (26) and (27), the source strength $S$ is an imprecisely known “weight” that characterizes the neural net’s “encoder.” Furthermore, the (column) vector of parameters denoted as $\boldsymbol{\alpha}$ comprises as components the “imprecisely known primary model parameters” (or “weights,” as they are customarily called when referring to neural nets) and is defined in Eq. (28), where $TP$ denotes the total number of imprecisely known weights/parameters. These primary model parameters/weights are not known exactly but are affected by uncertainties, since they stem from experimental procedures, which determine the nominal/mean/optimal values and the second-order moments of their otherwise unknown joint distribution; the third- and higher-order moments are rarely known. It is convenient to denote the nominal values of these primary model parameters/weights by using the superscript “zero,” i.e., $\boldsymbol{\alpha}^0 \equiv (\alpha_1^0, \ldots, \alpha_{TP}^0)^\dagger$. The “feature function of primary parameters,” denoted as $f(\boldsymbol{\alpha})$, is defined in Eq. (30). The closed-form solution of Eq. (26), given in Eq. (31), is expressed in terms of the feature function $f(\boldsymbol{\alpha})$.
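Consistent with the representative forms given above, the natural identification (inferred from the structure of Eqs. (21)–(27) rather than quoted) is that the feature function is the scattering probability per collision, with the corresponding constant-cross-section solution

$$f(\boldsymbol{\alpha}) \equiv \frac{\Sigma_s}{\Sigma_t}, \qquad \varphi(E) = \frac{S\,f(\boldsymbol{\alpha})}{E_s}\left(\frac{E_s}{E}\right)^{f(\boldsymbol{\alpha})}.$$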
The closed-form expression of the decoder response can be readily obtained by substituting the result obtained in Eq. (31) into Eq. (27) and performing the integration over the energy variable, to obtain:
The expression obtained in Eq. (32) reveals that the imprecisely known quantities that affect the decoder response are as follows:
- (i) the source strength $S$;
- (ii) the detector interaction macroscopic cross section $\Sigma_d \equiv N_d\,\sigma_d$, which can be considered to be a “feature function” of the model parameters $N_d$ and $\sigma_d$;
- (iii) the feature function $f(\boldsymbol{\alpha})$.
3.1. Application of 1st-CASAM-NIE-V to Directly Compute the First-Order Sensitivities of the Decoder Response with Respect to the Primary Model Parameters
The first-order sensitivities of the decoder response with respect to the model parameters are obtained by applying the definition of the G-differential to Eq. (26), for arbitrary parameter variations $\delta\boldsymbol{\alpha} \equiv (\delta\alpha_1, \ldots, \delta\alpha_{TP})^\dagger$ around the parameters’ nominal values $\boldsymbol{\alpha}^0$. These parameter variations will induce variations $\delta\varphi(E)$ in the neutron collision density around its nominal value $\varphi^0(E)$. The variations $\delta\boldsymbol{\alpha}$ and $\delta\varphi(E)$ will, in turn, induce variations in the decoder’s response.
The first-order Gateaux (G-)variation $\delta R$ is obtained, by definition, from Eq. (27), as shown in Eq. (33), where the “direct-effect” term arises directly from the parameter variations $\delta\boldsymbol{\alpha}$ and is defined in Eq. (34), while the “indirect-effect” term arises from the variations $\delta\varphi(E)$ and is defined in Eq. (35).
As indicated in Eqs. (34) and (35), both the direct-effect and the indirect-effect terms are to be evaluated at the nominal parameter values.
The first-order relation between the variation $\delta\varphi(E)$ and the parameter variations $\delta\boldsymbol{\alpha}$ is obtained by evaluating the G-variation of Eq. (26) for variations around the nominal parameter values $\boldsymbol{\alpha}^0$, which yields, by definition, the following NIE-Volterra equation for $\delta\varphi(E)$:
where:
The second equality in Eq. (37) has been obtained by using Eqs. (26) and (31) to eliminate the integral term involving $\varphi(E)$.
The particular form of the first-order derivative of the feature function, $\partial f(\boldsymbol{\alpha})/\partial\boldsymbol{\alpha}$, which appears in Eq. (37), is obtained by using the definition of $f(\boldsymbol{\alpha})$ provided in Eq. (30), which yields the following expression:
In view of the definition provided in Eq. (22), the derivatives $\partial\Sigma_s/\partial\alpha_j$ have the following particular expressions:
In view of the definition provided in Eq. (23), the derivatives $\partial\Sigma_t/\partial\alpha_j$ have the following particular expressions:
The NIE-Volterra net represented by Eq. (36) will be called the “1st-Level Variational Sensitivity System (1st-LVSS)” and its solution, $\delta\varphi(E)$, will be called the “1st-level variational sensitivity function.” It is evident that Eq. (36) would need to be solved anew for the source variation $\delta S$ and for every parameter variation $\delta\alpha_j$, $j = 1, \ldots, TP$, in order to obtain the corresponding variation $\delta\varphi(E)$. This need for repeatedly solving Eq. (36) can be circumvented by applying the principles of the 1st-CASAM-NIE-V, generally outlined in Section 2, to eliminate the appearance of the variation $\delta\varphi(E)$ in the indirect-effect term defined in Eq. (35), while expressing this indirect-effect term as a functional of a first-level adjoint function that does not depend on any parameter variation, as follows:
Consider that the function $\delta\varphi(E)$ belongs to a Hilbert space denoted as $H_1$, which is defined on the domain $E_{min} \le E \le E_s$. The inner product in $H_1$ of two functions $\psi(E)$ and $\chi(E)$ will be denoted as $\langle \psi, \chi \rangle_1$ and is defined as follows:
Form the inner product of Eq. (36) with a function $a^{(1)}(E) \in H_1$, where the superscript “(1)” indicates “1st-level,” to obtain the following relationship:
Transform the left-side of Eq. (46) as follows:
Require the last term in Eq. (47) to represent the indirect-effect term defined in Eq. (35), which yields the following “1st-Level Adjoint Sensitivity System (1st-LASS)” for the first-level adjoint sensitivity function $a^{(1)}(E)$:
The 1st-LASS represented by Eq. (50) is a linear NIE-Volterra net, which is independent of any parameter variation and needs to be solved just once to obtain the first-level adjoint sensitivity function $a^{(1)}(E)$. Notably, the 1st-LASS is an “initial-value problem,” in that the computation of $a^{(1)}(E)$ commences at the lowest energy value, $E = E_{min}$, where the adjoint function takes on a known value, and progresses towards the highest energy value, $E = E_s$. For further reference, the closed-form solution of Eq. (50) can be obtained by differentiating this equation with respect to $E$ and subsequently integrating the resulting first-order linear differential equation, to obtain the following exact expression:
The expression on the right-side of Eq. (51) is to be evaluated at the nominal parameter values $\boldsymbol{\alpha}^0$, but the superscript “zero” has been omitted for notational simplicity.
Using Eqs. (46), (47) and (50) yields the following expression for the indirect-effect term defined in Eq. (35):
The expression on the right-side of Eq. (52) is to be evaluated at the nominal parameter values $\boldsymbol{\alpha}^0$, but the superscript “zero” has been omitted for notational simplicity.
Adding the expression obtained in Eq. (52) to the expression of the direct-effect term represented by Eq. (34) yields the following expression for the first-order G-variation $\delta R$:
It follows from Eq. (53) that the first-order sensitivities of the decoder response with respect to the (encoder’s) source strength and the optimal weights/parameters have the following expressions:
Inserting into Eqs. (54)‒(57) the closed-form expression for the neutron collision density obtained in Eq. (31) yields the following closed-form explicit expressions for the first-order sensitivities of the decoder response with respect to the (encoder’s) source strength and the optimal weights/parameters:
The correctness of the expressions obtained in Eqs. (58)‒(61) can be readily verified by differentiating the expressions of the decoder’s response obtained in Eq. (32).
In practice, only the exact mathematical expression of the 1st-LASS, namely Eq. (50), and the exact mathematical expressions of the first-order sensitivities obtained in Eqs. (54)‒(57) are available. The solution of the 1st-LASS, which is a linear NIE-Volterra net for the first-level adjoint sensitivity function $a^{(1)}(E)$, would need to be obtained numerically in practice. The numerical solution for $a^{(1)}(E)$ would then be used to determine the first-order sensitivities stemming from the “indirect-effect” term, by using quadrature formulas to evaluate the integrals obtained in Eqs. (54)‒(57). It is very important to note that a single “large-scale” computation, for determining numerically the adjoint function $a^{(1)}(E)$ by solving the 1st-LASS (a NIE-Volterra-type equation), suffices for evaluating all of the first-order sensitivities. The numerical computations using quadrature formulas for evaluating the integrals in Eqs. (54)‒(57) are considered to be “small-scale” computations.
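The computational advantage can be illustrated by the following minimal numerical sketch, in which a generic Volterra operator is discretized into a lower-triangular system (all kernels, sources, and decoder weights are illustrative placeholders, not the slowing-down model itself):

```python
import numpy as np

# Discretize a generic Volterra problem, phi = q + K phi, on n grid points;
# the discretized kernel K is strictly lower-triangular (Volterra structure).
rng = np.random.default_rng(0)
n = 200
K = np.tril(0.3 * rng.random((n, n)) / n, k=-1)
I = np.eye(n)

d = rng.random(n)             # discretized decoder weights: R = d . phi
sources = rng.random((5, n))  # five different variational source terms q_j

# Direct ("1st-LVSS") approach: one large-scale solve per source term.
R_direct = [d @ np.linalg.solve(I - K, q) for q in sources]

# Adjoint ("1st-LASS") approach: a single solve of the transposed system
# (upper-triangular, i.e., swept in the reverse direction), followed by
# one small-scale inner product per source term.
a = np.linalg.solve((I - K).T, d)
R_adjoint = [a @ q for q in sources]

print(np.allclose(R_direct, R_adjoint))  # True: <a, q_j> = <d, phi_j> for all j
```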
As has already been observed in the brief remarks following Eq. (37), the first-order sensitivities of the decoder response with respect to the encoder source strength $S$ and the model weights/parameters could also have been computed by repeatedly solving, numerically, the NIE-Volterra net (1st-LVSS) represented by Eq. (36). This procedure would be very expensive computationally, since it would require large-scale computations to solve the 1st-LVSS defined by Eq. (36) in order to obtain the variation $\delta\varphi(E)$ for every parameter variation $\delta\alpha_j$ and for the source variation $\delta S$. In addition, the same amount of “quadrature” computations would need to be performed using Eq. (35) as would be needed for evaluating the first-order sensitivities using Eqs. (54)‒(57).
3.2. Efficient Indirect Computation Using the 1st-FASAM-NIE-V of the First-Order Sensitivities of the Decoder Response with Respect to Primary Model Parameters
When feature functions of model parameters, such as $f(\boldsymbol{\alpha})$ and $\Sigma_d$, can be identified, as is the case with the NIE-Volterra net and the decoder response represented by Eqs. (26) and (27), respectively, it is considerably more efficient to determine the first-order sensitivities of the decoder response with respect to the feature functions and subsequently to derive analytically the sensitivities with respect to the primary model parameters by using the “chain rule of differentiation,” as will be shown in this Section. Thus, considering arbitrary variations $\delta f$ and $\delta\Sigma_d$ around the nominal values $f^0 \equiv f(\boldsymbol{\alpha}^0)$ and $\Sigma_d^0$, respectively, the first-order G-variation of the decoder response has the expression shown in Eq. (62), where the indirect-effect term is defined in Eq. (35). The first-order relation between the variation $\delta\varphi(E)$ and the variations $\delta f$ and $\delta S$ is obtained, by definition, from Eq. (26) as follows:
where:
Comparing Eq. (63) to Eq. (36) indicates that the only difference between these equations is the expression of the source term, which is expressed in terms of $\delta f$ in Eq. (64). Consequently, the first-level adjoint sensitivity function that corresponds to the variational function $\delta\varphi(E)$ is determined by following the same procedure as outlined in Eqs. (46)‒(50), ultimately obtaining the same 1st-LASS as was obtained in Eq. (50), having as its solution the same expression for $a^{(1)}(E)$ as was obtained in Eq. (51). It further follows that the indirect-effect term will have the following expression:
It follows from Eqs. (62) and (65) that the first-order G-variation $\delta R$ has the following expression:
As indicated by the expression obtained in Eq. (66), the first-order sensitivities of the decoder response with respect to the feature functions and the encoder’s source strength are as follows:
The closed-form expressions of the above sensitivities are readily determined by using in Eqs. (67)‒(69) the expressions obtained in Eqs. (51) and (24), and by performing the respective integrations, to obtain:
The first-order sensitivities with respect to the primary parameters are obtained analytically from Eqs. (67) and (68), respectively, by using the following “chain rule” of differentiation:
The specific expressions of the first-order sensitivities $\partial R/\partial\alpha_j$, $j = 1, \ldots, TP$, are obtained by using Eq. (75) in conjunction with Eq. (69) and Eqs. (38)‒(44).
3.3. Discussion: Direct Versus Indirect Computation of the First-Order Sensitivities of the Decoder Response with Respect to the Primary Model Parameters
The principles of the 1st-CASAM-NIE-V were applied in Section 3.1 to determine the first-order sensitivities of the decoder response directly with respect to the model’s primary parameters/weights. It has been shown that this procedure requires a single “large-scale” computation for solving a NIE-Volterra equation in order to determine the (single) 1st-level adjoint sensitivity function $a^{(1)}(E)$, which is subsequently used in $TP$ integrals that are computed using quadrature formulas. The two additional first-order sensitivities with respect to the components of $\Sigma_d$ require a single quadrature involving the forward function $\varphi(E)$.
The principles of the 1st-FASAM-NIE-V were applied in Section 3.2 to determine the first-order sensitivities of the decoder response with respect to the feature functions. This path required just two (as opposed to $TP$) numerical evaluations of integrals using quadrature formulas involving the 1st-level adjoint sensitivity function $a^{(1)}(E)$. The sensitivities of the decoder response with respect to the primary parameters/weights were subsequently determined analytically, using the “chain rule of differentiation” applied to the explicitly known expression of the feature function $f(\boldsymbol{\alpha})$. Evaluating the two additional first-order sensitivities with respect to the components of $\Sigma_d$ requires a single quadrature involving the forward function $\varphi(E)$, as in Section 3.1. Evidently, the indirect path presented in Section 3.2 is computationally more efficient, since it requires substantially fewer numerical quadratures than the direct path presented in Section 3.1. The superiority of the indirect path, via “feature functions,” over the direct computation of sensitivities with respect to the model parameters becomes considerably more evident for the computation of second-order sensitivities, as will be shown in Section 4 and Section 5, below.
Of course, when no feature functions can be identified, the 1st-FASAM-NIE-V methodology becomes identical to the 1st-CASAM-NIE-V methodology.
4. The Second-Order Features Adjoint Sensitivity Analysis Methodology for Neural Integral Equations of Volterra-Type (2nd-FASAM-NIE-V)
The second-order sensitivities of the response defined in Eq. (3) will be computed by conceptually using their basic definition as “the first-order sensitivities of the first-order sensitivities.” Thus, the second-order sensitivities stemming from the first-order sensitivities $\partial R/\partial F_j$ are obtained from the first-order G-differential of Eq. (18), for $j = 1, \ldots, TF$, as follows:
In Eq. (76), the expression of the direct-effect term is obtained after performing the differentiation with respect to the scalar parameter $\varepsilon$ and comprises only the variations $\delta\mathbf{F}$ (stemming from variations in the model parameters), being defined as follows:
The expression on the right-side of Eq. (77) is to be evaluated at the nominal/optimal values for the respective model parameters, but this fact has not been indicated explicitly in order to simplify the notation.
The expression of the indirect-effect term defined in Eq. (76) is obtained after performing the differentiation with respect to the scalar parameter $\varepsilon$ and comprises the variations $\delta\mathbf{h}(t)$ and $\delta\mathbf{a}^{(1)}(t)$, as follows:
The expressions in Eq. (78) are to be evaluated at the nominal values of the respective functions and parameters, but the respective indication (i.e., the superscript “zero”) has been omitted in order to simplify the notation.
The direct-effect term can be evaluated at this stage for all variations $\delta\mathbf{F}$, but the indirect-effect term can be evaluated only after having determined the variations $\delta\mathbf{h}(t) \equiv \mathbf{v}^{(1)}(t)$ and $\delta\mathbf{a}^{(1)}(t)$. The variation $\mathbf{v}^{(1)}(t)$ is the solution of the 1st-LVSS defined by Eq. (9). On the other hand, the variational function $\delta\mathbf{a}^{(1)}(t)$ is the solution of the system of equations obtained by G-differentiating the 1st-LASS. By definition, the G-differential of Eq. (15) is obtained as shown in Eq. (79), for $i = 1, \ldots, N_h$. Performing the operations indicated in Eq. (79) and rearranging the various terms yields the following relations, for $i = 1, \ldots, N_h$:
where:
As indicated by the result obtained in Eq. (80), the variations $\delta\mathbf{a}^{(1)}(t)$ are coupled to the variations $\mathbf{v}^{(1)}(t)$. Therefore, they can be obtained by simultaneously solving Eqs. (80) and (9), which together will be called the “2nd-Level Variational Sensitivity System (2nd-LVSS).” The solution of the 2nd-LVSS, namely the block-vector $\mathbf{v}^{(2)}(t)$ comprising the components of $\mathbf{v}^{(1)}(t)$ and $\delta\mathbf{a}^{(1)}(t)$, will be called the “2nd-level variational sensitivity function.” Since the 2nd-LVSS depends on the variations $\delta\mathbf{F}$ (stemming from variations in the model parameters), it would need to be solved anew for each such variation. The repeated solving of the 2nd-LVSS can be avoided by following the general principles underlying the 2nd-FASAM [16], which considers the function $\mathbf{v}^{(2)}(t)$ to be an element in a Hilbert space denoted as $H_2$. The Hilbert space $H_2$ is considered to be endowed with an inner product, denoted as $\langle \cdot, \cdot \rangle_2$, between two block-vectors, each comprising $2N_h$ components of the same form as $\mathbf{v}^{(2)}(t)$, which is defined as follows:
Following the general principles underlying the 2nd-FASAM [16], the function $\mathbf{v}^{(2)}(t)$ will be eliminated from the expression of each of the indirect-effect terms, $j = 1, \ldots, TF$, defined in Eq. (78). This elimination is achieved by considering, for each index $j$, a vector-valued 2nd-level adjoint function denoted as $\mathbf{a}^{(2)}(j; t)$, comprising $2N_h$ components. Using the definition provided in Eq. (82), construct the inner product of Eqs. (9) and (80) with the vector $\mathbf{a}^{(2)}(j; t)$, to obtain the following relation:
where:
Following the principles of the 2nd-CASAM [16], the left-side of Eq. (83) will be identified with the indirect-effect term defined in Eq. (78), thereby determining the (as yet undetermined) functions $\mathbf{a}^{(2)}(j; t)$. For this purpose, the right-side of Eq. (78) is cast in the form of the inner product defined in Eq. (82). The terms on the right-side of Eq. (78) involving the components of the function $\mathbf{v}^{(1)}(t)$ are already in the desired format, but the terms involving the components of the function $\delta\mathbf{a}^{(1)}(t)$ must be rearranged, as follows:
- (i) The fourth term on the right-side of Eq. (78) is recast by using “integration by parts” as follows:
- (ii) The sixth (last) term on the right-side of Eq. (78) is recast by using “integration by parts,” as above, to obtain the following relation:
Using in Eq. (78) the results obtained in Eqs. (85) and (86) yields the following expression for the indirect-effect term, for $j = 1, \ldots, TF$:
The left-side of Eq. (83) is now recast in the form of the inner product by performing the following operations:
- (i) The second term on the left-side of Eq. (83) is rearranged by using “integration by parts” as follows:
- (ii) The fourth term on the left-side of Eq. (83) is rearranged by using “integration by parts” as follows:
- (iii) The fifth term on the left-side of Eq. (83) is rearranged as follows:
- (iv) The sixth term on the left-side of Eq. (83) is rearranged as follows:
Inserting the results obtained in Eqs. (88)‒(91) into the left-side of Eq. (83) yields the following relation:
The right-side of Eq. (92) can now be required to represent the indirect-effect term defined in Eq. (87), by imposing the requirement that the hitherto arbitrary function $\mathbf{a}^{(2)}(j; t)$ be the solution of the following NIE-Volterra equations, for $j = 1, \ldots, TF$:
It follows from Eqs. (92)‒(94) that the indirect-effect term defined by Eq. (78) or, equivalently, by Eq. (87), can be expressed in terms of the function $\mathbf{a}^{(2)}(j; t)$ as follows, for $j = 1, \ldots, TF$:
The second-order sensitivities $\partial^2 R/\partial F_k\,\partial F_j$ of the decoder response with respect to the components of the feature function are obtained by adding the expression of the indirect-effect term obtained in Eq. (95) to the expression for the direct-effect term obtained in Eq. (77), and by subsequently identifying the expressions that multiply the variations $\delta F_k$. The expressions thus obtained, for $j, k = 1, \ldots, TF$, are as follows:
The NIE-Volterra system presented in Eqs. (93) and (94) is called the “2nd-Level Adjoint Sensitivity System (2nd-LASS)” and its solution, $\mathbf{a}^{(2)}(j; t)$, $j = 1, \ldots, TF$, is called the “2nd-level adjoint sensitivity function.” Since the sources on the right-sides of Eqs. (93) and (94) stem from the first-order sensitivities $\partial R/\partial F_j$, $j = 1, \ldots, TF$, they depend on the index “j”, which implies that to each first-order sensitivity $\partial R/\partial F_j$ there corresponds a distinct 2nd-LASS, having a distinct solution $\mathbf{a}^{(2)}(j; t)$; this fact has been emphasized by using the index “j” in the list of arguments of this 2nd-level adjoint sensitivity function. Therefore, there will be as many 2nd-level adjoint functions as there are distinct first-order sensitivities $\partial R/\partial F_j$, which is equal to the number $TF$ of components of the feature function $\mathbf{F}(\boldsymbol{\theta})$. Notably, the integral operators on the left-sides of Eqs. (93) and (94) do not depend on the index “j”, which means that the same left-side needs to be inverted for computing each 2nd-level adjoint function, regardless of the source term on the right-side (which corresponds to the particular component of the feature function). Therefore, if the inverses of the operators appearing on the left-sides of Eqs. (93) and (94) can be stored, they need not be recomputed repeatedly, so the various 2nd-level adjoint functions can be computed most efficiently.
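This reuse of a single operator inversion can be illustrated by the following minimal sketch, in which the discretized left-side operator and the feature-dependent sources are illustrative placeholders:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

# Illustrative discretized 2nd-LASS: the left-side operator A is the same for
# every index j; only the source term b_j (built from dR/dF_j) changes with j.
rng = np.random.default_rng(1)
n, TF = 300, 8
A = np.eye(n) - np.tril(0.2 * rng.random((n, n)) / n, k=-1)  # fixed operator
B = rng.random((n, TF))                                      # one source per feature

# Factor A once (i.e., "store the inverse"), then back-substitute cheaply
# for each of the TF feature-dependent sources.
lu, piv = lu_factor(A)
second_level_adjoints = [lu_solve((lu, piv), B[:, j]) for j in range(TF)]
# Each solution plays the role of the 2nd-level adjoint function a2(j; .).
```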
The second-order sensitivities of the decoder response with respect to the optimal weights/parameters, $\partial^2 R/\partial\theta_k\,\partial\theta_j$, are obtained analytically by using the chain rule in conjunction with the expressions obtained in Eqs. (96) and (18), as follows:
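Written out, this is the standard second-order chain rule for compound functions, expressed in the notation adopted here:

$$\frac{\partial^2 R}{\partial\theta_k\,\partial\theta_j} = \sum_{l=1}^{TF}\sum_{m=1}^{TF}\frac{\partial^2 R}{\partial F_m\,\partial F_l}\,\frac{\partial F_m}{\partial\theta_k}\,\frac{\partial F_l}{\partial\theta_j} + \sum_{l=1}^{TF}\frac{\partial R}{\partial F_l}\,\frac{\partial^2 F_l}{\partial\theta_k\,\partial\theta_j}.$$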
When there are no feature functions but only individual model parameters, i.e., when $F_j(\boldsymbol{\theta}) \equiv \theta_j$ for all $j = 1, \ldots, TW$, the expression obtained in Eq. (96) directly yields the second-order sensitivities $\partial^2 R/\partial\theta_k\,\partial\theta_j$, for all $j, k = 1, \ldots, TW$. In this case, however, the 2nd-LASS would need to be solved $TW$ times, rather than just $TF$ times (with $TF < TW$) when feature functions can be constructed.