Estimation of Expectations and Variance Components in Two-Level Nested Simulation Experiments

David Fernando Muñoz

doi:10.20944/preprints202305.0623.v1

Submitted:

08 May 2023

Posted:

09 May 2023

Abstract

When there is uncertainty in the value of parameters of the input random components of a stochastic simulation model, two-level nested simulation algorithms are used to estimate the expectation of performance variables of interest. In the outer level of the algorithm (n) observations are generated for the parameters, and in the inner level (m) observations of the simulation model are generated with the value of parameters fixed at the value generated in the outer level. In this article, we consider the case in which the observations at both levels of the algorithm are independent, showing how the variance of the observations can be decomposed into the sum of a parametric variance and a stochastic variance. Next, we derive central limit theorems that allow us to compute asymptotic confidence intervals to assess the accuracy of the simulation-based estimators for the point forecast and the variance components. Under this framework, we derive analytical expressions for the point forecast and the variance components of a Bayesian model to forecast sporadic demand; and we use these expressions to illustrate the validity of our theoretical results by performing simulation experiments using this forecast model.

Keywords:

Bayesian forecasting

;

stochastic simulation

;

parameter uncertainty

;

two-level simulation

Subject:

Computer Science and Mathematics - Computational Mathematics

1. Introduction and Notation

Simulation is widely recognized as an effective technique to produce forecasts, evaluate risk (see, e.g., [1]), and animate and illustrate the performance of a system over time (see, e.g., [2]). When there is uncertainty in a component of a simulation model, it is said to be a random component, and it is modeled using a probability distribution and/or a stochastic process that is generated during the simulation run to produce a stochastic simulation. A random component typically depends on the value of certain parameters, and we will denote by

θ

a particular value for the vector of parameters of the random components of a stochastic simulation, and

Θ

will denote the random vector that corresponds to the parameter values when there is uncertainty on the value of these parameters.

In general, the output of a stochastic (dynamic) simulation can be regarded as a stochastic process

{Y (s) : s \geq 0; Θ}

, where

Y (s)

is a random vector (of arbitrary dimension d) representing the state of the simulation at time

s \geq 0

. The term transient simulation applies to a dynamic simulation that has a well-defined termination time, so that the output of a transient simulation can be viewed as a stochastic process

{Y (s) : 0 \leq s \leq T; Θ}

, where T is a stopping time (possibly deterministic); see, e.g., [3] for a definition of stopping time. Note that this notation includes the case of a discrete-time output

Z_{0}, Z_{1}, \dots

, if we assume that

Y (s) = Z_{⌊ s ⌋}

, where

⌊ s ⌋

denotes the integer part of s.

A performance variable W in transient simulation is a real-valued random variable (r.v.) that depends on the simulation output up to time T, i.e.,

W = f (Y (s), 0 \leq s \leq T; Θ)

, and the expectation of a performance variable W is a performance measure that we usually estimate through experimentation with the simulation model. When there is no uncertainty in the parameters of the random components, the standard methodology used to estimate a performance measure in transient simulation is the method of independent replications, which consists of running the simulation model to produce n replications

W_{1}, W_{2}, \dots, W_{n}

that can be regarded as independent and identically distributed (i.i.d.) random variables (see Figure 1).

Under the method of independent replications, a point estimator for the expectation

α = E [W_{1}]

is the average

\hat{α} (n) = \frac{\sum_{i = 1}^{n} W_{i}}{n}

. If

E [| W_{1} |] < \infty

, it follows from the classical Law of Large Numbers (LLN), that

\hat{α} (n)

is consistent, i.e., it satisfies

\hat{α} (n) \Rightarrow α

, as

n \to \infty

(where ⇒ denotes weak convergence of random variables), see, e.g., [3] for a proof. Consistency guarantees that the estimator approaches the parameter as the number of replications n increases, and the accuracy of the simulation-based estimator

\hat{α} (n)

is typically assessed by an asymptotic confidence interval (ACI) for the parameter. The expression for an ACI for a parameter of a stochastic simulation is usually obtained through a Central Limit Theorem (CLT) for the estimator (see, for example, chapter 3 of [4]). For the case of the expectation

α

in the algorithm of Figure 1, if

E [W_{1}^{2}] < \infty

, the classical CLT implies that

\frac{\sqrt{n} (\hat{α} (n) - α)}{σ} \Rightarrow N (0, 1),

(1)

as

n \to \infty

, where

σ^{2} = E [{(W_{1} - α)}^{2}]

and

N (0, 1)

denotes a r.v. distributed as normal with mean 0 and variance 1. Then, if

E [W_{1}^{2}] < \infty

, it follows from (1) and Slutsky’s Theorem (see the Appendix) that

\frac{\sqrt{n} (\hat{α} (n) - α)}{\hat{σ} (n)} \Rightarrow N (0, 1),

as

n \to \infty

, where

\hat{σ} (n)

denotes the sample standard deviation, i.e.,

{\hat{σ}}^{2} (n) = \frac{\sum_{i = 1}^{n} {(W_{i} - \hat{α} (n))}^{2}}{n - 1}

. This CLT implies that

\lim_{n \to \infty} P [| \hat{α} (n) - α | \leq z_{β} \hat{σ} (n) / \sqrt{n}] = 1 - β,

for

0 < β < 1

, where

z_{β}

denotes the (

1 - β / 2

)-quantile of a N(0,1), which is sufficient to establish a

(1 - β) 100 %

ACI for

α

with halfwidth

H W_{α} = z_{β} \hat{σ} (n) / \sqrt{n} .

(2)

A halfwidth in the form of (2) is the typical measure used in simulation software (e.g., Simio, see [2]) to assess the accuracy of

\hat{α} (n)

for the estimation of expectation

α

.
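As an illustration of (2), the following minimal sketch computes the point estimator and the halfwidth of an ACI from a list of i.i.d. replications. The function name and the exponential test data are assumptions made for this example only; they are not part of the method of independent replications itself.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def aci_independent_replications(w, beta=0.10):
    """Return (point estimate, halfwidth) of a 100(1 - beta)% ACI as in (2)."""
    n = len(w)
    alpha_hat = mean(w)                      # sample average of the replications
    sigma_hat = stdev(w)                     # sample standard deviation (divisor n - 1)
    z = NormalDist().inv_cdf(1 - beta / 2)   # (1 - beta/2)-quantile of N(0, 1)
    return alpha_hat, z * sigma_hat / math.sqrt(n)


# Hypothetical performance variable: an exponential r.v. with mean 2.
rng = random.Random(12345)
w = [rng.expovariate(0.5) for _ in range(10_000)]
alpha_hat, hw = aci_independent_replications(w)
print(f"estimate = {alpha_hat:.3f}, 90% halfwidth = {hw:.3f}")
```

Since the halfwidth is proportional to 1/sqrt(n), quadrupling the number of replications roughly halves it.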

In contrast to the estimation of (output) performance measures, parameters of (input) random components of a simulation model are usually estimated from real-data observations (x) and, while most applications covered in the relevant literature assume that no uncertainty exists in the value of these parameters, the uncertainty can be significant when little data is available. In these cases, Bayesian statistics can be used to incorporate this uncertainty in the output analysis of simulation experiments via the use of a posterior distribution

p (θ | x)

. A methodology currently proposed for the analysis of simulation experiments under parameter uncertainty is a two-level nested simulation algorithm (see, e.g., [6,7,8]). In the outer level, we simulate (n) observations for the parameters from a posterior distribution

p (θ | x)

, while in the inner level we simulate (m) observations for the performance variable with the parameters fixed at the value (

θ

) generated in the outer level (see Figure 2). In this paper, we focus on the output analysis of two-level simulation experiments, for the case where the observations at the inner level are independent, showing how the variance of a simulated observation can be decomposed into parametric and stochastic variance components. Afterwards, we derive a CLT for both the estimator of the point forecast and the estimators of the variance components. Our CLTs allow us to compute an ACI for each estimator. Our results are validated through experiments with a forecast model for sporadic demand reported in [10]. This paper extends the results initially reported in [11] and provides the proofs that were omitted there.
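The two-level structure described above can be sketched as follows. This is only a schematic reading of the text, under the assumption that the algorithm of Figure 2 draws one parameter value per outer iteration and m independent replications per parameter value; the names nested_simulation, sample_theta and simulate_w, as well as the toy gamma/exponential model, are hypothetical.

```python
import random


def nested_simulation(sample_theta, simulate_w, n, m, rng):
    """Return an n x m list of observations w[i][j].

    sample_theta(rng)      -> one draw of the parameter vector (outer level)
    simulate_w(theta, rng) -> one replication of W with parameters fixed at theta
    """
    observations = []
    for _ in range(n):                                     # outer level: parameters
        theta = sample_theta(rng)
        row = [simulate_w(theta, rng) for _ in range(m)]   # inner level: replications
        observations.append(row)
    return observations


# Hypothetical toy model: Theta ~ Gamma(shape 4, scale 0.5), W | Theta ~ Exp(rate Theta).
rng = random.Random(2023)
w = nested_simulation(
    sample_theta=lambda r: r.gammavariate(4.0, 0.5),
    simulate_w=lambda theta, r: r.expovariate(theta),
    n=1000,
    m=2,
    rng=rng,
)
print(len(w), len(w[0]))
```

Passing the two samplers as functions keeps the nesting logic independent of the particular posterior distribution and simulation model.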

Following this introduction, we present the proposed methodology for the construction of an ACI for the point forecast and the variance components in a two-level simulation experiment. Afterwards, we present an illustrative example that has an analytical solution for the parameters of interest in this paper. This example is used in the next section to illustrate the application and validity of our proposed methodologies for the construction of an ACI. Finally, in the last section, we present conclusions and directions for future research.

2. Theoretical Results

To identify the variance components in each observation

W_{i j}

of the algorithm illustrated in Figure 2, let

μ (Θ) = E [W_{11} | Θ]

, and

σ^{2} (Θ) = E [W_{11}^{2} | Θ] - μ^{2} (Θ)

. Under this notation, the point forecast is

α = E [μ (Θ)]

, and the variance of each

W_{i j}

is:

V [W_{i j}] \overset{d e f}{=} E [W_{i j}^{2}] - E {[W_{i j}]}^{2} = E [E [W_{i j}^{2} | Θ] - μ {(Θ)}^{2}] + E [μ {(Θ)}^{2}] - E {[μ (Θ)]}^{2} = σ_{S}^{2} + σ_{P}^{2},

(3)

for

i = 1, . . ., n

;

j = 1, . . ., m

, where

σ_{P}^{2} = V [μ (Θ)] \overset{d e f}{=} E [μ {(Θ)}^{2}] - E {[μ (Θ)]}^{2}

, and

σ_{S}^{2} = E [σ^{2} (Θ)]

. It is worth mentioning that, in the relevant literature,

σ_{S}^{2}

is commonly referred to as stochastic variance and

σ_{P}^{2}

is commonly referred to as parametric variance.

2.1. Point Estimators

In this paper, we are interested in both the estimation of the point forecast

α = E [μ (Θ)]

and the estimation of the variance components of each observation generated in the algorithm of Figure 2, as defined in (3); thus we first consider the natural point estimators

\hat{α} (n) = \frac{1}{n} \sum_{i = 1}^{n} {\hat{α}}_{i}, {\hat{σ}}_{T}^{2} (n) = \frac{1}{n - 1} \sum_{i = 1}^{n} {({\hat{α}}_{i} - \hat{α} (n))}^{2}, {\hat{σ}}_{S}^{2} (n) = \frac{1}{n} \sum_{i = 1}^{n} S_{i}^{2},

(4)

where

{\hat{α}}_{i} = m^{- 1} \sum_{j = 1}^{m} W_{i j}

, and

S_{i}^{2} = {(m - 1)}^{- 1} \sum_{j = 1}^{m} {(W_{i j} - {\hat{α}}_{i})}^{2}

,

i = 1, . . ., n

. Note that the

{\hat{α}}_{i}

’s are i.i.d. with expectation

E [{\hat{α}}_{1}] = α

and variance

\begin{matrix} σ_{T}^{2} & \overset{d e f}{=} & E [{({\hat{α}}_{1} - α)}^{2}] = m^{- 2} (m E [{(W_{11} - α)}^{2}] + m (m - 1) E [(W_{11} - α) (W_{12} - α)]) \\ & = & m^{- 1} (σ_{S}^{2} + σ_{P}^{2}) + m^{- 1} (m - 1) σ_{P}^{2} = σ_{P}^{2} + m^{- 1} σ_{S}^{2}, \end{matrix}

(5)

since W_{11} and W_{12} are conditionally independent given Θ, so that E [(W_{11} - α) (W_{12} - α)] = E [{(μ (Θ) - α)}^{2}] = σ_{P}^{2}. On the other hand, the

S_{i}^{2}

are i.i.d. with expectation

E [S_{1}^{2}] = σ_{S}^{2}

. Thus, the next proposition follows from the classical LLN.

Proposition 1.

Given

m \geq 1

, if

E [W_{11}^{2}] < \infty

then

\hat{α} (n)

and

{\hat{σ}}_{T}^{2} (n)

are unbiased and consistent (as

n \to \infty

) estimators for α and

σ_{T}^{2}

(as defined in (5)), respectively. Furthermore, if

m \geq 2

and

E [W_{11}^{2}] < \infty

, then

{\hat{σ}}_{S}^{2} (n)

is an unbiased and consistent (as

n \to \infty

) estimator for

σ_{S}^{2}

(as defined in (3)).
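The estimators in (4) can be computed directly from the n x m array of observations. The sketch below assumes the observations are stored as a list of n rows with m entries each; the function name point_estimators is ours.

```python
from statistics import mean, variance


def point_estimators(w):
    """Estimators in (4) from an n x m list of observations w[i][j] (n >= 2)."""
    alpha_i = [mean(row) for row in w]     # inner-level averages
    alpha_hat = mean(alpha_i)              # estimator of the point forecast
    sigma_t2_hat = variance(alpha_i)       # sample variance of the averages (divisor n - 1)
    m = len(w[0])
    # Each S_i^2 needs at least two inner observations, hence the requirement m >= 2.
    sigma_s2_hat = mean(variance(row) for row in w) if m >= 2 else None
    return alpha_hat, sigma_t2_hat, sigma_s2_hat


print(point_estimators([[1.0, 3.0], [2.0, 6.0], [4.0, 4.0]]))
```

Returning None for the stochastic-variance estimator when m = 1 mirrors the condition m >= 2 in Proposition 1.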

2.2. Accuracy of the Point Estimators

As we established in Proposition 1, under mild assumptions the point estimators proposed in (4) are consistent, and thus converge to the corresponding parameter value (as

n \to \infty

). Nonetheless, to establish the level of accuracy of these estimators, we must establish a CLT for each estimator to derive a valid expression for the corresponding ACI. Note that both

\hat{α} (n)

and

{\hat{σ}}_{S}^{2} (n)

are averages of i.i.d. observations, thus the next proposition follows from the classical CLT for i.i.d. observations.

Proposition 2.

Given

m \geq 1

, if

E [W_{11}^{2}] < \infty

then

\frac{\sqrt{n} (\hat{α} (n) - α)}{σ_{T}} \Rightarrow N (0, 1),

as

n \to \infty

. Furthermore, if

m \geq 2

and

E [W_{11}^{4}] < \infty

, then

\frac{\sqrt{n} ({\hat{σ}}_{S}^{2} (n) - σ_{S}^{2})}{\sqrt{V_{S}}} \Rightarrow N (0, 1),

as

n \to \infty

, where

σ_{S}^{2}

is defined in (3),

σ_{T}^{2}

is defined in (5),

\hat{α} (n)

,

{\hat{σ}}_{S}^{2} (n)

,

S_{1}^{2}

are defined in (4), and

V_{S} = E [{(S_{1}^{2} - σ_{S}^{2})}^{2}]

.

Since we have consistent estimators for

σ_{T}^{2}

and

V_{S}

(under mild assumptions), the next corollary follows from Proposition 2 and Slutsky’s Theorem; details of a proof are given in the Appendix.

Corollary 1.

Under the same notation and assumptions as in Proposition 2, for

m \geq 1

we have

\frac{\sqrt{n} (\hat{α} (n) - α)}{\sqrt{{\hat{σ}}_{T}^{2} (n)}} \Rightarrow N (0, 1),

as

n \to \infty

, and for

m \geq 2

we have

\frac{\sqrt{n} ({\hat{σ}}_{S}^{2} (n) - σ_{S}^{2})}{\sqrt{{\hat{V}}_{S} (n)}} \Rightarrow N (0, 1),

as

n \to \infty

, where

{\hat{σ}}_{S}^{2} (n)

and

{\hat{σ}}_{T}^{2} (n)

are defined in (4), and

{\hat{V}}_{S} (n) = \frac{1}{n - 1} \sum_{i = 1}^{n} {(S_{i}^{2} - S^{2})}^{2}, S^{2} = \frac{1}{n} \sum_{i = 1}^{n} S_{i}^{2} .

In order to obtain a CLT for

{\hat{σ}}_{T}^{2} (n)

, note that this estimator is the sample variance of a set of i.i.d. observations; thus we can use the following lemma. A proof based on the Delta Method (see, e.g., Proposition 2 of [9] for the Delta Method itself) is provided in the Appendix.

Lemma 1.

If

X_{1}, X_{2}, . . .

is a sequence of i.i.d. random variables with

E [X_{1}^{4}] < \infty

, then

\frac{\sqrt{n} (S^{2} (n) - σ_{1}^{2})}{\sqrt{σ_{2}^{2}}} \Rightarrow N (0, 1),

as

n \to \infty

, where

σ_{1}^{2} = μ_{2} - μ_{1}^{2}

,

σ_{2}^{2} = 8 μ_{1}^{2} μ_{2} - 4 μ_{1}^{4} - 4 μ_{1} μ_{3} + μ_{4} - μ_{2}^{2}

,

μ_{k} = E [X_{1}^{k}]

,

k = 1, 2, 3, 4

;

S^{2} (n) = {(n - 1)}^{- 1} \sum_{i = 1}^{n} {(X_{i} - {\hat{μ}}_{1})}^{2}

,

{\hat{μ}}_{1} = n^{- 1} \sum_{i = 1}^{n} X_{i}

.

Corollary 2.

Under the same assumptions as in Lemma 1 we have

\frac{\sqrt{n} (S^{2} (n) - σ_{1}^{2})}{\sqrt{{\hat{σ}}_{2}^{2} (n)}} \Rightarrow N (0, 1),

as

n \to \infty

, where

{\hat{σ}}_{2}^{2} (n) = 8 {\hat{μ}}_{1}^{2} {\hat{μ}}_{2} - 4 {\hat{μ}}_{1}^{4} - 4 {\hat{μ}}_{1} {\hat{μ}}_{3} + {\hat{μ}}_{4} - {\hat{μ}}_{2}^{2}

,

{\hat{μ}}_{k} = n^{- 1} \sum_{i = 1}^{n} X_{i}^{k}

.

Corollary 2 follows from the fact that

{\hat{μ}}_{k}

is an unbiased and consistent estimator of

μ_{k}

, and the next corollary follows from the fact that

{\hat{σ}}_{T}^{2} (n)

is the sample variance of the

{\hat{α}}_{i}

.

Corollary 3.

Given

m \geq 1

, if

E [W_{11}^{4}] < \infty

then

\frac{\sqrt{n} ({\hat{σ}}_{T}^{2} (n) - σ_{T}^{2})}{\sqrt{{\hat{V}}_{T} (n)}} \Rightarrow N (0, 1),

as

n \to \infty

, where

{\hat{V}}_{T} (n) = 8 {\bar{α}}_{1}^{2} {\bar{α}}_{2} - 4 {\bar{α}}_{1}^{4} - 4 {\bar{α}}_{1} {\bar{α}}_{3} + {\bar{α}}_{4} - {\bar{α}}_{2}^{2}

,

{\bar{α}}_{k} = n^{- 1} \sum_{i = 1}^{n} {\hat{α}}_{i}^{k}

.
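The plug-in estimator used in Corollaries 2 and 3 only requires the first four raw sample moments. The following sketch (the function name is ours) evaluates it for a list of observations; applied to the inner-level averages it gives the estimator of Corollary 3.

```python
from statistics import fmean


def sample_variance_asymptotic_var(x):
    """Raw-moment plug-in estimate of the asymptotic variance of the sample variance."""
    m1, m2, m3, m4 = (fmean(v ** k for v in x) for k in (1, 2, 3, 4))
    return 8 * m1**2 * m2 - 4 * m1**4 - 4 * m1 * m3 + m4 - m2**2


print(sample_variance_asymptotic_var([1.0, 2.0, 4.0, 7.0]))
```

Expanding the central moments shows that this expression is algebraically equal to the fourth central sample moment minus the square of the second central sample moment. The central-moment form may be preferable in floating-point arithmetic when the mean is large relative to the standard deviation, because the raw-moment form subtracts large terms of similar size.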

For

0 < β < 1

, Corollaries 1 and 3 allow us to establish a

100 (1 - β) %

ACI for the point forecast

α

, and variance components

σ_{S}^{2}

and

σ_{T}^{2} = σ_{P}^{2} + m^{- 1} σ_{S}^{2}

; each ACI is centered at the corresponding point estimator (

\hat{α} (n)

,

{\hat{σ}}_{S}^{2} (n)

or

{\hat{σ}}_{T}^{2} (n)

) and the corresponding halfwidth is given by:

H W_{α} = z_{β} \frac{\sqrt{{\hat{σ}}_{T}^{2} (n)}}{\sqrt{n}}, H W_{σ_{S}^{2}} = z_{β} \frac{\sqrt{{\hat{V}}_{S} (n)}}{\sqrt{n}}, and H W_{σ_{T}^{2}} = z_{β} \frac{\sqrt{{\hat{V}}_{T} (n)}}{\sqrt{n}},

(6)

for

α

,

σ_{S}^{2}

and

σ_{T}^{2}

, respectively, where

{\hat{σ}}_{T}^{2} (n)

is defined in (4),

{\hat{V}}_{S} (n)

and

{\hat{V}}_{T} (n)

are defined in Corollary 1, and in Corollary 3, respectively.
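A minimal sketch of the three halfwidths in (6), assuming the observations are stored as an n x m list. The function name and the guard against a slightly negative value of the raw-moment expression (which can occur through round-off) are our own choices, not part of the corollaries.

```python
import math
from statistics import NormalDist, fmean, mean, variance


def aci_halfwidths(w, beta=0.10):
    """Halfwidths in (6) for alpha, sigma_S^2 and sigma_T^2 from w[i][j]."""
    n, m = len(w), len(w[0])
    z = NormalDist().inv_cdf(1 - beta / 2)
    alpha_i = [mean(row) for row in w]
    hw_alpha = z * math.sqrt(variance(alpha_i) / n)
    # V_T-hat: raw-moment plug-in of Corollary 3 applied to the inner-level averages.
    a1, a2, a3, a4 = (fmean(a ** k for a in alpha_i) for k in (1, 2, 3, 4))
    v_t = 8 * a1**2 * a2 - 4 * a1**4 - 4 * a1 * a3 + a4 - a2**2
    hw_sigma_t2 = z * math.sqrt(max(v_t, 0.0) / n)   # max() only absorbs round-off
    hw_sigma_s2 = None
    if m >= 2:
        s2 = [variance(row) for row in w]            # S_i^2, i = 1, ..., n
        hw_sigma_s2 = z * math.sqrt(variance(s2) / n)  # V_S-hat of Corollary 1
    return hw_alpha, hw_sigma_s2, hw_sigma_t2


print(aci_halfwidths([[1.0, 3.0], [2.0, 6.0], [4.0, 4.0], [0.0, 2.0]]))
```

Each ACI is then the corresponding point estimator plus or minus its halfwidth.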

Note that the ACIs proposed in (6) assume that the value of m in the algorithm of Figure 2 is fixed and the accuracy of the estimator improves as n (the number of observations in the outer level) increases (in turn, the halfwidth of the ACI gets smaller). Given that we can build a valid ACI for any value of m, a relevant question is how to find an adequate value of m to get an acceptable level of accuracy in a reasonable amount of running time. In order to answer this question for the case of the point estimator of

α

, let us fix the total number of iterations in the algorithm of Figure 2 to k =

n m

, and note from (5) and Proposition 2 that the asymptotic variance of

(\hat{α} (n) - α)

is

n^{- 1} σ_{T}^{2} = k^{- 1} (m σ_{P}^{2} + σ_{S}^{2}),

(7)

and takes its minimal value for

m = 1

, suggesting that the point estimator

\hat{α} (n)

defined in (4) is more accurate as m approaches the value of 1. Note that for

m = 1

, a fixed number of iterations

k = n m

is convenient (from the point of view of running time), when the computation of

W_{i j}

requires the same or more computation time as

Θ_{i}

, as suggested in the relevant literature (see, for example, [6]). Furthermore, if we allow m to increase with n, we can obtain the following proposition (a proof using Lindeberg-Feller Theorem is provided in the Appendix).

Proposition 3.

Given

0 < p \leq 1

, if

m = ⌊ n^{- 1 + 1 / p} ⌋

and

E [W_{11}^{2}] < \infty

then

\frac{\sqrt{n} (\hat{α} (n) - α)}{\sqrt{σ_{T}^{2}}} \Rightarrow N (0, 1),

as

n \to \infty

, where

σ_{T}^{2}

is defined in (5).

Note that the last proposition implies that the ACI defined in equation (6) for the point forecast

α

is also valid under the assumptions of Proposition 3. If, once again, we set the total number of iterations in the algorithm of Figure 2 to

k = n m

, and let

n \approx k^{p}

and

m \approx k^{1 - p}

, it follows from Proposition 3 that the asymptotic variance of

\hat{α} (n)

is

n^{- 1} σ_{T}^{2} \approx k^{- 1} (k^{1 - p} σ_{P}^{2} + σ_{S}^{2})

for every

0 < p \leq 1

. Note that, for fixed k,

n^{- 1} σ_{T}^{2}

reaches its minimum value when

p = 1

, that is, when

n = k

and

m = 1

. However, note that we need

m \geq 2

in order to estimate

σ_{S}^{2}

. In the following section we report some empirical results that confirm our theoretical results. It is worth mentioning that the case

n = k

and

m = 1

has been reported in the literature as the posterior sampling algorithm (see, e.g., [12,13]).
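The effect of the allocation between n and m for a fixed budget k = nm can be tabulated numerically. The sketch below relies on the law of total variance, by which the variance of an inner-level average of m conditionally independent observations is the parametric variance plus the stochastic variance divided by m. The function name and the variance components 4.0 and 1.0 are hypothetical and serve only as an illustration.

```python
def variance_of_alpha_hat(k, m, sigma_s2, sigma_p2):
    """Asymptotic variance of the point estimator of alpha with n = k / m outer draws."""
    n = k / m
    sigma_t2 = sigma_p2 + sigma_s2 / m   # variance of one inner-level average
    return sigma_t2 / n


# Hypothetical variance components, for illustration only.
for m in (1, 2, 4, 8):
    print(m, variance_of_alpha_hat(k=2000, m=m, sigma_s2=4.0, sigma_p2=1.0))
```

For any positive parametric variance the tabulated values increase with m, which is the sense in which m = 1 is the most accurate allocation for the point forecast.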

3. An Example with Analytical Solution

The following model (reported in [10]) has been proposed to forecast sporadic demand by incorporating data on times between arrivals and customer demand; where uncertainty on the model parameters is incorporated using a Bayesian approach. For this model, we will show analytical expressions for the performance measures defined in Section 2. These expressions are used in the following section to illustrate the validity of the ACIs proposed in the previous section.

Customer arrivals for a particular item in a shop follow a Poisson process, yet there is uncertainty in the arrival rate

Θ_{0}

, so that given

[Θ_{0} = θ_{0}]

, interarrival times between customers are i.i.d. with exponential density:

f (y | θ_{0}) = \{\begin{matrix} θ_{0} e^{- θ_{0} y}, & y > 0, \\ 0, & otherwise, \end{matrix}

(8)

where

θ_{0} \in S_{00} = (0, \infty)

. Every client can order j units of this item with probability

Θ_{1 j}

,

j = 1, . . ., q

,

q \geq 2

. Let

Θ_{1} = (Θ_{11}, . . . Θ_{1 (q - 1)})

and

Θ_{1 q} = 1 - \sum_{j = 1}^{q - 1} Θ_{1 j}

, then

Θ = (Θ_{0}, Θ_{1})

is the parameter vector, and

S_{0} = S_{00} ⨂ S_{01}

is the parameter space, where

S_{01} = {(θ_{11}, . . ., θ_{1 (q - 1)}) : \sum_{j = 1}^{q - 1} θ_{1 j} \leq 1; θ_{1 j} \geq 0, j = 1, . . ., q - 1}

.

Total demand during a period of length T is

D = \{\begin{matrix} \sum_{i = 1}^{N (T)} U_{i}, & N (T) > 0 \\ 0, & otherwise, \end{matrix}

(9)

where

N (s)

is the number of customer arrivals during the interval

[0, s]

,

s \geq 0

, and

U_{1}, U_{2}, . . .

are the individual demands (conditionally independent relative to

Θ

). The information about

Θ

consists of i.i.d. observations

v = (v_{1}, . . ., v_{r})

,

u = (u_{1}, . . ., u_{r})

of past customers, where

v_{i}

is the interarrival time between customer i and customer (

i - 1

), and

u_{i}

is the number of units ordered by client i. By taking Jeffreys’ non-informative prior as the prior density for

Θ

, we obtain the posterior density (see [10] for details)

p (θ | x) = p (θ_{0} | v) p (θ_{1} | u)

, where

x_{i} = (v_{i}, u_{i})

,

i = 1, . . ., r

,

x = (x_{1}, . . ., x_{r})

,

θ = (θ_{0}, θ_{1})

as

p (θ_{0} | v) = \frac{θ_{0}^{r - 1} {(\sum_{i = 1}^{r} v_{i})}^{r} e^{- θ_{0} \sum_{i = 1}^{r} v_{i}}}{(r - 1)!}, p (θ_{1} | u) = \frac{{(1 - \sum_{j = 1}^{q - 1} θ_{1 j})}^{c_{q} - 1 / 2} Π_{j = 1}^{q - 1} θ_{1 j}^{c_{j} - 1 / 2}}{B (c_{1} + 1 / 2, . . ., c_{q} + 1 / 2)},

(10)

where

c_{j} = \sum_{i = 1}^{r} I [u_{i} = j]

, and

B (a_{1}, . . . a_{q}) = Π_{j = 1}^{q} Γ (a_{j}) / Γ (\sum_{j = 1}^{q} a_{j})

, for

a_{1}, . . ., a_{q} > 0

. Using this notation, we can show that (see [1] for details)

α = E [T Θ_{0}] \sum_{j = 1}^{q} j p_{j},

σ_{P}^{2} = \frac{E [T^{2} Θ_{0}^{2}]}{(q_{0} + 1)} \sum_{j = 1}^{q} j^{2} p_{j} + \frac{E {[T Θ_{0}]}^{2} [(q_{0} / r) - 1]}{(q_{0} + 1)} {(\sum_{j = 1}^{q} j p_{j})}^{2},

σ_{S}^{2} = E [T Θ_{0}] \sum_{j = 1}^{q} j^{2} p_{j},

where

E [T Θ_{0}] = T r {(\sum_{i = 1}^{r} v_{i})}^{- 1}

,

E [T^{2} Θ_{0}^{2}] = T^{2} r (1 + r) {(\sum_{i = 1}^{r} v_{i})}^{- 2}

,

p_{j} = q_{j} / q_{0}

,

q_{j} = c_{j} + 1 / 2

,

j = 1, . . ., q

,

q_{0} = \sum_{j = 1}^{q} q_{j}

, and

c_{j}

are defined in (10).
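For simulation purposes, the posterior (10) and the demand (9) can be sampled with standard generators. The sketch below is one possible implementation, not necessarily the one used in [10]: the arrival rate is drawn from a gamma distribution with shape r and rate equal to the sum of the observed interarrival times, the order-size probabilities are drawn from a Dirichlet distribution obtained by normalizing independent gamma variates, and the Poisson process is simulated through its exponential interarrival times. Function names and the data in the usage lines (those of Section 4) are illustrative.

```python
import random


def sample_theta(total_interarrival, r, counts, rng):
    """Draw (theta0, theta1) from the posterior (10)."""
    # theta0 ~ Gamma(shape r, rate sum of v_i); gammavariate expects a scale parameter.
    theta0 = rng.gammavariate(r, 1.0 / total_interarrival)
    # theta1 ~ Dirichlet(c_j + 1/2), sampled by normalizing independent gamma variates.
    g = [rng.gammavariate(c + 0.5, 1.0) for c in counts]
    total = sum(g)
    return theta0, [x / total for x in g]


def simulate_demand(theta, horizon, rng):
    """One replication of the total demand D in (9) given theta."""
    theta0, theta1 = theta
    sizes = range(1, len(theta1) + 1)          # possible order sizes 1, ..., q
    demand, clock = 0, rng.expovariate(theta0)
    while clock <= horizon:                    # arrivals of the Poisson process in [0, T]
        demand += rng.choices(sizes, weights=theta1)[0]
        clock += rng.expovariate(theta0)
    return demand


rng = random.Random(7)
theta = sample_theta(10.0, 20, [5, 3, 2, 3, 7], rng)
print(simulate_demand(theta, 15.0, rng))
```

Here theta1 contains all q probabilities, including the last one, which in the text is defined as one minus the sum of the others.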

4. Empirical Results

To validate the ACIs proposed in (6), we conducted some experiments with the Bayesian model of the previous section to illustrate the estimation of

α

,

σ_{S}^{2}

and

σ_{T}^{2}

. We considered the values

T = 15

,

r = 20

,

\sum_{i = 1}^{r} v_{i} = 10

,

q = 5

,

c_{1} = 5

,

c_{2} = 3

,

c_{3} = 2

,

c_{4} = 3

,

c_{5} = 7

. With this data, the point forecast is

α \approx 95.333

, and the variance components are

σ_{S}^{2} \approx 380.667

,

σ_{P}^{2} \approx 558.598

. The empirical results that we report below illustrate a typical behavior that we would expect for any other feasible data set.
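The analytical values can be reproduced by evaluating the closed-form expressions of Section 3. In the sketch below (function and variable names are ours), the second term of the parametric variance is written with the ratio of q_0 to r, where r is the number of past observations; this is the form obtained from the gamma and Dirichlet posterior moments implied by (10).

```python
def analytic_components(horizon, r, total_interarrival, counts):
    """Closed-form alpha, sigma_S^2 and sigma_P^2 for the sporadic-demand model."""
    q = [c + 0.5 for c in counts]                                # q_j = c_j + 1/2
    q0 = sum(q)
    p = [qj / q0 for qj in q]                                    # p_j = q_j / q_0
    mean1 = sum(j * pj for j, pj in enumerate(p, start=1))       # sum of j p_j
    mean2 = sum(j * j * pj for j, pj in enumerate(p, start=1))   # sum of j^2 p_j
    e1 = horizon * r / total_interarrival                        # E[T Theta_0]
    e2 = horizon**2 * r * (r + 1) / total_interarrival**2        # E[T^2 Theta_0^2]
    alpha = e1 * mean1
    sigma_s2 = e1 * mean2
    sigma_p2 = e2 * mean2 / (q0 + 1) + e1**2 * (q0 / r - 1) * mean1**2 / (q0 + 1)
    return alpha, sigma_s2, sigma_p2


alpha, sigma_s2, sigma_p2 = analytic_components(15.0, 20, 10.0, [5, 3, 2, 3, 7])
print(round(alpha, 3), round(sigma_s2, 3), round(sigma_p2, 3))
```

Having these reference values available is what makes it possible to measure empirical coverage and RMSE in the experiments.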

In all the experiments reported in this Section we considered 1000 independent replications of the algorithm of Figure 2 for different number of observations in the outer level (n) and in the inner level (m); in each replication we computed the point estimators for

α

,

σ_{S}^{2}

, and

σ_{T}^{2}

, and the corresponding halfwidths of 90% ACIs according to equations (6). Since we know the value of the parameters we are estimating, we were able to report (for n and m given) the empirical coverage (i.e., the fraction of independent replications in which the corresponding ACI covered the true parameter value), the average and standard deviation of halfwidths, and the square root of the empirical mean squared error defined by

R M S E = \sqrt{\frac{1}{n_{0}} \sum_{i = 1}^{n_{0}} {({\hat{θ}}_{i} - θ)}^{2}},

where

{\hat{θ}}_{i}

denotes the value obtained in the i-th replication for the estimation of a parameter

θ, i = 1, 2, \dots, n_{0}

(

n_{0} = 1000

in our experiments).
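The empirical coverage and the RMSE defined above can be computed from the point estimates and halfwidths collected over the independent replications. A minimal sketch, with a function name and toy numbers of our own:

```python
import math


def coverage_and_rmse(estimates, halfwidths, true_value):
    """Empirical coverage and RMSE over n0 independent replications."""
    n0 = len(estimates)
    # A replication counts as covered when the true value lies within the ACI.
    covered = sum(
        1 for est, hw in zip(estimates, halfwidths) if abs(est - true_value) <= hw
    )
    rmse = math.sqrt(sum((est - true_value) ** 2 for est in estimates) / n0)
    return covered / n0, rmse


# Toy input: three replications, true parameter value 2.0.
print(coverage_and_rmse([1.0, 2.0, 4.0], [0.5, 0.5, 0.5], 2.0))
```

For a 90% ACI, the first returned value should approach 0.9 as n grows if the ACI is asymptotically valid.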

In a first set of experiments we considered

n m = 200, 2000, 20000

, and

m = 2, 4, 8

for each value of

n m

, to compare the effect of increasing the number of observations in the inner level for a given value of

n m

. The results of this set of experiments are summarized in Figure 3, Figure 4 and Figure 5. Note that we are not considering

m = 1

in this set of experiments to be able to construct an ACI for the stochastic variance

σ_{S}^{2}

.

In Figure 3 we illustrate the performance measures for the quality of the estimation procedure that we obtained for the estimation of the point forecast

α

. As we observe from Figure 3, the coverages are acceptable (very close to the nominal value of 0.9, even for

n = 100

). These results validate the ACI defined in (6) for the point forecast

α

. We also observe from Figure 3 that the RMSE, average halfwidth and standard deviation of halfwidths improve (decrease) as the number of observations in the outer level (n) increases, as suggested by Corollary 1. Note also from Figure 3 that a smaller value of m provides smaller RMSE, average halfwidths and standard deviations of halfwidths, validating our theoretical results.

In Figure 4 we illustrate the performance measures for the quality of the estimation procedure that we obtained for the estimation of the stochastic variance

σ_{S}^{2}

. As we observe from Figure 4, the coverages are acceptable (very close to the nominal value of 0.9, even for

n = 100

). These results validate the ACI defined in (6) for the stochastic variance

σ_{S}^{2}

. We also observe from Figure 4 that the RMSE, average halfwidth and standard deviation of halfwidths improve (decrease) as the number of observations in the outer level (n) increases, as suggested by Corollary 1. However, contrary to what we observe for the estimation of

α

, a larger value of m provides smaller RMSE, average halfwidths and standard deviations of halfwidths, suggesting that, for a fixed value of

n m

, the quality of the estimation for the stochastic variance

σ_{S}^{2}

improves as the number of the observations in the inner loop (m) increases.

For the estimation of the total variance

σ_{T}^{2}

(illustrated in Figure 5) we obtained similar results for the quality of the estimation as for the estimation of the point forecast

α

, except that larger values of n are required to obtain reliable coverages. As we observe from Figure 5, the coverages are acceptable (very close to the nominal value of 0.9, for

n = 1000

and 10000). These results validate the ACI defined in (6) for the total variance

σ_{T}^{2}

. We also observe from Figure 5 that the RMSE, average halfwidth and standard deviation of halfwidths improve (decrease) as the number of observations in the outer level (n) increases, as suggested by Corollary 3. Note also from Figure 5 that a smaller value of m provides smaller RMSE, average halfwidths and standard deviations of halfwidths, validating our theoretical results.

Figure 4. Performance of the estimation of stochastic variance

σ_{S}^{2}

for

n m

fixed comparing different values of m.

Figure 5. Performance of the estimation of total variance

σ_{T}^{2}

for

n m

fixed comparing different values of m.

In a second set of experiments we considered

n m = 100, 1000, 10000

, with

m = 1

and

m \approx {(n m)}^{1 / 3}

for each value of

n m

, to compare the quality of the estimation procedures using the value of m that we suggest as optimal for the estimation of point forecast

α

with the value of m suggested in [6] as an adequate choice for m in the case of biased estimators in the inner level of the algorithm of Figure 2. The results of this set of experiments are summarized in Figure 6 and Figure 7. Note that we are not considering the estimation of the stochastic variance

σ_{S}^{2}

in this set of experiments because

m \geq 2

is required to construct an ACI for the stochastic variance

σ_{S}^{2}

. Note also that we considered

100^{1 / 3} \approx 5

,

1000^{1 / 3} \approx 10

, and

10000^{1 / 3} \approx 20

, and we are using the same color for

m = 5, 10, 20

in Figure 6 and Figure 7.

In Figure 6 we illustrate the performance measures for the quality of the estimation procedure that we obtained for the estimation of the point forecast

α

in our second set of experiments. As we observe from Figure 6, the coverages are acceptable (very close to the nominal value of 0.9, even for

n = 100

). These results validate the ACI defined in (6) for the point forecast

α

, and the ACI suggested by Proposition 3. We also observe from Figure 6 that the RMSE, average halfwidth and standard deviation of halfwidths are worse for

m \approx {(n m)}^{1 / 3}

, confirming our finding that, for the same number of replications

n m

,

m = 1

produces better point estimators for

α

than

m \approx {(n m)}^{1 / 3}

, in line with the discussion following Proposition 3.

Finally, in Figure 7 we show the results of our second set of experiments for the estimation of the total variance

σ_{T}^{2}

. We found similar results as for the case of the estimation of the point forecast

α

: coverages are very good (even for n = 100), and all performance measures for the ACI (RMSE, average and standard deviation of halfwidths) are worse for

m \approx {(n m)}^{1 / 3}

, suggesting that, for the same number of replications

n m

,

m = 1

produces better point estimators for

σ_{T}^{2}

than

m \approx {(n m)}^{1 / 3}

.

5. Conclusions

In this paper, we propose methodologies to calculate point estimators (and their corresponding halfwidths), for both the point forecast and the variance components in two-level nested stochastic simulation experiments, for the case where the observations at both levels of the algorithm are independent. These methods can be applied to the construction of Bayesian forecasts based on experiments using a simulation model under parameter uncertainty.

Both our theoretical and our experimental results confirm that the proposed point estimators and their corresponding halfwidths are asymptotically valid, i.e., the point estimators converge to the corresponding parameter values and the coverage of the ACIs converges to the nominal level as the number of replications (n) of the outer level increases.

Furthermore, given a fixed number of total observations (

n m

), we show that the choice of only one replication in the inner level (

m = 1

) provides more accurate estimators for both the point forecast (

α

), and the variance of the point forecast (

σ_{T}^{2}

). However,

m \geq 2

is required for the estimation of

σ_{S}^{2}

.

Directions for future research on this topic include experimentation with other point estimators, such as quasi-Monte Carlo or Simpson integration, with the objective of finding more accurate point estimators for the parameters considered in this paper.

Funding

This research was supported by the Asociación Mexicana de Cultura A.C. and the National Council of Science and Technology of Mexico under Award Number 1200/158/2022.

Data Availability Statement

The raw data corresponding to our experiments are available in the repository of AppliedMath.

Conflicts of Interest

The author declares no conflict of interest.

Appendix A

For completeness, we first state three well-known theorems. Proofs of Theorem A1 and Theorem A2 can be found, e.g., in [5], and a proof of Theorem A3 can be found, e.g., in [9]. In what follows, we write ⇒ for weak convergence (as

n \to \infty

without explicit mention).

Theorem A1. (Slutsky). Let

X, Y, X_{1}, X_{2}, . . ., Y_{1}, Y_{2}, . . .

be random variables and c be a real constant. If

X_{n} \Rightarrow X

, and

Y_{n} \Rightarrow c

, then:

(i): $X_{n} + Y_{n} \Rightarrow X + c$
(ii): $X_{n} Y_{n} \Rightarrow c X$
(iii): $X_{n} / Y_{n} \Rightarrow X / c$ , if $c \neq 0$

Theorem A2. (Continuous mapping). Let

X, X_{1}, X_{2}, . . .

be

ℜ^{k}

-valued random vectors, and let

g : ℜ^{k} \to ℜ

be a function such that

P [X \in D (g)] = 0

, where

D (g) = {x : g (x)

is not continuous at

x}

, then

g (X_{n}) \Rightarrow g (X)

.

Theorem A3. (Delta method). Let

Y_{1}, Y_{2}, . . .

be

ℜ^{k}

-valued random vectors, and let g:

ℜ^{k} \to ℜ

be a function that is differentiable in a neighborhood of

μ \in ℜ^{k}

. If there exists a

k \times k

matrix G such that the CLT

\sqrt{n} [\bar{Y} (n) - μ] \Rightarrow G N_{k} (0, I)

is satisfied, where

\bar{Y} (n) = n^{- 1} \sum_{i = 1}^{n} Y_{i}

, and

N_{k} (0, I)

denotes a (k -variate) normal distribution with mean 0 and variance I (the identity), then

\sqrt{n} [g (\bar{Y} (n)) - g (μ)] \Rightarrow σ N (0, 1),

where

σ = \sqrt{\nabla g {(μ)}^{T} G G^{T} \nabla g (μ)}

.

Proof of Corollary 1.

Since

{\hat{α}}_{1}, {\hat{α}}_{2}, . . .

are i.i.d. with

E [{\hat{α}}_{1}^{2}] < \infty

, it follows from the Law of Large Numbers that

n^{- 1} (\sum_{i = 1}^{n} {\hat{α}}_{i}^{2}, \sum_{i = 1}^{n} {\hat{α}}_{i}) \Rightarrow (E [{\hat{α}}_{1}^{2}], E [{\hat{α}}_{1}])

. Therefore, by taking

g (x_{1}, x_{2}) = \sqrt{x_{1} - x_{2}^{2}},

for

x_{1} - x_{2}^{2} \geq 0

in Theorem A2, we have

\sqrt{(n - 1) {\hat{σ}}_{T}^{2} (n) / n} \Rightarrow \sqrt{σ_{T}^{2}}

, so that

Y_{n} = \sqrt{{\hat{σ}}_{T}^{2} (n)} / \sqrt{σ_{T}^{2}} \Rightarrow 1

. Finally, by taking

X_{n} = \sqrt{n} (\hat{α} (n) - α) / \sqrt{σ_{T}^{2}}

in Theorem A1, it follows from Proposition 2 that

\frac{\sqrt{n} (\hat{α} (n) - α)}{\sqrt{{\hat{σ}}_{T}^{2} (n)}} \Rightarrow N (0, 1) .

Similarly, since

S_{1}^{2}, S_{2}^{2}, . . .

are i.i.d. with

E [S_{1}^{4}] < \infty

, it follows from Theorem A1 and Proposition 2 that

\frac{\sqrt{n} ({\hat{σ}}_{S}^{2} (n) - σ_{S}^{2})}{\sqrt{{\hat{V}}_{S} (n)}} \Rightarrow N (0, 1) .

□
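The first limit in Corollary 1 is what justifies an asymptotic confidence interval for α. A minimal Python sketch follows. It assumes, as the proof suggests, that the point estimator is the average of the outer-level estimates and that the variance estimator is their sample variance with divisor n − 1. The function name `asymptotic_ci` and the input values are hypothetical.

```python
import math
from statistics import NormalDist, mean, variance


def asymptotic_ci(alpha_hats, level=0.95):
    """Return (point_estimate, half_width) of an asymptotic CI for alpha.

    alpha_hats: i.i.d. outer-level estimates (hypothetical input).
    """
    n = len(alpha_hats)
    if n < 2:
        raise ValueError("need at least two outer-level observations")
    point = mean(alpha_hats)       # average of the outer-level estimates
    s2 = variance(alpha_hats)      # sample variance, divisor n - 1
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # standard normal quantile
    half_width = z * math.sqrt(s2 / n)
    return point, half_width


# Illustrative (made-up) outer-level estimates.
point, hw = asymptotic_ci([1.0, 2.0, 3.0, 4.0, 5.0])
print(f"{point:.4f} +/- {hw:.4f}")
```

The interval is valid only asymptotically, so the coverage for small n may differ from the nominal level.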

Proof of Lemma 1.

Let

k = 2

,

Y_{i} = (X_{i}, X_{i}^{2})

,

μ = (μ_{1}, μ_{2})

. Then the CLT of Theorem A3 is satisfied with

G G^{T} = [\begin{matrix} μ_{2} - μ_{1}^{2} & μ_{3} - μ_{1} μ_{2} \\ μ_{3} - μ_{1} μ_{2} & μ_{4} - μ_{2}^{2} \end{matrix}]

By taking

g (μ) = μ_{2} - μ_{1}^{2} = σ_{1}^{2}

, we have

g (\bar{Y} (n)) = (n - 1) S^{2} (n) / n

,

\nabla g {(μ)}^{T} = (- 2 μ_{1}, 1)

, and

\nabla g {(μ)}^{T} G G^{T} \nabla g (μ) = 8 μ_{1}^{2} μ_{2} - 4 μ_{1}^{4} - 4 μ_{1} μ_{3} + μ_{4} - μ_{2}^{2} = σ_{2}^{2} .

Then, it follows from Theorem A3 that

\sqrt{n} [(n - 1) S^{2} (n) / n - σ_{1}^{2}] \Rightarrow σ_{2} N (0, 1),

and the final conclusion follows from Theorem A1. □
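The quadratic form in this proof can be checked numerically. The sketch below is an illustration only. It assumes that μ_k denotes the k-th raw moment of X, as the covariance matrix above suggests, and it uses the exponential distribution with rate 1 (raw moments μ_k = k!) as a hypothetical test case. The function names are not from the article.

```python
def delta_method_variance(mu1, mu2, mu3, mu4):
    """Quadratic form grad^T (G G^T) grad for g(x1, x2) = x2 - x1**2."""
    # Covariance matrix of (X, X^2) written in terms of raw moments.
    cov = [[mu2 - mu1**2, mu3 - mu1 * mu2],
           [mu3 - mu1 * mu2, mu4 - mu2**2]]
    grad = [-2.0 * mu1, 1.0]  # gradient of g evaluated at (mu1, mu2)
    return sum(grad[i] * cov[i][j] * grad[j]
               for i in range(2) for j in range(2))


def closed_form(mu1, mu2, mu3, mu4):
    """Expanded expression for sigma_2^2 given in the proof."""
    return 8 * mu1**2 * mu2 - 4 * mu1**4 - 4 * mu1 * mu3 + mu4 - mu2**2


# Raw moments of an exponential distribution with rate 1: mu_k = k!
moments = (1.0, 2.0, 6.0, 24.0)
print(delta_method_variance(*moments), closed_form(*moments))  # prints 8.0 8.0
```

For this test case both expressions equal 8, which agrees with the variance of (X − 1)^2 for a rate-1 exponential (fourth central moment 9 minus squared variance 1).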

Proof of Proposition 3.

In this proof we follow the notation of the Lindeberg-Feller theorem as stated in Theorem 7.2.1 of [3].

For n = 1, 2, ..., let

m_{n} = ⌊ n^{- 1 + 1 / p} ⌋

and

α_{j} (n) = (\sum_{i = 1}^{m_{n}} W_{i j}) / m_{n}, j = 1, . . ., n

. Then

α_{1} (n), α_{2} (n), . . ., α_{n} (n)

are independent, and for

X_{n j} = (α_{j} (n) - α) / \sqrt{n σ_{T}^{2}}

we also have that

X_{n 1}, X_{n 2}, . . ., X_{n n}

are independent.

Then if

Y_{n j} = (α_{j} (n) - α) / σ_{T}

, we have

E [Y_{n j}] = 0

and

E [Y_{n j}^{2}] = 1

, so that given

ϵ > 0

there exists

η_{0} > 0

such that

\int_{| y | > η_{0}} y^{2} d F_{Y_{n j}} (y) < ϵ

.

Therefore, given

η > 0

, for

n \geq \max \{1, {(η_{0} / η)}^{2}\}

we have

\sum_{j = 1}^{n} \int_{| x | > η} x^{2} d F_{X_{n j}} (x) \leq \sum_{j = 1}^{n} \frac{1}{n} \int_{| y | > η_{0}} y^{2} d F_{Y_{n j}} (y) < ϵ,

so that (1) of Theorem 7.2.1 of [3] is satisfied, and it follows from this theorem that

S_{n} \Rightarrow N (0, 1)

, where

S_{n} = \sum_{j = 1}^{n} X_{n j} = \frac{\sqrt{n} (\hat{α} (n) - α)}{\sqrt{σ_{T}^{2}}} .

□
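To see how the Lindeberg sum used in this proof vanishes as n grows, the following sketch evaluates it in closed form for a hypothetical special case: standardized variables that are exactly N(0, 1), scaled by the square root of n. This is an illustration under that assumption, not the general setting of Proposition 3, and the function name is hypothetical.

```python
import math


def lindeberg_sum_normal(n, eta):
    """Sum over j = 1..n of E[X_nj^2 ; |X_nj| > eta] for X_nj = Y_j / sqrt(n).

    Assumes Y_j ~ N(0, 1); the n identical terms add up to
    E[Y^2 ; |Y| > eta * sqrt(n)].
    """
    t = eta * math.sqrt(n)
    phi = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)  # N(0,1) density at t
    # For a standard normal Y: E[Y^2 ; |Y| > t] = 2 t phi(t) + P(|Y| > t).
    return 2.0 * t * phi + math.erfc(t / math.sqrt(2.0))


for n in (1, 10, 100, 1000, 10000):
    print(n, lindeberg_sum_normal(n, eta=0.1))
```

The printed values decrease toward zero for any fixed η > 0, which is the behavior that condition (1) of the Lindeberg-Feller theorem requires.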

References

1. Muñoz, D.F. Simulation output analysis for risk assessment and mitigation. In Multi-Criteria Decision Analysis for Risk Assessment and Management; Ren, J., Ed.; Springer: Heidelberg, Germany, 2021.
2. Smith, J.S.; Sturrock, D.T. Simio and Simulation: Modeling, Analysis, Applications, 6th ed.; Simio LLC: Sewickley, Pennsylvania, 2022.
3. Chung, K.L. A Course in Probability Theory; Academic Press: Cambridge, Massachusetts, 2001.
4. Asmussen, S.; Glynn, P.W. Stochastic Simulation Algorithms and Analysis; Springer: Heidelberg, Germany, 2007.
5. Serfling, R.J. Approximation Theorems of Mathematical Statistics; John Wiley & Sons: Hoboken, New Jersey, 2009.
6. Andradóttir, S.; Glynn, P.W. Computing Bayesian means using simulation. ACM TOMACS 2016, 26, paper 10.
7. L’Ecuyer, P. Quasi-Monte Carlo methods with applications in finance. Finance and Stochastics 2009, 13, 307–349.
8. Zouaoui, F.; Wilson, J.R. Accounting for parameter uncertainty in simulation input modeling. IIE Transactions 2003, 35, 781–792.
9. Muñoz, D.F.; Glynn, P.W. A batch means methodology for estimation of a nonlinear function of a steady-state mean. Manag Sci 1997, 43, 1121–1135.
10. Muñoz, D.F.; Muñoz, D.F. Bayesian forecasting of spare parts using simulation. In Service Parts Management: Demand Forecasting and Inventory Control; Altay, N., Litteral, L.A., Eds.; Springer: Heidelberg, Germany, 2011.
11. Muñoz, D.F. Estimation of expectations in two-level nested simulation experiments. In Proceedings of the 29th European Modeling and Simulation Symposium, Barcelona, Spain, 18–20 Sep 2017; pp. 233–238.
12. Russo, D.; Van Roy, B. Learning to optimize via posterior sampling. Mathematics of Operations Research 2014, 39, 1221–1243.
13. Muñoz, D.F.; Muñoz, D.F.; Ramírez-López, A. On the incorporation of parameter uncertainty for inventory management using simulation. International Transactions in Operational Research 2013, 20, 493–513.

Figure 1. Algorithm for the method of independent replications with parameter fixed at the value

θ

.

Figure 2. Two-level algorithm for calculating a point estimator using stochastic simulation under parameter uncertainty.

Figure 3. Performance of the estimation of point forecast

α

for

n m

fixed comparing different values of m.

Figure 6. Performance of the estimation of point forecast

α

for

n m

fixed comparing

m = 1

and

m \approx {(n m)}^{1 / 3}

.

Figure 7. Performance of the estimation of total variance

σ_{T}^{2}

for

n m

fixed comparing

m = 1

and

m \approx {(n m)}^{1 / 3}

.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
