Statistical Inference for Interval-Valued Spatial Error Models

Li Guan; Hai Zhou; Wei Zhang

doi:10.20944/preprints202312.1715.v1

Submitted: 22 December 2023

Posted: 22 December 2023


Abstract

In this paper, we introduce the interval-valued spatial error model. Following the least squares method for the single-valued case, we derive a parameter estimator for the interval-valued spatial error model and prove its theoretical properties. Finally, we present a numerical simulation and a real-data example.

Keywords: Interval-valued random variable; Spatial error model; Parameter estimator

Subject: Computer Science and Mathematics - Probability and Statistics

1. Introduction

It is well known that classical linear regression models and time series models are among the most widely used tools of statistical inference, with applications in medicine, education, finance, science, technology and many other fields. In most cases, these models are applied to single-valued random variables. In the real world, however, many random phenomena cannot be characterized by single-valued random variables. Take the price of a stock on a given day as an example: it is clearly unreasonable to describe it by a single value. If only single-valued data such as the closing or opening price are used, the information about price fluctuations during trading is ignored, and the resulting analysis presented to decision-makers is one-sided. Moreover, people often care about data within a range. For the temperature on a given day, people are more interested in the daily maximum and minimum temperatures than in the temperature at a particular moment. In economic forecasting, economists usually give a predicted range for the economic growth rate. In medical imaging diagnosis, the result is usually a two-dimensional image rather than a single value. Interval-valued data are therefore more appropriate and valuable in these cases because they carry more information, and it is necessary to study interval-valued statistical models and the associated inference problems.

Interval-valued random variables are special set-valued random variables. In the mid-twentieth century, Aumann and Debreu first used set-valued mappings to study economic phenomena. Aumann [1] defined the integral of set-valued random variables in 1965. Hiai and Umegaki [6] introduced the conditional expectation of set-valued random variables in 1978. Lyashenko [11,12] discussed the properties of set-valued random variables in Euclidean space, introduced set-valued Gaussian random variables, and defined the variance of set-valued random variables. Vitale [16] studied the properties of the $D_{p}$ distance. In 2005, Xuhua Yang and Shoumei Li [17] defined the variance and covariance of set-valued random variables under the $D_{p}$ distance and established their main properties. In 2008, Blanco et al. [2] defined the variance of interval-valued random variables under a new distance and studied its properties. Hess [5], Papageorgiou [14,15] and Shoumei Li et al. [7,8,9] explored convergence theorems for set-valued random variables under different conditions. Molchanov [13] and Shoumei Li et al. [10] systematically summarized the theory of set-valued random variables. These works promoted the development of the theory of set-valued random variables.

For interval-valued statistical models, Billard and Diday [3] established a linear regression model based on the midpoints of interval-valued random variables in 2000. In 2002, Billard and Diday [4] fitted separate linear regression models to the two endpoints of interval-valued random variables. In 2008, Lima Neto and de Carvalho [19] built linear regression models on the centers and radii of interval-valued random variables, and in 2010 they [20] imposed non-negativity constraints on the regression coefficients of the radius model of [19]. Wang [22] proposed the complete information method for interval-valued linear regression in 2012. In 2015, Wang Xun et al. [24] studied linear regression with set-valued theory and derived the least squares estimator and its properties. Souza [23] introduced a parametrization method for linear regression in 2017. All of these works concern linear regression models for interval-valued random variables; interval-valued spatial regression models and spatial error models have not yet been studied.

For the single-valued spatial error model, Anselin [26] gave the maximum likelihood estimation method in 1988, and Prucha [27] proposed a generalized method of moments estimator in 1999. In 2020, Yildirim [25] systematically reviewed parameter estimation methods for the spatial error model and proposed a new method based on the likelihood equation. Classical linear regression and time series models for interval-valued random variables have been studied extensively, with substantial results. Here we consider the interval-valued spatial error model.

This paper extends the classical spatial error model to the interval-valued case. The paper is organized as follows. Section 2 introduces the notation and basic concepts of interval-valued random variables. Section 3 discusses the interval-valued spatial error model, derives the least squares estimator of the parameters, and establishes its moments and consistency. Section 4 illustrates the effectiveness of the method by numerical simulation. Section 5 applies the model to the relationship between temperature and latitude in major Chinese cities.

2. Preliminaries on interval-valued random variables

2.1 $d_{p}$ distance and $D_{p}$ distance

Throughout this paper, we assume that $(Ω, A, μ)$ is a complete probability space. $R^{d}$ is the $d$-dimensional Euclidean space, $\| \cdot \|$ and $\langle \cdot, \cdot \rangle$ are the norm and inner product in $R^{d}$ respectively, and $K_{kc}(R^{d})$ is the family of nonempty compact convex subsets of $R^{d}$. When $d = 1$, $R^{1}$ is abbreviated as $R$, and $K_{kc}(R)$ is the family of nonempty bounded closed intervals in $R$, that is,

K_{kc}(R) = \{A = [\underline{a}, \bar{a}] : -\infty < \underline{a} \leq \bar{a} < \infty,\ \underline{a}, \bar{a} \in R\},

where $\underline{a}$ and $\bar{a}$ are the left and right endpoints of the interval $A$. An interval $A$ can also be written in center-radius form $A = (c_{A}; r_{A})$, where $c_{A} = (\bar{a} + \underline{a}) / 2$ and $r_{A} = (\bar{a} - \underline{a}) / 2$ are the center and radius of $A$. For any sets $A, B$, addition and scalar multiplication are defined as

A + B = \{a + b : a \in A, b \in B\},

k A = \{k a : a \in A\}, \quad \forall k \in R .

Intervals are a special case of sets. For $A = [\underline{a}, \bar{a}] = (c_{1}; r_{1})$ and $B = [\underline{b}, \bar{b}] = (c_{2}; r_{2})$, addition and scalar multiplication become

A + B = [\underline{a} + \underline{b}, \bar{a} + \bar{b}] = (c_{1} + c_{2}; r_{1} + r_{2}),

k A = \begin{cases} [k \underline{a}, k \bar{a}], & k \geq 0, \\ [k \bar{a}, k \underline{a}], & k < 0, \end{cases} \quad = (k c_{1}; |k| r_{1}) .

Note that if $A$ does not degenerate to a single point, then $A - A = A + (-A) \neq \{0\}$. Hence $K_{kc}(R^{d})$ is not a linear space under these addition and scalar multiplication operations.

For any sets $A, B$ in $K_{kc}(R^{d})$, subtraction is defined as $A - B = \{a - b : a \in A, b \in B\} = A + (-1) B$. As a special case, for intervals $A = [\underline{a}, \bar{a}] = (c_{1}; r_{1})$ and $B = [\underline{b}, \bar{b}] = (c_{2}; r_{2})$, this gives

A - B = [\underline{a} - \bar{b}, \bar{a} - \underline{b}] = (c_{1} - c_{2}; r_{1} + r_{2}) .
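The interval operations above can be sketched in Python by storing an interval in center-radius form. This is a minimal illustration, not code from the paper; the `Interval` class and its method names are hypothetical. It shows that multiplying by a negative scalar swaps the endpoints, that subtraction adds the radii, and that $A - A$ is not $\{0\}$.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval stored in center-radius form (c; r) with r >= 0."""
    c: float
    r: float

    @classmethod
    def from_endpoints(cls, lo: float, hi: float) -> "Interval":
        return cls((hi + lo) / 2.0, (hi - lo) / 2.0)

    @property
    def endpoints(self) -> tuple[float, float]:
        return (self.c - self.r, self.c + self.r)

    def __add__(self, other: "Interval") -> "Interval":
        # Minkowski sum: centers add, radii add
        return Interval(self.c + other.c, self.r + other.r)

    def scale(self, k: float) -> "Interval":
        # kA = (k c; |k| r); a negative k swaps the endpoints
        return Interval(k * self.c, abs(k) * self.r)

    def __sub__(self, other: "Interval") -> "Interval":
        # Minkowski difference A + (-1)B: centers subtract, radii still add
        return self + other.scale(-1.0)


A = Interval.from_endpoints(1.0, 2.0)   # (1.5; 0.5)
B = Interval.from_endpoints(1.5, 2.5)   # (2.0; 0.5)
print((A + B).endpoints)        # (2.5, 4.5)
print(A.scale(-2.0).endpoints)  # (-4.0, -2.0)
print((A - B).endpoints)        # (-1.5, 0.5)
print((A - A).endpoints)        # (-1.0, 1.0), not {0}
```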

The support function of a set $A \in K_{kc}(R^{d})$ is defined as

s(x, A) = \sup_{a \in A} \langle x, a \rangle, \quad x \in R^{d} .

For $1 \leq p < \infty$, the $d_{p}$ distance between sets $A$ and $B$ is

d_{p}(A, B) = \left[\int_{S^{d - 1}} |s(x, A) - s(x, B)|^{p} \, dμ(x)\right]^{1/p}, \quad 1 \leq p < \infty,

where $S^{d-1}$ is the unit sphere of $R^{d}$ and $μ$ is a measure on $S^{d-1}$; in particular, on $S^{0} = \{-1, 1\}$ we take $μ(\{1\}) = μ(\{-1\}) = 1$. By Yang and Li [17], $(K_{kc}(R^{d}), d_{p})$ is a complete separable metric space. In particular, for intervals $A = [\underline{a}, \bar{a}] = (c_{A}; r_{A})$ and $B = [\underline{b}, \bar{b}] = (c_{B}; r_{B})$, since $s(1, A) = \bar{a}$ and $s(-1, A) = -\underline{a}$, the $d_{p}$ distance is

\begin{aligned} d_{p}(A, B) &= \left(|\underline{b} - \underline{a}|^{p} + |\bar{b} - \bar{a}|^{p}\right)^{1/p} \\ &= \left[|(c_{B} - c_{A}) - (r_{B} - r_{A})|^{p} + |(c_{B} - c_{A}) + (r_{B} - r_{A})|^{p}\right]^{1/p} . \end{aligned}

In particular, for $p = 2$,

\begin{aligned} d_{2}(A, B) &= \left[(\underline{b} - \underline{a})^{2} + (\bar{b} - \bar{a})^{2}\right]^{1/2} \\ &= \left[((c_{B} - c_{A}) - (r_{B} - r_{A}))^{2} + ((c_{B} - c_{A}) + (r_{B} - r_{A}))^{2}\right]^{1/2} \\ &= \left[2 (c_{B} - c_{A})^{2} + 2 (r_{B} - r_{A})^{2}\right]^{1/2} . \end{aligned}
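A small numerical check of the interval formulas for $d_{p}$: computing $d_{2}$ from the endpoints and from the centers and radii gives the same value, which confirms that both squared differences carry the factor 2. The function names `d_p` and `d2_center_radius` are hypothetical helpers for illustration only.

```python
import math


def d_p(a_lo, a_hi, b_lo, b_hi, p=2.0):
    """d_p distance between [a_lo, a_hi] and [b_lo, b_hi] via endpoints."""
    return (abs(b_lo - a_lo) ** p + abs(b_hi - a_hi) ** p) ** (1.0 / p)


def d2_center_radius(cA, rA, cB, rB):
    """The same d_2 distance written with centers and radii."""
    return math.sqrt(2.0 * (cB - cA) ** 2 + 2.0 * (rB - rA) ** 2)


# A = [1, 2] = (1.5; 0.5), B = [1.5, 3.5] = (2.5; 1.0)
print(d_p(1.0, 2.0, 1.5, 3.5))                 # sqrt(0.25 + 2.25)
print(d2_center_radius(1.5, 0.5, 2.5, 1.0))    # sqrt(2 * 1 + 2 * 0.25)
```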

A set-valued mapping $F : Ω \to K_{kc}(R^{d})$ is called a set-valued random variable if, for every closed set $C \subseteq R^{d}$,

F^{-1}(C) = \{ω \in Ω : F(ω) \cap C \neq \emptyset\} \in A .

Let $U[Ω, K_{kc}(R^{d})]$ denote the family of set-valued random variables taking values in $K_{kc}(R^{d})$. The $D_{p}$ distance between set-valued random variables $F_{1}$ and $F_{2}$ is

D_{p}(F_{1}, F_{2}) = \left[E\, d_{p}^{p}(F_{1}, F_{2})\right]^{1/p} .

Accordingly, the $D_{p}$ distance between interval-valued random variables $F_{1} = [\underline{f}_{1}, \bar{f}_{1}] = (c_{F_{1}}; r_{F_{1}})$ and $F_{2} = [\underline{f}_{2}, \bar{f}_{2}] = (c_{F_{2}}; r_{F_{2}})$ is

\begin{aligned} D_{p}(F_{1}, F_{2}) &= \left[E |\underline{f}_{2} - \underline{f}_{1}|^{p} + E |\bar{f}_{2} - \bar{f}_{1}|^{p}\right]^{1/p} \\ &= \left[E |(c_{F_{2}} - c_{F_{1}}) - (r_{F_{2}} - r_{F_{1}})|^{p} + E |(c_{F_{2}} - c_{F_{1}}) + (r_{F_{2}} - r_{F_{1}})|^{p}\right]^{1/p} . \end{aligned}

By Yang and Li [17], $(K_{kc}(R^{d}), D_{p})$ is a complete separable metric space. In particular, for $p = 2$,

\begin{aligned} D_{2}(F_{1}, F_{2}) &= \left[E (\underline{f}_{2} - \underline{f}_{1})^{2} + E (\bar{f}_{2} - \bar{f}_{1})^{2}\right]^{1/2} \\ &= \left[E ((c_{F_{2}} - c_{F_{1}}) - (r_{F_{2}} - r_{F_{1}}))^{2} + E ((c_{F_{2}} - c_{F_{1}}) + (r_{F_{2}} - r_{F_{1}}))^{2}\right]^{1/2} . \end{aligned}

2.2 Moments of set-valued random variables

The expectation of a set-valued random variable $F \in U[Ω, K_{kc}(R^{d})]$ was given by Aumann [1]:

E[F] = \int_{Ω} F \, dμ = \left\{\int_{Ω} f \, dμ : f \in S_{F}\right\},

where $S_{F}$ is the set of integrable selections of $F$, that is,

S_{F} = \{f \in L^{1}[Ω, R^{d}] : f(ω) \in F(ω) \ \text{a.e.}\ (μ)\} .

Yang and Li [17] introduced the variance and covariance of set-valued random variables based on the $D_{p}$ distance. For a set-valued random variable $F \in U[Ω, K_{kc}(R^{d})]$, the variance is defined as

\begin{aligned} Var(F) &= D_{2}^{2}(F, E[F]) \\ &= E[d_{2}^{2}(F, E[F])] \\ &= E\left[\int_{S^{d - 1}} (s(x, F) - s(x, E[F]))^{2} \, dμ(x)\right] . \end{aligned}

For two set-valued random variables $F_{1}, F_{2} \in U[Ω, K_{kc}(R^{d})]$, the covariance is defined as

Cov(F_{1}, F_{2}) = E\left[\int_{S^{d - 1}} (s(x, F_{1}) - s(x, E[F_{1}])) (s(x, F_{2}) - s(x, E[F_{2}])) \, dμ(x)\right] .

If $F = [\underline{f}, \bar{f}] = (c_{F}; r_{F})$ is an interval-valued random variable, then

\begin{aligned} Var(F) &= E[\underline{f} - E[\underline{f}]]^{2} + E[\bar{f} - E[\bar{f}]]^{2} \\ &= E[(c_{F} - E[c_{F}]) - (r_{F} - E[r_{F}])]^{2} + E[(c_{F} - E[c_{F}]) + (r_{F} - E[r_{F}])]^{2} . \end{aligned}

The covariance of interval-valued random variables $F_{1}, F_{2} \in U[Ω, K_{kc}(R)]$ is

\begin{aligned} Cov(F_{1}, F_{2}) &= E[(\underline{f}_{1} - E[\underline{f}_{1}])(\underline{f}_{2} - E[\underline{f}_{2}])] + E[(\bar{f}_{1} - E[\bar{f}_{1}])(\bar{f}_{2} - E[\bar{f}_{2}])] \\ &= E[(c_{F_{1}} - E[c_{F_{1}}] - (r_{F_{1}} - E[r_{F_{1}}]))(c_{F_{2}} - E[c_{F_{2}}] - (r_{F_{2}} - E[r_{F_{2}}]))] \\ &\quad + E[(c_{F_{1}} - E[c_{F_{1}}] + (r_{F_{1}} - E[r_{F_{1}}]))(c_{F_{2}} - E[c_{F_{2}}] + (r_{F_{2}} - E[r_{F_{2}}]))] . \end{aligned}

Expanding the products, the cross terms between centers and radii cancel, and we obtain

Var(F) = 2 E[c_{F} - E[c_{F}]]^{2} + 2 E[r_{F} - E[r_{F}]]^{2} = 2 Var(c_{F}) + 2 Var(r_{F}),

\begin{aligned} Cov(F_{1}, F_{2}) &= 2 E[(c_{F_{1}} - E[c_{F_{1}}])(c_{F_{2}} - E[c_{F_{2}}])] + 2 E[(r_{F_{1}} - E[r_{F_{1}}])(r_{F_{2}} - E[r_{F_{2}}])] \\ &= 2 Cov(c_{F_{1}}, c_{F_{2}}) + 2 Cov(r_{F_{1}}, r_{F_{2}}) . \end{aligned}
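The identities $Var(F) = 2 Var(c_{F}) + 2 Var(r_{F})$ and $Cov(F_{1}, F_{2}) = 2 Cov(c_{F_{1}}, c_{F_{2}}) + 2 Cov(r_{F_{1}}, r_{F_{2}})$ are algebraic, so they also hold exactly for sample moments. The sketch below, with hypothetical simulated intervals, compares the endpoint form with the center-radius form.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000
# Hypothetical random intervals F1 = (c1; r1), F2 = (c2; r2)
c1 = rng.normal(2.0, 1.0, N)
r1 = np.abs(0.5 * c1 + rng.normal(0.0, 0.3, N))
c2 = c1 + rng.normal(0.0, 0.5, N)
r2 = np.abs(rng.normal(1.0, 0.2, N)) + 0.2 * r1


def cov(a, b):
    """Population covariance (divides by N)."""
    return np.mean((a - a.mean()) * (b - b.mean()))


# Endpoint forms: lower endpoint c - r, upper endpoint c + r
var_end = cov(c1 - r1, c1 - r1) + cov(c1 + r1, c1 + r1)
cov_end = cov(c1 - r1, c2 - r2) + cov(c1 + r1, c2 + r2)
# Center-radius forms
var_cr = 2 * cov(c1, c1) + 2 * cov(r1, r1)
cov_cr = 2 * cov(c1, c2) + 2 * cov(r1, r2)
print(np.isclose(var_end, var_cr), np.isclose(cov_end, cov_cr))   # True True
```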

The variance and covariance of interval-valued random variables will be used in Section 3. For more information about the variance and covariance of set-valued random variables, readers can refer to [17].

3. Interval-valued spatial error model

Consider the classical spatial error model

Y = X β + u, \quad u = λ W u + ε, \quad |λ| < 1,

(3.1)

where $X$ is the explanatory variable, $Y$ is the response variable, $β$ is the unknown parameter, the error terms $u$ and $ε$ are single-valued, $W$ is a known $n \times n$ spatial weight matrix, $λ$ is the spatial autoregressive coefficient, $ε \sim N(0, σ^{2} I_{n})$, and $I_{n}$ is the $n \times n$ identity matrix. Since $u = (I_{n} - λ W)^{-1} ε$ when $I_{n} - λ W$ is nonsingular, premultiplying (3.1) by $I_{n} - λ W$ gives

(I_{n} - λ W) Y = (I_{n} - λ W) X β + ε .

(3.2)

Denoting

Y_{λ} = (I_{n} - λ W) Y, \quad X_{λ} = (I_{n} - λ W) X,

model (3.2) can be expressed as

Y_{λ} = X_{λ} β + ε, \quad E(Y_{λ}) = X_{λ} β .

(3.3)
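The transformation from (3.1) to (3.3) can be checked numerically in the single-valued case. In this sketch the sample size, $λ$, the row-standardized chain weight matrix, the design and $β$ are all hypothetical choices; it generates $u$ from $u = λ W u + ε$, builds $Y$, and confirms that $Y_{λ} = X_{λ} β + ε$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, lam = 6, 0.4                      # hypothetical size and spatial coefficient
# Hypothetical row-standardized chain contiguity matrix
W = np.eye(n, k=1) + np.eye(n, k=-1)
W = W / W.sum(axis=1, keepdims=True)
X = np.column_stack([np.ones(n), np.arange(1.0, n + 1)])
beta = np.array([1.5, 2.0])
eps = rng.normal(0.0, 0.3, n)

A = np.eye(n) - lam * W
u = np.linalg.solve(A, eps)          # u = lam W u + eps  <=>  (I - lam W) u = eps
Y = X @ beta + u                     # model (3.1)

Y_lam, X_lam = A @ Y, A @ X          # Y_lambda, X_lambda
print(np.allclose(Y_lam, X_lam @ beta + eps))   # model (3.3) holds: True
```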

Now we extend the above classical single-valued model to interval-valued case.

Definition 3.1 If $Y_{λ} = (Y_{λ 1}, Y_{λ 2}, \dots, Y_{λ n})^{T}$ is an $n$-dimensional vector of interval-valued observations, $X_{λ} = (x_{λ ij})_{n \times p}$ is an $n \times p$ single-valued design matrix, and $β = (β_{1}, β_{2}, \dots, β_{p})^{T}$ is a $p$-dimensional interval-valued parameter vector, then model (3.3) is called the interval-valued spatial error model.

Next, we define the product of a real matrix and a vector of intervals.

Definition 3.2 Let $A_{i} = [\underline{a}_{i}, \bar{a}_{i}] = (c_{i}; r_{i})$, $i = 1, \dots, p$, be intervals in $K_{kc}(R)$. The product of the interval vector $A = (A_{1}, A_{2}, \dots, A_{p})^{T}$ by an $n \times p$ matrix $(m_{ij})_{n \times p}$ is defined as

(m_{ij})_{n \times p} A = \begin{pmatrix} m_{11} A_{1} + \dots + m_{1p} A_{p} \\ \vdots \\ m_{n1} A_{1} + \dots + m_{np} A_{p} \end{pmatrix} = \begin{pmatrix} m_{11} (c_{1}; r_{1}) + \dots + m_{1p} (c_{p}; r_{p}) \\ \vdots \\ m_{n1} (c_{1}; r_{1}) + \dots + m_{np} (c_{p}; r_{p}) \end{pmatrix} = \begin{pmatrix} m_{11} [\underline{a}_{1}, \bar{a}_{1}] + \dots + m_{1p} [\underline{a}_{p}, \bar{a}_{p}] \\ \vdots \\ m_{n1} [\underline{a}_{1}, \bar{a}_{1}] + \dots + m_{np} [\underline{a}_{p}, \bar{a}_{p}] \end{pmatrix} .

By the interval operations of Section 2, the $i$th component has center $\sum_{j} m_{ij} c_{j}$ and radius $\sum_{j} |m_{ij}| r_{j}$.
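Definition 3.2 reduces to two ordinary matrix products: the centers are multiplied by $M$ and the radii by the entrywise absolute value $|M|$. The helper name `mat_times_intervals` and the example matrix are hypothetical; the result can be checked by hand with interval arithmetic, e.g. $1 \cdot [1, 2] + (-2) \cdot [1.5, 2.5] = [-4, -1]$.

```python
import numpy as np


def mat_times_intervals(M, c, r):
    """Definition 3.2: M (n x p) times the interval vector A = (c; r), r >= 0.
    Row i is m_i1 A_1 + ... + m_ip A_p, i.e. center M c and radius |M| r."""
    M, c, r = np.asarray(M, float), np.asarray(c, float), np.asarray(r, float)
    return M @ c, np.abs(M) @ r


M = np.array([[1.0, -2.0],
              [0.5, 3.0]])
c, r = np.array([1.5, 2.0]), np.array([0.5, 0.5])   # A_1 = [1, 2], A_2 = [1.5, 2.5]
center, radius = mat_times_intervals(M, c, r)
print(np.column_stack([center - radius, center + radius]))   # rows: endpoints
```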

For the single-valued linear model, least squares estimation minimizes the sum of squared residuals. We use the same idea here: for the interval-valued spatial error model, the least squares estimator of the interval-valued parameter $β$ minimizes $d_{2}^{2}(Y_{λ}, X_{λ} β)$, where

\begin{aligned} d_{2}^{2}(Y_{λ}, X_{λ} β) &= \sum_{i = 1}^{n} d_{2}^{2}(Y_{λ i}, x_{λ i1} β_{1} + x_{λ i2} β_{2} + \dots + x_{λ ip} β_{p}) \\ &= \sum_{i = 1}^{n} \left[(c_{Y_{λ i}} - x_{λ i1} c_{β_{1}} - \dots - x_{λ ip} c_{β_{p}}) - (r_{Y_{λ i}} - |x_{λ i1}| r_{β_{1}} - \dots - |x_{λ ip}| r_{β_{p}})\right]^{2} \\ &\quad + \sum_{i = 1}^{n} \left[(c_{Y_{λ i}} - x_{λ i1} c_{β_{1}} - \dots - x_{λ ip} c_{β_{p}}) + (r_{Y_{λ i}} - |x_{λ i1}| r_{β_{1}} - \dots - |x_{λ ip}| r_{β_{p}})\right]^{2} \\ &= 2 \sum_{i = 1}^{n} \left[(c_{Y_{λ i}} - x_{λ i1} c_{β_{1}} - \dots - x_{λ ip} c_{β_{p}})^{2} + (r_{Y_{λ i}} - |x_{λ i1}| r_{β_{1}} - \dots - |x_{λ ip}| r_{β_{p}})^{2}\right], \end{aligned}

and $c_{m}$ and $r_{m}$ denote the center and radius of an interval $m$. This is a nonnegative quadratic function of $c_{β_{j}}$ and $r_{β_{j}}$, so it attains a minimum.
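The last step of the objective, which splits $d_{2}^{2}(Y_{λ}, X_{λ} β)$ into separate center and radius sums, can be verified numerically. The design, observations and trial parameters below are arbitrary hypothetical values; `d2_sq_objective` computes the objective from the endpoints of the observed and fitted intervals.

```python
import numpy as np


def d2_sq_objective(X_lam, cY, rY, c_beta, r_beta):
    """Sum over i of d_2^2(Y_{lambda i}, fitted interval), via endpoints."""
    c_fit = X_lam @ c_beta
    r_fit = np.abs(X_lam) @ r_beta
    lo_gap = (cY - rY) - (c_fit - r_fit)
    hi_gap = (cY + rY) - (c_fit + r_fit)
    return np.sum(lo_gap ** 2 + hi_gap ** 2)


rng = np.random.default_rng(2)
X_lam = rng.normal(size=(8, 2))
cY, rY = rng.normal(size=8), rng.uniform(0.1, 1.0, 8)
c_beta, r_beta = np.array([0.3, -1.0]), np.array([0.2, 0.4])

lhs = d2_sq_objective(X_lam, cY, rY, c_beta, r_beta)
rhs = 2 * np.sum((cY - X_lam @ c_beta) ** 2 + (rY - np.abs(X_lam) @ r_beta) ** 2)
print(np.isclose(lhs, rhs))   # True
```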

Setting the partial derivatives with respect to $c_{β_{j}}$ and $r_{β_{j}}$ to zero,

\frac{\partial d_{2}^{2}(Y_{λ}, X_{λ} β)}{\partial c_{β_{j}}} = 0, \quad \frac{\partial d_{2}^{2}(Y_{λ}, X_{λ} β)}{\partial r_{β_{j}}} = 0, \quad j = 1, 2, \dots, p,

that is,

\begin{cases} \sum_{i = 1}^{n} (c_{Y_{λ i}} - x_{λ i1} c_{β_{1}} - \dots - x_{λ ip} c_{β_{p}}) (-x_{λ ij}) = 0, \\ \sum_{i = 1}^{n} (r_{Y_{λ i}} - |x_{λ i1}| r_{β_{1}} - \dots - |x_{λ ip}| r_{β_{p}}) (-|x_{λ ij}|) = 0 . \end{cases}

The normal equations are

\begin{cases} X_{λ}^{T} c_{Y_{λ}} = X_{λ}^{T} X_{λ} c_{β}, \\ |X_{λ}|^{T} r_{Y_{λ}} = |X_{λ}|^{T} |X_{λ}| r_{β}, \end{cases}

where $|X_{λ}| = (|x_{λ ij}|)_{n \times p}$. The parameter estimator of the interval-valued spatial error model is obtained by solving the normal equations. The following result concerns the rank of $X_{λ}$.

X_{λ}

.

Lemma 3.3 If $rk(X) = p$, then $rk(X_{λ}) = p$.

Proof Since

rk(X_{λ}) = rk((I_{n} - λ W) X) \leq \min(rk(I_{n} - λ W), rk(X)) \leq rk(X) = p,

and since $I_{n} - λ W$ is nonsingular (as is implicit in (3.1) and (3.2); this holds, for example, when $W$ is row-standardized and $|λ| < 1$), so that $rk(I_{n} - λ W) = n$, Sylvester's rank inequality gives

rk((I_{n} - λ W) X) \geq rk(I_{n} - λ W) + rk(X) - n = p .

Hence $rk(X_{λ}) = rk(X) = p$. The result is proved. □
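Lemma 3.3 can be illustrated numerically. The sketch assumes, hypothetically, a row-standardized chain weight matrix (whose spectral radius is 1, so $I_{n} - λ W$ is nonsingular for $|λ| < 1$), $λ = 0.8$ and a quadratic-polynomial design with $p = 3$; the ranks are computed with `numpy.linalg.matrix_rank`.

```python
import numpy as np

n, lam = 30, 0.8                         # hypothetical sizes, |lam| < 1
W = np.eye(n, k=1) + np.eye(n, k=-1)     # chain contiguity
W = W / W.sum(axis=1, keepdims=True)     # row-standardized: spectral radius 1
A = np.eye(n) - lam * W                  # nonsingular for |lam| < 1
t = np.linspace(0.0, 1.0, n)
X = np.column_stack([np.ones(n), t, t ** 2])   # rk(X) = p = 3
X_lam = A @ X
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(X), np.linalg.matrix_rank(X_lam))
```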

Based on Lemma 3.3, and assuming in addition that $rk(|X_{λ}|) = p$, the estimator of the interval-valued spatial error model is obtained by solving the normal equations, as shown in the following theorem.

Theorem 3.3 Under the conditions of Lemma 3.3 and $rk(|X_{λ}|) = p$, the least squares estimator of the interval-valued spatial error model is unique. Using $c_{Y_{λ}} = (I_{n} - λ W) c_{Y}$ and $r_{Y_{λ}} = |I_{n} - λ W| r_{Y}$ from Definition 3.2, it is given by

\begin{aligned} \hat{β}_{LS}(λ) &= \left((X_{λ}^{T} X_{λ})^{-1} X_{λ}^{T} c_{Y_{λ}};\ (|X_{λ}|^{T} |X_{λ}|)^{-1} |X_{λ}|^{T} r_{Y_{λ}}\right) \\ &= \left((X^{T} (I_{n} - λ W)^{T} (I_{n} - λ W) X)^{-1} X^{T} (I_{n} - λ W)^{T} (I_{n} - λ W) c_{Y};\right. \\ &\qquad \left.(|(I_{n} - λ W) X|^{T} |(I_{n} - λ W) X|)^{-1} |(I_{n} - λ W) X|^{T} |I_{n} - λ W| r_{Y}\right) . \end{aligned}
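A minimal sketch of the estimator of Theorem 3.3, assuming $λ$ and $W$ are known; the function name `interval_sem_ls` is hypothetical. It solves the two normal equations with `numpy.linalg.solve` rather than forming explicit inverses. Note that nothing in this unconstrained least squares fit forces the estimated radii to be nonnegative.

```python
import numpy as np


def interval_sem_ls(X, cY, rY, W, lam):
    """Least squares estimator of Theorem 3.3 for a known lambda.

    X: n x p single-valued design; (cY; rY): centers and radii of Y;
    W: n x n spatial weights. Returns (c_beta_hat, r_beta_hat)."""
    A = np.eye(len(cY)) - lam * W
    X_lam = A @ X                      # X_lambda = (I - lam W) X
    absX = np.abs(X_lam)               # |X_lambda|, entrywise
    c_Ylam = A @ cY                    # center of Y_lambda (Definition 3.2)
    r_Ylam = np.abs(A) @ rY            # radius of Y_lambda (Definition 3.2)
    # Solve the two normal equations
    c_hat = np.linalg.solve(X_lam.T @ X_lam, X_lam.T @ c_Ylam)
    r_hat = np.linalg.solve(absX.T @ absX, absX.T @ r_Ylam)
    return c_hat, r_hat


# Usage: noise-free check that the estimator recovers beta exactly
n, lam = 20, 0.4                                       # hypothetical values
W = np.eye(n, k=1) + np.eye(n, k=-1)
W /= W.sum(axis=1, keepdims=True)
X = np.column_stack([np.ones(n), np.linspace(1.0, 3.0, n)])
c_beta, r_beta = np.array([1.5, 2.0]), np.array([0.5, 0.5])
A = np.eye(n) - lam * W
cY = X @ c_beta                                        # so that c_{Y_lambda} = X_lambda c_beta
rY = np.linalg.solve(np.abs(A), np.abs(A @ X) @ r_beta)  # so that r_{Y_lambda} = |X_lambda| r_beta
print(interval_sem_ls(X, cY, rY, W, lam))
```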

After obtaining the estimator of the unknown parameter $β$, we discuss its properties, starting with unbiasedness.

Theorem 3.4 The least squares estimator $\hat{β}_{LS}(λ)$ is an unbiased estimator of $β$.

Proof By Theorem 3.3 and $E(Y_{λ}) = X_{λ} β$, that is, $E[c_{Y_{λ}}] = X_{λ} c_{β}$ and $E[r_{Y_{λ}}] = |X_{λ}| r_{β}$,

\begin{aligned} E(\hat{β}_{LS}(λ)) &= \left((X_{λ}^{T} X_{λ})^{-1} X_{λ}^{T} E[c_{Y_{λ}}];\ (|X_{λ}|^{T} |X_{λ}|)^{-1} |X_{λ}|^{T} E[r_{Y_{λ}}]\right) \\ &= \left((X_{λ}^{T} X_{λ})^{-1} X_{λ}^{T} X_{λ} c_{β};\ (|X_{λ}|^{T} |X_{λ}|)^{-1} |X_{λ}|^{T} |X_{λ}| r_{β}\right) \\ &= (c_{β}; r_{β}) = β . \end{aligned}

The result is proved. □

For the interval-valued spatial error model with $rk(X_{λ}) = rk(|X_{λ}|) = p$, the covariance of $\hat{β}_{LS}(λ)$ is given by the following result.

Theorem 3.5 If $rk(X_{λ}) = rk(|X_{λ}|) = p$, $E(Y_{λ}) = X_{λ} β$, $Cov(c_{Y_{λ}}) = c_{σ^{2}} I_{n}$ and $Cov(r_{Y_{λ}}) = r_{σ^{2}} I_{n}$, then the covariances of the elements of $\hat{β}_{LS}(λ)$ are as follows:

(1) for $i \neq j$,

\begin{aligned} Cov(\hat{β}_{LS}^{(i)}(λ), \hat{β}_{LS}^{(j)}(λ)) &= 2 c_{σ^{2}} ((X_{λ}^{T} X_{λ})^{-1} X_{λ}^{T})_{(i)} \left(((X_{λ}^{T} X_{λ})^{-1} X_{λ}^{T})_{(j)}\right)^{T} \\ &\quad + 2 r_{σ^{2}} ((|X_{λ}|^{T} |X_{λ}|)^{-1} |X_{λ}|^{T})_{(i)} \left(((|X_{λ}|^{T} |X_{λ}|)^{-1} |X_{λ}|^{T})_{(j)}\right)^{T} \\ &= 2 c_{σ^{2}} [(X_{λ}^{T} X_{λ})^{-1}]_{ij} + 2 r_{σ^{2}} [(|X_{λ}|^{T} |X_{λ}|)^{-1}]_{ij}, \end{aligned}

(2) for $i = j$,

Var(\hat{β}_{LS}^{(i)}(λ)) = 2 c_{σ^{2}} [(X^{T} (I_{n} - λ W)^{T} (I_{n} - λ W) X)^{-1}]_{ii} + 2 r_{σ^{2}} [(|(I_{n} - λ W) X|^{T} |(I_{n} - λ W) X|)^{-1}]_{ii},

where $\hat{β}_{LS}^{(i)}(λ)$ and $\hat{β}_{LS}^{(j)}(λ)$ are the $i$th and $j$th elements of $\hat{β}_{LS}(λ)$, and $A_{(i)}$, $A_{(j)}$ denote the $i$th and $j$th rows of a matrix $A$. Equivalently, $Cov(\hat{β}_{LS}(λ)) = 2 c_{σ^{2}} (X_{λ}^{T} X_{λ})^{-1} + 2 r_{σ^{2}} (|X_{λ}|^{T} |X_{λ}|)^{-1}$.

Proof Write $B_{c} = (X_{λ}^{T} X_{λ})^{-1} X_{λ}^{T}$ and $B_{r} = (|X_{λ}|^{T} |X_{λ}|)^{-1} |X_{λ}|^{T}$, so that $\hat{β}_{LS}^{(i)}(λ) = (B_{c(i)} c_{Y_{λ}}; B_{r(i)} r_{Y_{λ}})$. For $i \neq j$, by the covariance formula for interval-valued random variables in Section 2,

\begin{aligned} Cov(\hat{β}_{LS}^{(i)}(λ), \hat{β}_{LS}^{(j)}(λ)) &= 2 Cov(B_{c(i)} c_{Y_{λ}}, B_{c(j)} c_{Y_{λ}}) + 2 Cov(B_{r(i)} r_{Y_{λ}}, B_{r(j)} r_{Y_{λ}}) \\ &= 2 B_{c(i)} Cov(c_{Y_{λ}}) B_{c(j)}^{T} + 2 B_{r(i)} Cov(r_{Y_{λ}}) B_{r(j)}^{T} \\ &= 2 c_{σ^{2}} B_{c(i)} B_{c(j)}^{T} + 2 r_{σ^{2}} B_{r(i)} B_{r(j)}^{T} . \end{aligned}

Since $B_{c} B_{c}^{T} = (X_{λ}^{T} X_{λ})^{-1}$ and $B_{r} B_{r}^{T} = (|X_{λ}|^{T} |X_{λ}|)^{-1}$, this equals $2 c_{σ^{2}} [(X_{λ}^{T} X_{λ})^{-1}]_{ij} + 2 r_{σ^{2}} [(|X_{λ}|^{T} |X_{λ}|)^{-1}]_{ij}$. When $i = j$, the same computation gives

\begin{aligned} Var(\hat{β}_{LS}^{(i)}(λ)) &= 2 c_{σ^{2}} [(X_{λ}^{T} X_{λ})^{-1}]_{ii} + 2 r_{σ^{2}} [(|X_{λ}|^{T} |X_{λ}|)^{-1}]_{ii} \\ &= 2 c_{σ^{2}} [(X^{T} (I_{n} - λ W)^{T} (I_{n} - λ W) X)^{-1}]_{ii} + 2 r_{σ^{2}} [(|(I_{n} - λ W) X|^{T} |(I_{n} - λ W) X|)^{-1}]_{ii} . \end{aligned}

The result is proved. □
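The key step in the covariance computation is that the row products of $(X_{λ}^{T} X_{λ})^{-1} X_{λ}^{T}$ collapse to $(X_{λ}^{T} X_{λ})^{-1}$, and similarly for $|X_{λ}|$. The sketch below checks this with a hypothetical random design and hypothetical values of $c_{σ^{2}}$ and $r_{σ^{2}}$.

```python
import numpy as np

rng = np.random.default_rng(3)
X_lam = rng.normal(size=(20, 3))             # hypothetical transformed design
absX = np.abs(X_lam)
c_s2, r_s2 = 0.09, 0.04                      # hypothetical c_sigma^2, r_sigma^2

Bc = np.linalg.solve(X_lam.T @ X_lam, X_lam.T)   # (X^T X)^{-1} X^T
Br = np.linalg.solve(absX.T @ absX, absX.T)      # (|X|^T |X|)^{-1} |X|^T
# Row-product form: entry (i, j) is 2 c_s2 Bc_(i) Bc_(j)^T + 2 r_s2 Br_(i) Br_(j)^T
cov_rows = 2 * c_s2 * Bc @ Bc.T + 2 * r_s2 * Br @ Br.T
# Closed form: 2 c_s2 (X^T X)^{-1} + 2 r_s2 (|X|^T |X|)^{-1}
cov_closed = (2 * c_s2 * np.linalg.inv(X_lam.T @ X_lam)
              + 2 * r_s2 * np.linalg.inv(absX.T @ absX))
print(np.allclose(cov_rows, cov_closed))   # True
```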

Next we discuss the estimation of the error $ε$ and of the error variance, focusing on the expectation and covariance of the interval-valued residuals.

Theorem 3.6 The residual vector $\hat{ε} = Y_{λ} - X_{λ} \hat{β}_{LS}(λ)$ satisfies

(1) $E(\hat{ε}) = (0; 2 |X_{λ}| r_{β})$, that is, the centers of $\hat{ε}$ have mean zero;

(2) $Cov(\hat{ε}) = 2 c_{σ^{2}} (I_{n} - P_{X_{λ}}) + 2 r_{σ^{2}} (I_{n} + 3 P_{|X_{λ}|})$,

where $P_{X_{λ}} = X_{λ} (X_{λ}^{T} X_{λ})^{-1} X_{λ}^{T}$ and $P_{|X_{λ}|} = |X_{λ}| (|X_{λ}|^{T} |X_{λ}|)^{-1} |X_{λ}|^{T}$.

Proof (1) By Theorem 3.3 and Definition 3.2, and since radii add under interval subtraction,

\begin{aligned} \hat{ε} &= Y_{λ} - X_{λ} \hat{β}_{LS}(λ) \\ &= (c_{Y_{λ}}; r_{Y_{λ}}) - X_{λ} \left((X_{λ}^{T} X_{λ})^{-1} X_{λ}^{T} c_{Y_{λ}};\ (|X_{λ}|^{T} |X_{λ}|)^{-1} |X_{λ}|^{T} r_{Y_{λ}}\right) \\ &= (c_{Y_{λ}}; r_{Y_{λ}}) - (P_{X_{λ}} c_{Y_{λ}};\ P_{|X_{λ}|} r_{Y_{λ}}) \\ &= ((I_{n} - P_{X_{λ}}) c_{Y_{λ}};\ (I_{n} + P_{|X_{λ}|}) r_{Y_{λ}}), \end{aligned}

so

\begin{aligned} E(\hat{ε}) &= E(Y_{λ}) - E(X_{λ} \hat{β}_{LS}(λ)) \\ &= (X_{λ} c_{β}; |X_{λ}| r_{β}) - (P_{X_{λ}} X_{λ} c_{β};\ P_{|X_{λ}|} |X_{λ}| r_{β}) \\ &= (X_{λ} c_{β}; |X_{λ}| r_{β}) - (X_{λ} c_{β}; |X_{λ}| r_{β}) \\ &= (0; 2 |X_{λ}| r_{β}) . \end{aligned}

(2) From (1), the $i$th element of $\hat{ε}$ is

\hat{ε}_{i} = ((I_{n} - P_{X_{λ}})_{(i)} c_{Y_{λ}};\ (I_{n} + P_{|X_{λ}|})_{(i)} r_{Y_{λ}}) .

Thus, for $i \neq j$,

\begin{aligned} Cov(\hat{ε}_{i}, \hat{ε}_{j}) &= 2 Cov((I_{n} - P_{X_{λ}})_{(i)} c_{Y_{λ}}, (I_{n} - P_{X_{λ}})_{(j)} c_{Y_{λ}}) + 2 Cov((I_{n} + P_{|X_{λ}|})_{(i)} r_{Y_{λ}}, (I_{n} + P_{|X_{λ}|})_{(j)} r_{Y_{λ}}) \\ &= 2 (I_{n} - P_{X_{λ}})_{(i)} Cov(c_{Y_{λ}}) (I_{n} - P_{X_{λ}})_{(j)}^{T} + 2 (I_{n} + P_{|X_{λ}|})_{(i)} Cov(r_{Y_{λ}}) (I_{n} + P_{|X_{λ}|})_{(j)}^{T}, \end{aligned}

where $A_{(i)}$ and $A_{(j)}$ denote the $i$th and $j$th rows of a matrix $A$. Collecting all pairs $(i, j)$ and using that $P_{X_{λ}}$ and $P_{|X_{λ}|}$ are symmetric and idempotent, so that $(I_{n} - P_{X_{λ}})^{2} = I_{n} - P_{X_{λ}}$ and $(I_{n} + P_{|X_{λ}|})^{2} = I_{n} + 3 P_{|X_{λ}|}$,

\begin{aligned} Cov(\hat{ε}) &= 2 (I_{n} - P_{X_{λ}}) Cov(c_{Y_{λ}}) (I_{n} - P_{X_{λ}}) + 2 (I_{n} + P_{|X_{λ}|}) Cov(r_{Y_{λ}}) (I_{n} + P_{|X_{λ}|}) \\ &= 2 c_{σ^{2}} (I_{n} - P_{X_{λ}}) + 2 r_{σ^{2}} (I_{n} + 3 P_{|X_{λ}|}) . \end{aligned}

The result is proved. □
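The residual covariance rests on two matrix facts: $I_{n} - P_{X_{λ}}$ is idempotent with trace $n - p$, while $I_{n} + P_{|X_{λ}|}$ is not idempotent, since $(I_{n} + P_{|X_{λ}|})^{2} = I_{n} + 2P_{|X_{λ}|} + P_{|X_{λ}|}^{2} = I_{n} + 3P_{|X_{λ}|}$. The sketch checks both with a hypothetical random design.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 12, 3
X_lam = rng.normal(size=(n, p))                           # hypothetical design
absX = np.abs(X_lam)
P = X_lam @ np.linalg.solve(X_lam.T @ X_lam, X_lam.T)     # P_{X_lambda}
Pa = absX @ np.linalg.solve(absX.T @ absX, absX.T)        # P_{|X_lambda|}
I = np.eye(n)

print(np.allclose((I - P) @ (I - P), I - P))              # idempotent: True
print(np.allclose((I + Pa) @ (I + Pa), I + 3 * Pa))       # squares to I + 3P: True
print(np.isclose(np.trace(I - P), n - p), np.isclose(np.trace(I + 3 * Pa), n + 3 * p))
```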

Next, we estimate $c_{σ^{2}}$ and $r_{σ^{2}}$, where $Cov(c_{Y_{λ}}) = c_{σ^{2}} I_{n}$ and $Cov(r_{Y_{λ}}) = r_{σ^{2}} I_{n}$. Denote the center residuals and the radius residuals by

\hat{c}_{ε} = c_{Y_{λ}} - X_{λ} \hat{c}_{β} = (I_{n} - P_{X_{λ}}) c_{Y_{λ}}, \quad \hat{r}_{ε} = r_{Y_{λ}} - |X_{λ}| \hat{r}_{β} = (I_{n} - P_{|X_{λ}|}) r_{Y_{λ}} .

Here $\hat{r}_{ε}$ is the residual of the radius regression, not the radius $(I_{n} + P_{|X_{λ}|}) r_{Y_{λ}}$ of the interval residual $\hat{ε}$; the latter has mean $2 |X_{λ}| r_{β} \neq 0$ and would not yield an unbiased estimator.

Theorem 3.7 $\hat{c}_{σ^{2}} = \frac{\hat{c}_{ε}^{T} \hat{c}_{ε}}{n - p}$ and $\hat{r}_{σ^{2}} = \frac{\hat{r}_{ε}^{T} \hat{r}_{ε}}{n - p}$ are unbiased estimators of $c_{σ^{2}}$ and $r_{σ^{2}}$ respectively.

Proof Since $I_{n} - P_{X_{λ}}$ is symmetric and idempotent,

\hat{c}_{ε}^{T} \hat{c}_{ε} = ((I_{n} - P_{X_{λ}}) c_{Y_{λ}})^{T} ((I_{n} - P_{X_{λ}}) c_{Y_{λ}}) = c_{Y_{λ}}^{T} (I_{n} - P_{X_{λ}}) c_{Y_{λ}} .

Using $E[c_{Y_{λ}}] = X_{λ} c_{β}$, $(I_{n} - P_{X_{λ}}) X_{λ} = 0$ and $tr(I_{n} - P_{X_{λ}}) = n - p$,

\begin{aligned} E[\hat{c}_{ε}^{T} \hat{c}_{ε}] &= E[c_{Y_{λ}}^{T} (I_{n} - P_{X_{λ}}) c_{Y_{λ}}] \\ &= (X_{λ} c_{β})^{T} (I_{n} - P_{X_{λ}}) (X_{λ} c_{β}) + tr((I_{n} - P_{X_{λ}}) Cov(c_{Y_{λ}})) \\ &= c_{σ^{2}} tr(I_{n} - P_{X_{λ}}) \\ &= c_{σ^{2}} (n - p), \end{aligned}

and hence

E(\hat{c}_{σ^{2}}) = E\left(\frac{\hat{c}_{ε}^{T} \hat{c}_{ε}}{n - p}\right) = \frac{1}{n - p} c_{σ^{2}} (n - p) = c_{σ^{2}} .

Likewise, $I_{n} - P_{|X_{λ}|}$ is symmetric and idempotent, so

\hat{r}_{ε}^{T} \hat{r}_{ε} = r_{Y_{λ}}^{T} (I_{n} - P_{|X_{λ}|}) r_{Y_{λ}},

and, using $E[r_{Y_{λ}}] = |X_{λ}| r_{β}$ and $(I_{n} - P_{|X_{λ}|}) |X_{λ}| = 0$,

\begin{aligned} E[\hat{r}_{ε}^{T} \hat{r}_{ε}] &= (|X_{λ}| r_{β})^{T} (I_{n} - P_{|X_{λ}|}) (|X_{λ}| r_{β}) + tr((I_{n} - P_{|X_{λ}|}) Cov(r_{Y_{λ}})) \\ &= r_{σ^{2}} tr(I_{n} - P_{|X_{λ}|}) \\ &= r_{σ^{2}} (n - p) . \end{aligned}

Therefore

E(\hat{r}_{σ^{2}}) = \frac{1}{n - p} r_{σ^{2}} (n - p) = r_{σ^{2}} .

The result is proved. □
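A Monte Carlo sketch of the unbiasedness of $\hat{c}_{σ^{2}} = \hat{c}_{ε}^{T} \hat{c}_{ε} / (n - p)$. The design, the true centers of $β$, the value $c_{σ^{2}} = 0.09$ and the number of replications are hypothetical choices; the average of the estimates should be close to the true value, whereas dividing by $n$ instead of $n - p$ would bias it downward.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, reps = 50, 2, 4000
c_s2 = 0.09                                   # hypothetical true c_sigma^2
X_lam = np.column_stack([np.ones(n), np.linspace(0.5, 5.0, n)])
c_beta = np.array([1.5, 2.0])
M = np.eye(n) - X_lam @ np.linalg.solve(X_lam.T @ X_lam, X_lam.T)   # I - P_X

est = np.empty(reps)
for k in range(reps):
    cY = X_lam @ c_beta + rng.normal(0.0, np.sqrt(c_s2), n)
    c_eps = M @ cY                            # center residuals
    est[k] = c_eps @ c_eps / (n - p)          # c_sigma^2 hat
print(est.mean())                             # should be close to 0.09
```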

In the following, we discuss the independence of $\hat{β}_{LS}(λ) = (\hat{c}_{β}; \hat{r}_{β})$ and $\hat{σ}^{2} = (\hat{c}_{σ^{2}}; \hat{r}_{σ^{2}})$.

Theorem 3.8 Suppose that $c_{Y_{λ}}$ and $r_{Y_{λ}}$ are normally distributed. Then $\hat{c}_{σ^{2}}$ and $\hat{c}_{β}$ are independent, and $\hat{r}_{σ^{2}}$ and $\hat{r}_{β}$ are independent.

Proof Since

\begin{aligned} \hat{σ}^{2} &= (\hat{c}_{σ^{2}}; \hat{r}_{σ^{2}}) = \left(\frac{c_{Y_{λ}}^{T} (I_{n} - P_{X_{λ}}) c_{Y_{λ}}}{n - p};\ \frac{r_{Y_{λ}}^{T} (I_{n} - P_{|X_{λ}|}) r_{Y_{λ}}}{n - p}\right), \\ \hat{β}_{LS}(λ) &= (\hat{c}_{β}; \hat{r}_{β}) = \left((X_{λ}^{T} X_{λ})^{-1} X_{λ}^{T} c_{Y_{λ}};\ (|X_{λ}|^{T} |X_{λ}|)^{-1} |X_{λ}|^{T} r_{Y_{λ}}\right), \end{aligned}

$\hat{c}_{σ^{2}}$ is a quadratic form in $c_{Y_{λ}}$, $\hat{c}_{β}$ is a linear form in $c_{Y_{λ}}$, and $c_{Y_{λ}} \sim N(X_{λ} c_{β}, c_{σ^{2}} I_{n})$. For a normal vector $Z \sim N(μ_{Z}, Σ)$, a linear form $B Z$ and a quadratic form $Z^{T} A Z$ with $A$ symmetric are independent if $B Σ A = 0$. Here

(X_{λ}^{T} X_{λ})^{-1} X_{λ}^{T} (c_{σ^{2}} I_{n}) (I_{n} - P_{X_{λ}}) = c_{σ^{2}} \left((X_{λ}^{T} X_{λ})^{-1} X_{λ}^{T} - (X_{λ}^{T} X_{λ})^{-1} X_{λ}^{T} P_{X_{λ}}\right) = 0,

because $X_{λ}^{T} P_{X_{λ}} = X_{λ}^{T}$. Similarly, $\hat{r}_{σ^{2}}$ is a quadratic form in $r_{Y_{λ}}$, $\hat{r}_{β}$ is a linear form in $r_{Y_{λ}}$, $r_{Y_{λ}} \sim N(|X_{λ}| r_{β}, r_{σ^{2}} I_{n})$, and

(|X_{λ}|^{T} |X_{λ}|)^{-1} |X_{λ}|^{T} (r_{σ^{2}} I_{n}) (I_{n} - P_{|X_{λ}|}) = r_{σ^{2}} \left((|X_{λ}|^{T} |X_{λ}|)^{-1} |X_{λ}|^{T} - (|X_{λ}|^{T} |X_{λ}|)^{-1} |X_{λ}|^{T} P_{|X_{λ}|}\right) = 0 .

Thus $\hat{c}_{σ^{2}}$ and $\hat{c}_{β}$ are independent, and $\hat{r}_{σ^{2}}$ and $\hat{r}_{β}$ are independent. □
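The independence condition for the center part, $B Σ A = 0$ with $B = (X_{λ}^{T} X_{λ})^{-1} X_{λ}^{T}$, $Σ = c_{σ^{2}} I_{n}$ and $A = I_{n} - P_{X_{λ}}$, can be checked numerically. The random design and the value of $c_{σ^{2}}$ are hypothetical; the product vanishes because $B X_{λ} = I_{p}$.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 15
X_lam = rng.normal(size=(n, 3))                     # hypothetical design
B = np.linalg.solve(X_lam.T @ X_lam, X_lam.T)       # linear form: c_beta_hat = B cY
A = np.eye(n) - X_lam @ B                           # quadratic form: cY^T A cY
c_s2 = 0.09                                         # hypothetical c_sigma^2
print(np.allclose(B @ (c_s2 * np.eye(n)) @ A, 0.0))   # True
```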

Theorem 3.9 In the sense of the $D_{2}$ distance, a sufficient condition for $\hat{β}_{LS}(λ)$ to be a consistent estimator of $β$ is

\lim_{n \to \infty} (S_{n}^{-1} + |S_{n}|^{-1}) = 0,

where $S_{n} = X_{λ}^{T} X_{λ}$ and $|S_{n}| = |X_{λ}|^{T} |X_{λ}|$.

Proof By Theorem 3.4, $\hat{β}_{LS}(λ)$ is an unbiased estimator of $β$, namely $E(\hat{β}_{LS}(λ)) = β$. Moreover, by Theorem 3.5,

Var(\hat{β}_{LS}(λ)) = 2 c_{σ^{2}} (X_{λ}^{T} X_{λ})^{-1} + 2 r_{σ^{2}} (|X_{λ}|^{T} |X_{λ}|)^{-1} = 2 c_{σ^{2}} S_{n}^{-1} + 2 r_{σ^{2}} |S_{n}|^{-1} .

If $\lim_{n \to \infty} (S_{n}^{-1} + |S_{n}|^{-1}) = 0$, then, since $S_{n}^{-1}$ and $|S_{n}|^{-1}$ are both positive definite, each tends to zero, and

\lim_{n \to \infty} Var(\hat{β}_{LS}(λ)) = \lim_{n \to \infty} (2 c_{σ^{2}} S_{n}^{-1} + 2 r_{σ^{2}} |S_{n}|^{-1}) = 0 .

Thus

\lim_{n \to \infty} D_{2}^{2}(\hat{β}_{LS}(λ), E(\hat{β}_{LS}(λ))) = \lim_{n \to \infty} D_{2}^{2}(\hat{β}_{LS}(λ), β) = 0 .

Therefore, in the $D_{2}$ metric (that is, in mean square), $\hat{β}_{LS}(λ)$ is a consistent estimator of $β$. □
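The condition of Theorem 3.9 can be illustrated for a design like the one used in Section 4. In this sketch $λ = 0.3$ is a hypothetical value (the simulation section does not state it), the weights are the unnormalized first-order chain adjacency, and the design is $[1, x_{λ}]$ with $x_{λ} = (I_{n} - λ W) x$; the quantity $tr(S_{n}^{-1}) + tr(|S_{n}|^{-1})$ should shrink as $n$ grows.

```python
import numpy as np

lam = 0.3                                    # hypothetical spatial coefficient
traces = []
for n in (25, 50, 100, 200):
    x = 0.5 + 0.1 * np.arange(1, n + 1)      # x_i = 0.5 + 0.1 i
    W = np.eye(n, k=1) + np.eye(n, k=-1)     # first-order chain adjacency
    x_lam = (np.eye(n) - lam * W) @ x
    X_lam = np.column_stack([np.ones(n), x_lam])
    S = X_lam.T @ X_lam                      # S_n
    S_abs = np.abs(X_lam).T @ np.abs(X_lam)  # |S_n|
    traces.append(np.trace(np.linalg.inv(S)) + np.trace(np.linalg.inv(S_abs)))
    print(n, traces[-1])
```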

4. Numerical simulation

In this section, parameter estimation for the interval-valued spatial error model is examined further by numerical simulation. The mean squared error of the estimator, computed with the $d_{p}$ distance between intervals, is used to measure the quality of the estimation.

Based on Equation 3.3, the interval-valued spatial error model in matrix form is expanded as follows:

(\begin{matrix} y_{1 λ} \\ y_{2 λ} \\ ⋮ \\ y_{n λ} \end{matrix}) = (\begin{matrix} 1 & x_{1 λ} \\ 1 & x_{2 λ} \\ ⋮ & ⋮ \\ 1 & x_{n λ} \end{matrix}) (\begin{matrix} β_{1} \\ β_{2} \end{matrix}) + (\begin{matrix} ε_{1} \\ ε_{2} \\ ⋮ \\ ε_{n} \end{matrix}),

where

$$Y_{λ} = (y_{1λ}, y_{2λ}, \dots, y_{nλ})^{T} = (I_{n} - λW) Y, \qquad X_{λ} = (x_{1λ}, x_{2λ}, \dots, x_{nλ})^{T} = (I_{n} - λW) X,$$

$I_{n}$ is the identity matrix of order $n$, $λ$ is the spatial autocorrelation coefficient and $W$ is the spatial weight matrix. Using first-order adjacency, with the $n$ samples arranged in a single row so that each sample is adjacent only to its immediate predecessor and successor, the spatial weight matrix is

$$W = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 & 0 \\ 1 & 0 & 1 & \cdots & 0 & 0 \\ 0 & 1 & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 0 & 1 \\ 0 & 0 & 0 & \cdots & 1 & 0 \end{pmatrix}.$$
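The line-adjacency $W$ and the transformation $I_{n} - λW$ can be written down directly. The minimal dense-matrix sketch below (plain Python, function names hypothetical, meant only for small $n$) builds both and applies them to the explanatory variable; for an interior sample it gives $x_{iλ} = x_{i} - λ(x_{i-1} + x_{i+1})$.

```python
def line_adjacency(n):
    """First-order adjacency W for n samples in a single row:
    w_ij = 1 if |i - j| == 1, else 0."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]


def filter_matrix(n, lam):
    """Dense I_n - lam * W."""
    w = line_adjacency(n)
    return [[(1.0 if i == j else 0.0) - lam * w[i][j] for j in range(n)]
            for i in range(n)]


def mat_vec(a, v):
    """Matrix-vector product a @ v for lists of lists."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


if __name__ == "__main__":
    n, lam = 5, 0.0727
    x = [0.5 + 0.1 * i for i in range(1, n + 1)]
    x_lam = mat_vec(filter_matrix(n, lam), x)  # X_lambda = (I_n - lam W) X
    print([round(t, 4) for t in x_lam])
```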

The true values of the interval-valued parameters are set to $β_{1} = [1, 2] = (1.5; 0.5)$ and $β_{2} = [1.5, 2.5] = (2; 0.5)$, where $(c; r)$ denotes the interval with center $c$ and radius $r$. Then

$$y_{iλ} = β_{1} + x_{iλ} β_{2} + ε_{i} = (1.5 + 2 x_{iλ} + c_{ε_{i}};\ 0.5 + 0.5 x_{iλ} + r_{ε_{i}}),$$

where the center and radius of the error term are normally distributed: $c_{ε_{i}}, r_{ε_{i}} \sim N(0, 0.3^{2})$.
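The center-radius form of $y_{iλ}$ follows from interval arithmetic. The derivation below assumes the standard rules $xA = (x c_{A}; |x| r_{A})$ for a real number $x$ and $A + B = (c_{A} + c_{B}; r_{A} + r_{B})$:

```latex
\begin{aligned}
β_{1} + x_{iλ}\,β_{2}
  &= (1.5;\ 0.5) + x_{iλ}\,(2;\ 0.5) \\
  &= (1.5;\ 0.5) + (2\,x_{iλ};\ 0.5\,|x_{iλ}|) \\
  &= (1.5 + 2\,x_{iλ};\ 0.5 + 0.5\,|x_{iλ}|), \\[4pt]
x_{iλ} &= x_{i} - λ\,(x_{i-1} + x_{i+1}) = (1 - 2λ)\,x_{i}, \qquad 1 < i < n.
\end{aligned}
```

The last line uses $x_{i-1} + x_{i+1} = 2x_{i}$ for $x_{i} = 0.5 + 0.1 i$. Together with $x_{1λ} = x_{1} - λ x_{2}$ and $x_{nλ} = x_{n} - λ x_{n-1}$, this shows that every $x_{iλ}$ is positive for $0 \le λ < 1/2$ (including $λ = 0.0727$ used in Tables 1 and 2), so $|x_{iλ}| = x_{iλ}$ and the radius reduces to $0.5 + 0.5 x_{iλ}$.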

First, take $n = 100$ and generate the explanatory variable as $x_{i} = 0.5 + 0.1 i$, $i = 1, 2, \dots, 100$; each simulation run then yields one set of interval-valued observations.

The scatter points in Figure 1 are the $y_{iλ}$ generated in one simulation run, and the two fitted lines correspond to the estimated interval-valued spatial error model

$$Y_{λ} = [1.0653, 2.0171] + [1.4955, 2.5009] X_{λ}.$$

Repeating this process 500 times and averaging the estimates $\hat{β} = (\hat{β}_{1}, \hat{β}_{2})$ gives

$$\bar{\hat{β}} = ([1.0004, 2.0019], [1.4999, 2.4999]).$$
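A minimal Monte Carlo sketch of this experiment in plain Python follows. It rests on three assumptions: (i) the least-squares estimator regresses the centers of $y_{iλ}$ on $(1, x_{iλ})$ and the radii on $(1, |x_{iλ}|)$ separately, as the center and radius terms of the variance in Theorem 3.9 suggest; (ii) the column of ones is left unfiltered, following the expanded matrix form above; and (iii) $λ = 0.0727$, the value reported with Tables 1 and 2. Function names and the seed are hypothetical, no non-negativity constraint is imposed on the estimated radii, and the output is not expected to reproduce the reported averages digit for digit.

```python
import random


def spatial_filter(v, lam):
    """(I_n - lam * W) v for the first-order line-adjacency W."""
    n = len(v)
    return [v[i] - lam * ((v[i - 1] if i > 0 else 0.0) + (v[i + 1] if i < n - 1 else 0.0))
            for i in range(n)]


def ols_line(x, y):
    """Least-squares intercept and slope of y regressed on (1, x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def one_replication(n, lam, rng, sigma=0.3):
    """Simulate one data set and return the interval estimates [(lo, hi), (lo, hi)]."""
    x = [0.5 + 0.1 * i for i in range(1, n + 1)]
    xl = spatial_filter(x, lam)
    # Center and radius of y_{i lambda} for beta_1 = (1.5; 0.5), beta_2 = (2; 0.5).
    c_y = [1.5 + 2.0 * t + rng.gauss(0.0, sigma) for t in xl]
    r_y = [0.5 + 0.5 * abs(t) + rng.gauss(0.0, sigma) for t in xl]
    c1, c2 = ols_line(xl, c_y)                    # centers on x_{i lambda}
    r1, r2 = ols_line([abs(t) for t in xl], r_y)  # radii on |x_{i lambda}|
    return [(c1 - r1, c1 + r1), (c2 - r2, c2 + r2)]


def average_estimates(n=100, lam=0.0727, reps=500, seed=1):
    """Endpoint-wise average of the interval estimates over `reps` replications."""
    rng = random.Random(seed)
    sums = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(reps):
        for j, (lo, hi) in enumerate(one_replication(n, lam, rng)):
            sums[j][0] += lo
            sums[j][1] += hi
    return [(s[0] / reps, s[1] / reps) for s in sums]


if __name__ == "__main__":
    for beta_hat in average_estimates():
        print("[%.4f, %.4f]" % beta_hat)
```

Averaging interval endpoints is the same as averaging centers and radii, so the result is the endpoint-wise mean of the replicated interval estimates.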

Next, the mean square error (MSE) of each parameter estimator is computed as a criterion for the goodness of the estimation. It is based on the interval-valued $d_{p}$ distance with $p = 2$:

$$MSE(\hat{β}_{j}) = \frac{1}{N} \sum_{k = 1}^{N} d_{2}^{2}\left(β_{j}, \hat{β}_{j}^{(k)}\right), \qquad N = 500, \quad j = 1, 2,$$

where $\hat{β}_{j}^{(k)}$ is the estimate of $β_{j}$ in the $k$-th replication.

For $n = 100$, the MSEs of $\hat{β}_{1}$ and $\hat{β}_{2}$ are 0.0165 and 0.0006, respectively.
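A minimal sketch of the MSE computation above. The exact normalization of $d_{2}$ is fixed in Section 2.1; purely for illustration, this sketch assumes $d_{2}^{2}([a_{1}, a_{2}], [b_{1}, b_{2}]) = (a_{1} - b_{1})^{2} + (a_{2} - b_{2})^{2}$, which equals $2[(\Delta c)^{2} + (\Delta r)^{2}]$ in center-radius form. If $d_{2}$ is normalized differently (for example by averaging over the two endpoints instead of summing), the MSE changes by a constant factor. Function names are hypothetical.

```python
def d2_squared(a, b):
    """Assumed d_2^2 between intervals a = (a1, a2) and b = (b1, b2):
    sum of squared endpoint differences, i.e. 2 * (dc^2 + dr^2)."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def mse(true_interval, estimates):
    """MSE = (1/N) * sum_k d_2^2(beta, beta_hat^(k)) over N replications."""
    return sum(d2_squared(true_interval, e) for e in estimates) / len(estimates)


if __name__ == "__main__":
    beta_1 = (1.0, 2.0)
    replicated = [(1.0, 2.0), (1.1, 2.1)]  # toy estimates, not simulation output
    print(mse(beta_1, replicated))
```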

The same procedure, again with 500 replications, is carried out for $n = 200$ and $n = 300$, recording the average of $\hat{β} = (\hat{β}_{1}, \hat{β}_{2})$ and the MSE in each case. The results are summarized in Tables 1 and 2.

The simulation results show that the average estimates are already very close to the true values at $n = 100$. As the sample size increases, the MSE of both estimators decreases steadily: from 0.01651 to 0.00462 for $\hat{β}_{1}$ and from 0.00060 to 0.00002 for $\hat{β}_{2}$ as $n$ grows from 100 to 300 (Table 2).

5. Empirical analysis

5.1 Data preparation

To illustrate the interval-valued spatial error model in practice, we use data on 31 cities, one from each of the 31 provincial-level regions of mainland China (provinces, autonomous regions and municipalities; Hong Kong, Macao and Taiwan are not considered for the time being owing to geographical factors). The variables studied are latitude, the daily minimum and maximum temperatures, and the temperature difference. Because China is so large, temperature and latitude vary between cities within a region, so the capital city of each region is taken as the research object. The temperature data are the minimum and maximum temperatures on July 8, 2021, taken from the Baidu weather forecast; the latitude data come from a coordinate query website (https://jingweidu.bmcx.com/). The data are summarized in Table 3.

Before modeling, the data are tested for spatial autocorrelation. The test requires a spatial weight matrix; in this paper a distance-based spatial weight matrix is used. It is obtained with the open-source software GeoDa and then standardized. Many other representations of the spatial weight matrix are possible, but they are not discussed here. The spatial autocorrelation test is applied to the explained variable, which here is the interval formed by the daily minimum and maximum temperatures. Therefore, for the test, a general linear model can be fitted separately to the lower and upper endpoints of the intervals, and the minimum and maximum temperatures can each be tested for spatial autocorrelation.

A standard way to test for spatial autocorrelation is the global or local Moran's I test. As shown in Table 4, the global Moran's I statistics for the minimum and maximum temperatures are 0.4193 and 0.3013, respectively, and both p-values (0.000014 and 0.000985) are below the 0.05 significance level. The null hypothesis of no spatial autocorrelation is therefore rejected: the maximum and minimum temperatures of the 31 provincial-level regions of China exhibit spatial autocorrelation.
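For reference, the global Moran's I statistic has the standard form $I = \frac{n}{S_{0}} \frac{\sum_{i}\sum_{j} w_{ij} z_{i} z_{j}}{\sum_{i} z_{i}^{2}}$, where $z_{i}$ is the deviation of observation $i$ from the mean and $S_{0} = \sum_{i}\sum_{j} w_{ij}$. The sketch below computes only the statistic; the p-values in Table 4 require an inference step (for example a permutation or normal-approximation test) that is not reproduced here. The function name and the toy weights are hypothetical.

```python
def morans_i(values, weights):
    """Global Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2,
    with z_i = values[i] - mean(values) and S0 = sum of all weights."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(t * t for t in z)


if __name__ == "__main__":
    # Four sites in a row with binary first-order adjacency (toy example).
    w = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
    print(morans_i([1, 2, 3, 4], w))  # smooth trend: positive autocorrelation
    print(morans_i([1, 0, 1, 0], w))  # alternating: negative autocorrelation
```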

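Figures 2 and 3 show the Moran scatter plots of the minimum and maximum temperatures, respectively. Both the highest and lowest temperatures of the 31 regions exhibit positive spatial autocorrelation, that is, a high-high and low-low clustering pattern: regions with high temperatures tend to neighbor regions with high temperatures, and likewise for low temperatures.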

5.2 Parameter estimation of interval-valued spatial error model

With the data prepared and spatial autocorrelation confirmed, the interval-valued spatial error model is fitted. The dependent variable is air temperature, represented by the interval from the daily minimum to the daily maximum temperature, and the explanatory variable is latitude. The parameters are estimated with the method for the interval-valued spatial error model given in Section 3, assuming that air temperature and latitude follow the interval-valued linear model

$$E(y_{iλ}) = β_{1} + x_{iλ} β_{2}.$$

Following the method mentioned in Section 4, the spatial autocorrelation coefficient estimated from the minimum temperatures is $λ_{1} = 0.1187$ and that estimated from the maximum temperatures is $λ_{2} = 0.1429$. Their average,

$$λ = \frac{λ_{1} + λ_{2}}{2} = 0.1308,$$

is taken as the value of $λ$ for the interval-valued spatial error model. The spatial weight matrix of the 31 regions is obtained with GeoDa; each element is replaced by its inverse square, and the result is standardized to give the final spatial weight matrix.
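The construction of the final weight matrix can be sketched as follows. The sketch assumes that the GeoDa matrix holds pairwise distances and that "standardized" means row standardization (each row scaled to sum to one); both are common practice but are assumptions here. The function name and the toy distances are hypothetical.

```python
def inverse_square_weights(dist):
    """Row-standardized inverse-distance-squared weights from a distance matrix:
    w_ij = 1 / d_ij^2 for i != j, 0 on the diagonal, each row scaled to sum to 1.
    Distinct sites are assumed to have positive distance."""
    n = len(dist)
    raw = [[0.0 if i == j else 1.0 / dist[i][j] ** 2 for j in range(n)]
           for i in range(n)]
    standardized = []
    for row in raw:
        total = sum(row)
        standardized.append([v / total for v in row])
    return standardized


if __name__ == "__main__":
    toy_distances = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
    for row in inverse_square_weights(toy_distances):
        print([round(v, 3) for v in row])
```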

The parameter estimates, reported in Table 5, are

$$\hat{β} = (\hat{β}_{1}, \hat{β}_{2}) = ([24.7501, 29.5478], [-0.1681, -0.0619]),$$

so the fitted interval-valued spatial error model is

$$Y_{λ} = [24.7501, 29.5478] + [-0.1681, -0.0619] X_{λ}.$$

In Figure 4, the horizontal axis is latitude and the vertical axis is temperature (maximum and minimum). Temperature is negatively correlated with latitude across the 31 cities: as latitude increases, temperature tends to decrease. The figure also shows that the temperature difference (maximum minus minimum temperature) tends to widen as latitude increases. This is consistent with the large diurnal temperature range in northwest and northeast China and the smaller day-night temperature difference in central, southeast and southwest China.
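The widening can be read off the fitted coefficients. Under interval arithmetic, for a latitude $x \ge 0$ the fitted interval is $[24.7501 - 0.1681x,\ 29.5478 - 0.0619x]$, whose width is $4.7977 + 0.1062x$ and so grows with latitude. The sketch below evaluates the fitted model directly at raw latitude values as a reading aid; whether Figure 4 plots exactly these lines is an assumption, and the names are hypothetical. Note that in this arithmetic the predicted width grows with $|x|$ whenever $\hat{β}_{2}$ has positive width.

```python
BETA1 = (24.7501, 29.5478)  # interval intercept from Table 5
BETA2 = (-0.1681, -0.0619)  # interval slope from Table 5


def scale(interval, x):
    """Real number times interval; the endpoints swap when x < 0."""
    lo, hi = interval[0] * x, interval[1] * x
    return (min(lo, hi), max(lo, hi))


def fitted_interval(latitude):
    """beta_1 + latitude * beta_2 under interval (Minkowski) arithmetic."""
    lo, hi = scale(BETA2, latitude)
    return (BETA1[0] + lo, BETA1[1] + hi)


if __name__ == "__main__":
    for lat in (20.0, 30.0, 40.0, 45.0):
        lo, hi = fitted_interval(lat)
        print(f"{lat:4.1f}  [{lo:.4f}, {hi:.4f}]  width {hi - lo:.4f}")
```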

Acknowledgments

This work is supported by the National Social Science Fund of China (No. 19BTJ017).

References

Aumann R J. Integrals of set-valued functions[J]. Journal of Mathematical Analysis and Applications, 1965, 12(1): 1-12.
Blanco-Fernandez A, Corral N, Gonzalez-Rodriguez G, Lubiano M A. Some properties of the dK-variance for interval-valued sets[C]// Dubois D, et al. (Eds.): Soft Methods for Handling Variability and Imprecision, ASC 48. Springer, 2008: 331-337.
Billard L, Diday E. Regression analysis for interval-valued data[C]// Conference of the International Federation of Classification Societies. Springer-Verlag, 2000: 369-374.
Billard L, Diday E. Symbolic regression analysis[J]. Studies in Classification, Data Analysis, and Knowledge Organization, 2002: 281-288.
Hess C. On multivalued martingales whose values may be unbounded: martingale selectors and Mosco convergence[J]. Journal of Multivariate Analysis, 1991, 39: 175-201.
Hiai F, Umegaki H. Integrals, conditional expectations, and martingales of multivalued functions[J]. Journal of Multivariate Analysis, 1977, 7(1): 149-182.
Li S, Li J, Li X. Stochastic integral with respect to set-valued square integrable martingales[J]. Journal of Mathematical Analysis and Applications, 2010, 370: 659-671.
Li S, Ogura Y. Convergence of set valued sub- and supermartingales in the Kuratowski-Mosco sense[J]. The Annals of Probability, 1998, 26: 1384-1402.
Li S, Ogura Y. Convergence of set-valued and fuzzy-valued martingales[J]. Fuzzy Sets and Systems, 1999, 101: 453-461.
Li S, Ogura Y, Kreinovich V. Limit Theorems and Applications of Set-Valued and Fuzzy Set-Valued Random Variables[M]. Netherlands: Kluwer Academic Publishers (Springer), 2002.
Lyashenko N N. Limit theorems for sums of independent compact random subsets of a Euclidean space[J]. Journal of Mathematical Sciences, 1982, 20(3): 2187-2196.
Lyashenko N N. Statistics of random compacts in Euclidean space[J]. Journal of Mathematical Sciences, 1983, 21(1): 76-92.
Molchanov I S. Theory of Random Sets[M]. Springer, 2005.
Papageorgiou N S. On the theory of Banach space valued multifunctions. 2. Set valued martingales and set valued measures[J]. Journal of Multivariate Analysis, 1985, 17: 207-227.
Papageorgiou N S. On the conditional expectation and convergence properties of random sets[J]. Transactions of the American Mathematical Society, 1995, 347: 2495-2515.
Vitale R. Lp metrics for compact, convex sets[J]. Journal of Approximation Theory, 1985, 45(3): 280-287.
Yang X, Li S. The Dp-metric space of set-valued random variables and its application to covariances[J]. International Journal of Innovative Computing, Information and Control, 2005, 1: 73-82.
Zhang Wenxiu, Li Shoumei, Wang Zhenpeng, Gao Yong. Introduction to Set-Valued Stochastic Processes[M]. Beijing: Science Press, 2007.
Lima Neto E, de Carvalho F. Centre and range method for fitting a linear regression model to symbolic interval data[J]. Computational Statistics and Data Analysis, 2008, 52: 1500-1515.
Lima Neto E, de Carvalho F. Constrained linear regression models for symbolic interval-valued variables[J]. Computational Statistics and Data Analysis, 2010, 54: 333-347.
Tang Nana. Linear regression and autoregressive time series models with constraints[D]. Beijing University of Technology, 2017.
Wang H, Guan R, Wu J. Linear regression of interval-valued data based on complete information in hypercubes[J]. Journal of Systems Science and Systems Engineering, 2012, 21(4): 422-442.
Souza L, Souza R, Amaral G, Filho T. A parametrized approach for linear regression of interval data[J]. Knowledge-Based Systems, 2017, 131: 149-159.
Wang X, Li S, Denoeux T. Interval-valued linear model[J]. International Journal of Computational Intelligence Systems, 2015, 8(1): 114-127.
Yildirim V, Kantar Y M. Robust estimation approach for spatial error model[J]. Journal of Statistical Computation and Simulation, 2020, 90(3): 1-21.
Anselin L. Spatial Econometrics: Methods and Models[M]. Kluwer Academic Publishers, 1988.
Kelejian H H, Prucha I R. A generalized moments estimator for the autoregressive parameter in a spatial model[J]. International Economic Review, 1999, 40(2): 509-533.

Figure 1. Numerical simulation of the interval-valued spatial error model

Figure 2. Scatter plot of global Moran’s I coefficient of minimum temperature

Figure 3. Scatter plot of global Moran’s I coefficient of maximum temperature

Figure 4. Interval-valued spatial error model of temperature and latitude

Table 1. Average of $\hat{β} = (\hat{β}_{1}, \hat{β}_{2})$ over 500 replications

$λ = 0.0727$	n=100	n=200	n=300
${\hat{β}}_{1}$	[1.0004,2.0019]	[1.0023,1.9991]	[0.9988,2.0032]
${\hat{β}}_{2}$	[1.4999,2.4999]	[1.4998,2.5002]	[1.5001,2.4999]

Table 2. Sample mean square error MSE

$λ = 0.0727$	n=100	n=200	n=300
$M S E ({\hat{β}}_{1})$	0.01651	0.00723	0.00462
$M S E ({\hat{β}}_{2})$	0.00060	0.00007	0.00002

Table 3. Data and indicators

Region	Minimum temperature (°C)	Maximum temperature (°C)	Latitude (°N)
Hefei	24	29	31.79
Beijing	22	33	40.22
Chongqing	25	34	29.4
Fuzhou	27	38	26.05
Lanzhou	20	36	36.1
Guangzhou	27	34	23.16
Nanning	25	33	22.78
Guiyang	21	29	26.68
Haikou	26	33	20.02
Shijiazhuang	24	37	38.04
Haerbin	20	25	45.55
Zhengzhou	26	37	34.72
Wuhan	27	33	30.58
Changsha	25	33	28.26
Nanjing	26	29	31.33
Nanchang	28	35	28.55
Changchun	20	27	43.83
Shenyang	20	27	41.81
Huhehaote	19	31	40.81
Yinchuan	20	35	38.47
Xining	14	29	36.65
Xian	25	36	34.23
Jinan	25	33	36.55
Shanghai	26	32	31.41
Taiyuan	19	32	37.94
Chengdu	23	29	30.66
Tianjin	24	34	39.72
Wulumuqi	25	33	43.36
Lasa	12	23	29.65
Kunming	18	27	24.89
Hangzhou	27	35	30.21

Table 4. Global Moran’s I test results

Statistic	Minimum temperature	Maximum temperature
Moran’s I	0.4193	0.3013
p-value	0.000014	0.000985

Table 5. Interval-valued parameter estimation results

$\hat{β}_{1}$	$\hat{β}_{2}$
[24.7501,29.5478]	[-0.1681,-0.0619]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
