Ridge-Type Pretest and Shrinkage Estimation in Spatial Error Model: An Application to Housing Cost Data

Preprint


A peer-reviewed article of this preprint also exists.

This version is not peer-reviewed

Submitted: 07 December 2023

Posted: 11 December 2023


Abstract

Spatial regression models have garnered significant attention across several disciplines, including functional magnetic resonance imaging analysis, econometrics, and home price analysis, among many other domains. Sparsity is common in nature: often only a limited number of factors contribute significantly to the overall variation. Spatial regression models frequently exploit sparsity to obtain simpler covariance structures that are less demanding to compute. The spatial error model is an important spatial regression model that places the geographical dependence in the error terms rather than in the response variable. This study proposes an efficient approach for estimating the vector of regression coefficients in the spatial error model when prior knowledge suggests that some coefficients are insignificant and the regressors are multicollinear. Specifically, it introduces pretest and shrinkage ridge estimators for spatial error regression models and evaluates their performance against the traditional maximum likelihood estimator. Their efficacy is also assessed on real-world data, using bootstrapping for comparison.


Subject: Computer Science and Mathematics - Probability and Statistics

1. Introduction

Data collected over a geographic region may generally show some dependence, whereby nearby observations are more similar than those made at significant distances. The incorporation of a covariance structure into conventional statistical models allows for the modeling of this phenomenon. Spatial regression models, incorporating various spatial dependencies, are increasingly utilized in various disciplines like geology, epidemiology, disease surveillance, urban planning, and econometrics.

In the time-series context, autoregressive models express the observation at time $t$ as a linear combination of the most recent observations. Analogously, in the spatial framework, these models express the observation at a given location as a function of the observations at neighboring locations. Each observation is typically associated with a geographical location known as a site, and a distance metric is used to define proximity between sites.

One of the most widely used autoregressive models is the spatial error (SE) model, in which the mean of the spatial response variable is modeled by a linear regression with a spatially lagged autoregressive error component. [20] investigated quantile regression estimation for the SE model with potentially varying coefficients and established the asymptotic properties of the proposed estimators. [27] applied the SE model to examine spatial clustering and correlation between neighboring counties using data from Egypt’s 2006 census. [38] used the SE model to evaluate Social Disorganization Theory. [33] combined the SE model and the spatial lag model on cross-sectional data from 20 districts in Chengdu and found that haze had a negative impact on both the selling and rental prices of houses. [43] proposed a robust estimation method for SE models and showed through simulation that it has reduced bias, a more stable empirical influence function, and robustness to outliers. More information about spatial autoregressive models can be found in [15,18,19,25,42], among others.

In frequentist statistics, inferences about unknown parameters are based on sample information, whereas in Bayesian statistics the sample information is combined with uncertain prior information (UPI), also known as non-sample information, to draw conclusions about the unknown parameters. Such subjective UPI is not always available. In that case, model selection procedures such as Akaike’s Information Criterion (AIC) or the Bayesian Information Criterion (BIC) can still be used to provide the UPI.

Pretest estimation was one of the earliest attempts to combine the sample information and the UPI when estimating regression parameters. The pretest estimator first tests whether some of the regression coefficients are significant and then selects either the full model estimator or the restricted (sub model) estimator, which has fewer coefficients. In other words, the pretest chooses between the full and sub model estimators using binary weights. The shrinkage estimator modifies the pretest estimator by using smooth weights between the full and sub model estimators; it shrinks the estimator of the regression coefficients toward a target value informed by the UPI. However, the shrinkage estimator sometimes suffers from over-shrinkage. An improved version that controls this issue, known as the positive shrinkage estimator, was proposed later.

Pretest and shrinkage estimation methodologies have received considerable attention from researchers. For example, [6] introduced efficient pretest and shrinkage estimators of the regression coefficient vector of the marginal model for multinomial responses. [32] developed different shrinkage and penalty estimation strategies for the negative binomial regression model when the model may be overfitted and there is uncertain information about the subspace. [35] proposed shrinkage estimation of the parameter vector of the linear regression model with heteroscedastic errors and extended their study to the high-dimensional heteroscedastic regression model. More details about pretest and shrinkage estimation can be found in [1,4,21], among other references.

Multicollinearity is a major issue when a multiple linear regression model is fitted by ordinary least squares (OLS); it arises when some of the regressors are correlated, especially when the correlation between two of them is high. Several techniques have been proposed in the literature to reduce its impact. [28] introduced ridge regression as a solution for nonorthogonal problems and showed that the ridge estimator can improve the mean squared error of estimation. [30] introduced a new biased estimator and demonstrated its improvement theoretically and numerically. [31] proposed a new version of the Liu estimator for the parameter vector of a linear regression model based on prior information.

Using the idea of shrinkage, [7] introduced an improved form of the Liu-type estimator and used analytical and numerical results to demonstrate its superiority. [13] suggested pretest and shrinkage ridge estimation for the linear regression model and showed the benefit of using the recommended estimators in conjunction with certain penalty estimators. [9] defined Liu-type rank-based estimators for robust regression, analyzed their asymptotic behavior, established superiority criteria for the biasing parameter, and supported their findings with numerical computations. [14] introduced pretest and shrinkage approaches based on generalized ridge regression estimation. Later, [8] suggested the ridge estimator as a suitable approach for handling high-dimensional multicollinear data. Further, [11] proposed an enhanced ridge method for genome regression modeling and used a rank ridge estimator to estimate parameters and make predictions in the presence of multicollinearity and outliers. Recently, [3] proposed a novel pretest and shrinkage estimation technique based on the Liu-type approach for the conditional autoregressive model. For more information, we refer the reader to [22,23].

In this article, we propose a ridge-type pretest and shrinkage estimation strategy for the $p \times 1$ vector of regression coefficients in the SE model when some prior information is available about irrelevant coefficients. We partition the vector as $\beta = (\beta_1^T, \beta_2^T)^T$, where $\beta_1$ is a $p_1 \times 1$ vector containing the coefficients of the main effects and $\beta_2$ is a $p_2 \times 1$ vector of irrelevant coefficients, with $p_1 + p_2 = p$. We focus mainly on estimating $\beta_1$ when the UPI indicates that $\beta_2$ is ineffective, which can be assessed by testing the hypothesis $H_0: \beta_2 = 0$. In some instances, the full model estimator may exhibit considerable variability and be difficult to interpret. Conversely, the sub model estimator may be significantly biased and under-fitted. To tackle this issue, we consider pretest, shrinkage, and positive shrinkage ridge estimators for $\beta_1$.

The paper is organized as follows. Section 2 offers an overview of the SE model. Section 3 discusses the maximum likelihood estimators of the SE model parameters. In Section 4, we propose the pretest and shrinkage ridge estimators. An asymptotic analysis of the proposed estimators and some theoretical results are presented in Section 5. The estimators are compared numerically using simulated data and a real data example in Section 6. Some concluding remarks are given in Section 7. An appendix containing some proofs is given at the end of the manuscript.

2. Spatial Error Model

Let $s = \{s_1, s_2, \dots, s_n\}$ denote a set of $n$ spatial sites (frequently called locations, regions, etc.). The set $s$ forms what is commonly referred to as a lattice, and the set of sites neighboring $s_j$, denoted by $K(s_j)$, is defined as

$$K(s_j) = \{ s_i : s_i \text{ is a neighbor of } s_j \}, \quad i = 1, 2, \dots, n.$$

A neighborhood structure can be determined using a predefined adjacency metric. On a regular lattice, two sites are rook-based neighbors if they share an edge, and queen-based neighbors if they share an edge or a corner.
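As a minimal illustration of the rook and queen definitions (assuming the sites of an $N \times N$ lattice are indexed row by row as $j = rN + c$; the function name `lattice_adjacency` is hypothetical), the neighbor sets $K(s_j)$ can be encoded in a binary adjacency matrix whose $(j, i)$ entry is 1 when $s_i \in K(s_j)$:

```python
import numpy as np


def lattice_adjacency(N, queen=True):
    """Binary adjacency matrix for an N x N regular lattice.

    Site (r, c) is stored at index j = r * N + c. Rook neighbours share an
    edge; queen neighbours share an edge or a corner.
    """
    n = N * N
    W = np.zeros((n, n))
    for r in range(N):
        for c in range(N):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue  # a site is not its own neighbour
                    if not queen and dr != 0 and dc != 0:
                        continue  # rook contiguity ignores corner contacts
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < N and 0 <= cc < N:
                        W[r * N + c, rr * N + cc] = 1.0
    return W


W_queen = lattice_adjacency(3, queen=True)
W_rook = lattice_adjacency(3, queen=False)
print(W_queen.sum(axis=1))  # -> [3. 5. 3. 5. 8. 5. 3. 5. 3.]
print(W_rook.sum(axis=1))   # -> [2. 3. 2. 3. 4. 3. 2. 3. 2.]
```

On a 3 x 3 lattice the centre site has 8 queen neighbours but only 4 rook neighbours, and corner sites have 3 and 2, respectively.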

Let $Y_n(s) = (Y(s_1), Y(s_2), \dots, Y(s_n))^T$ be the vector of observations collected at sites $\{s_1, s_2, \dots, s_n\}$, and let $X(s) = (X(s_1), X(s_2), \dots, X(s_n))^T$ be the $(n \times p)$ matrix of covariates. Following Cressie and Wikle [19], the SE model expresses the response $Y$ at the $j$th site $s_j$ as

$$Y(s_j) = X^T(s_j)\,\beta + \epsilon(s_j), \quad j = 1, 2, \dots, n, \tag{1}$$

$$\text{with}\quad \epsilon(s_j) = \sum_{i \neq j}^{n} \lambda_{ji}\,\epsilon(s_i) + e(s_j), \quad j = 1, 2, \dots, n, \tag{2}$$

where $\beta = (\beta_1, \beta_2, \dots, \beta_p)^T$ is the $(p \times 1)$ vector of unknown regression parameters, known as the large-scale effect, and $e(s) = (e(s_1), e(s_2), \dots, e(s_n))^T$ is a Gaussian noise vector with mean $0$ and covariance matrix $\Omega = \operatorname{diag}\{\sigma_j^2\}_{j=1}^{n}$. The parameters $\lambda_{ji}$ model the spatial dependencies among the errors $\epsilon_j$, $j = 1, 2, \dots, n$, with $\lambda_{jj} = 0$. Let $\Lambda = \{\lambda_{ji}\}_{j,i=1}^{n}$ and assume that $(I - \Lambda)$ is invertible, where $I$ is the $(n \times n)$ identity matrix. Then, dropping the spatial indices, the SE model in (1) and (2) can be written in matrix form as

$$Y = X\beta + \epsilon, \quad \epsilon \sim N\big(0, (I - \Lambda)^{-1}\,\Omega\,(I - \Lambda^T)^{-1}\big). \tag{3}$$

Nature exhibits sparsity in many situations, meaning that a small number of factors can account for most of the observed variability. Sparsity is frequently used in spatial regression models to impose covariance structures that are easier to compute. Accordingly, setting $\Omega = \sigma^2 I$ and $\Lambda = \rho W$, where $\sigma^2$ is the variance component, $\rho$ is the spatial dependence parameter, and $W$ is a known weight (proximity) matrix with zeros on the main diagonal and off-diagonal entries $w_{ji} = 1$ if location $j$ is a neighbor of location $i$ and $w_{ji} = 0$ otherwise ($j \neq i$), yields a simple and frequently used version of the preceding model. The weight matrix is usually row-normalized as $W^* = \{ w_{ji} / w_{j+} \}_{j,i=1}^{n}$, where $w_{j+} = \sum_{i} w_{ji}$ is the $j$th row sum. The SE regression model can then be rewritten as

$$Y = X\beta + \epsilon, \quad \epsilon \sim N(0, \sigma^2 V_n), \tag{4}$$

$$\text{where}\quad V_n = (I - \rho W^*)^{-1} (I - \rho W^{*T})^{-1}. \tag{5}$$
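A minimal NumPy sketch of (5) follows, assuming a binary, symmetric adjacency matrix $W$ with no isolated sites (here a 1-D chain of six sites, purely for illustration; the function names are hypothetical). It row-normalizes $W$, forms $V_n$, and draws errors as $\epsilon = (I - \rho W^*)^{-1} e$ with $e \sim N(0, \sigma^2 I)$, whose covariance is exactly $\sigma^2 V_n$.

```python
import numpy as np


def row_normalize(W):
    """W* = {w_ji / w_j+}: divide each row by its row sum (assumes no isolated sites)."""
    return W / W.sum(axis=1, keepdims=True)


def se_covariance(W_star, rho):
    """V_n = (I - rho W*)^{-1} (I - rho W*^T)^{-1}, equation (5)."""
    n = W_star.shape[0]
    A_inv = np.linalg.inv(np.eye(n) - rho * W_star)
    return A_inv @ A_inv.T  # (A^{-1})^T = (A^T)^{-1} = (I - rho W*^T)^{-1}


def simulate_se_errors(W_star, rho, sigma2, rng):
    """Draw eps = (I - rho W*)^{-1} e with e ~ N(0, sigma2 I), so Cov(eps) = sigma2 V_n."""
    n = W_star.shape[0]
    e = rng.normal(scale=np.sqrt(sigma2), size=n)
    return np.linalg.solve(np.eye(n) - rho * W_star, e)


# Illustrative neighbourhood: a 1-D chain of 6 sites (site i ~ i-1, i+1).
n = 6
W = np.eye(n, k=1) + np.eye(n, k=-1)
W_star = row_normalize(W)
V = se_covariance(W_star, rho=0.6)
eps = simulate_se_errors(W_star, rho=0.6, sigma2=1.0, rng=np.random.default_rng(0))
```

Solving with $I - \rho W^*$ instead of inverting it is the usual numerically preferable choice when only draws of $\epsilon$ are needed.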

3. Maximum Likelihood Estimation

Let $\theta = (\beta, \sigma^2, \rho)$. The maximum likelihood estimator (MLE) of $\theta$ can be obtained by a two-step profile-likelihood method; see [18]. First, we fix $\rho$ and find the MLEs of $\beta$ and $\sigma^2$ as functions of $\rho$ (note that $V_n$ depends on $\rho$):

$$\hat{\beta}(\rho) = (X^T V_n^{-1} X)^{-1} X^T V_n^{-1} Y, \tag{6}$$

$$\hat{\sigma}^2(\rho) = \frac{(Y - X\hat{\beta}(\rho))^T V_n^{-1} (Y - X\hat{\beta}(\rho))}{n}. \tag{7}$$

Next, we substitute $\hat{\beta}(\rho)$ and $\hat{\sigma}^2(\rho)$ into the log-likelihood and obtain the MLE $\hat{\rho}$ by maximizing the resulting profile log-likelihood. Finally, the MLEs of $\beta$ and $\sigma^2$ are computed by replacing $\rho$ with $\hat{\rho}$ in equations (6) and (7), respectively. [34] proved that $\hat{\theta}$ is a consistent estimator of $\theta$ and is asymptotically normally distributed; it follows readily that $\hat{\beta}$ is consistent and asymptotically normal.

In many situations, the significance of the regression coefficients can be judged subjectively or through model selection techniques. Using this information, the $(p \times 1)$ regression coefficient vector $\beta$ is divided into two sub vectors, $\beta = (\beta_1^T, \beta_2^T)^T$, where $\beta_1$ is a $p_1 \times 1$ vector of important coefficients and $\beta_2$ is a $p_2 \times 1$ vector of unimportant coefficients, with $p_1 + p_2 = p$. Similarly, the covariate matrix is partitioned as $X = (X_1 \mid X_2)$, where $X_1$ and $X_2$, of dimensions $n \times p_1$ and $n \times p_2$, consist of the first $p_1$ and the last $p_2$ columns of the design matrix $X$, respectively. Consequently, the full SE model in (3) can be rewritten as

$$Y = X_1\beta_1 + X_2\beta_2 + \epsilon. \tag{8}$$
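The two-step profile-likelihood procedure around (6) and (7) can be sketched in NumPy as below. This is an illustrative assumption, not the spautolm implementation used later in the paper: $\rho$ is located by a simple grid search over $(-0.95, 0.95)$, and the sketch relies on $V_n^{-1} = (I - \rho W^*)^T (I - \rho W^*)$ and $\log|V_n| = -2\log|I - \rho W^*|$, both of which follow from (5). Up to an additive constant, the profile log-likelihood is then $-\tfrac{n}{2}\log\hat{\sigma}^2(\rho) + \log|I - \rho W^*|$. The helper names are hypothetical.

```python
import numpy as np


def queen_weights(N):
    """Row-normalized queen contiguity matrix W* for an N x N lattice (sites indexed r * N + c)."""
    n = N * N
    W = np.zeros((n, n))
    for r in range(N):
        for c in range(N):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (dr, dc) != (0, 0) and 0 <= rr < N and 0 <= cc < N:
                        W[r * N + c, rr * N + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)


def beta_sigma2_given_rho(X, Y, W_star, rho):
    """Equations (6) and (7) for a fixed rho, via the whitening A = I - rho W*."""
    n = len(Y)
    A = np.eye(n) - rho * W_star  # V_n^{-1} = A^T A
    Xw, Yw = A @ X, A @ Y
    beta = np.linalg.solve(Xw.T @ Xw, Xw.T @ Yw)
    resid = Yw - Xw @ beta
    return beta, resid @ resid / n


def profile_loglik(X, Y, W_star, rho):
    """Profile log-likelihood of rho, dropping additive constants."""
    n = len(Y)
    _, sigma2 = beta_sigma2_given_rho(X, Y, W_star, rho)
    _, logabsdet = np.linalg.slogdet(np.eye(n) - rho * W_star)
    return -0.5 * n * np.log(sigma2) + logabsdet


def fit_se(X, Y, W_star, grid=None):
    """Maximize the profile log-likelihood over a grid of rho, then plug rho_hat into (6) and (7)."""
    if grid is None:
        grid = np.linspace(-0.95, 0.95, 191)
    lls = [profile_loglik(X, Y, W_star, r) for r in grid]
    rho_hat = float(grid[int(np.argmax(lls))])
    beta_hat, sigma2_hat = beta_sigma2_given_rho(X, Y, W_star, rho_hat)
    return beta_hat, sigma2_hat, rho_hat


rng = np.random.default_rng(1)
N = 15
n = N * N
W_star = queen_weights(N)
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, 2.0, -1.0])
eps = np.linalg.solve(np.eye(n) - 0.6 * W_star, rng.normal(size=n))  # SE errors, sigma^2 = 1
Y = X @ beta_true + eps
beta_hat, sigma2_hat, rho_hat = fit_se(X, Y, W_star)
```

A grid search keeps the sketch transparent; a bounded one-dimensional optimizer over the same interval would be the natural refinement.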

For the full model in (8), the MLEs of $(\beta_1, \beta_2)$ can be obtained using the same technique as for model (4); see [5]. They are

$$\hat{\beta}_1 = (X_1^T A_{X_2} X_1)^{-1} X_1^T A_{X_2} Y, \tag{9}$$

where

$$A_{X_2} = \hat{V}_n^{-1} - \hat{V}_n^{-1} X_2 (X_2^T \hat{V}_n^{-1} X_2)^{-1} X_2^T \hat{V}_n^{-1},$$

$\hat{V}_n$ denotes $V_n$ evaluated at $\hat{\rho}$, and $\hat{\beta}_2$ has the same form as $\hat{\beta}_1$ with the indices 1 and 2 interchanged in the two equations above. The full model estimator may have considerable variability and can be difficult to interpret. Our primary goal is to estimate $\beta_1$ when $X_2$ does not adequately account for the variation in the response variable, which can be examined through the linear hypothesis

$$H_0: \beta_2 = 0. \tag{10}$$

If the null hypothesis in (10) is true, model (8) reduces to

$$Y = X_1\beta_1 + \epsilon. \tag{11}$$

We refer to (11) as the restricted SE model. Let $\hat{\beta}_1^S$ be the MLE of $\beta_1$ in model (11); then

$$\hat{\beta}_1^S = (X_1^T \hat{V}_n^{-1} X_1)^{-1} X_1^T \hat{V}_n^{-1} Y. \tag{12}$$

Clearly, $\hat{\beta}_1^S$ performs better than $\hat{\beta}_1$ when the null hypothesis in (10) is true, whereas the opposite holds as $\beta_2$ moves away from the null space. Moreover, the restricted strategy can yield an under-fitted and highly biased model. To address this large bias, we propose ridge-type estimators for the full and reduced models and then improve both using the pretest and shrinkage estimation idea.
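Equations (9) and (12) can be checked numerically. The sketch below assumes $\hat{V}_n^{-1}$ is already available (built here from a hypothetical row-normalized 1-D chain neighbourhood with a fixed $\rho$) and computes $\hat{\beta}_1$ through $A_{X_2}$ as well as the restricted estimator $\hat{\beta}_1^S$. As a sanity check, $\hat{\beta}_1$ from (9) coincides with the first $p_1$ entries of the joint estimator (6), a generalized least squares version of the Frisch–Waugh–Lovell result.

```python
import numpy as np


def chain_vinv(n, rho):
    """Illustrative V_n^{-1} = (I - rho W*)^T (I - rho W*) for a row-normalized 1-D chain."""
    W = np.eye(n, k=1) + np.eye(n, k=-1)
    W /= W.sum(axis=1, keepdims=True)
    A = np.eye(n) - rho * W
    return A.T @ A


def annihilator(Z, Vinv):
    """A_Z = V^{-1} - V^{-1} Z (Z^T V^{-1} Z)^{-1} Z^T V^{-1}."""
    VZ = Vinv @ Z
    return Vinv - VZ @ np.linalg.solve(Z.T @ VZ, VZ.T)


def full_beta1(X1, X2, Y, Vinv):
    """Full model MLE of beta_1, equation (9)."""
    A = annihilator(X2, Vinv)
    return np.linalg.solve(X1.T @ A @ X1, X1.T @ A @ Y)


def restricted_beta1(X1, Y, Vinv):
    """Restricted (sub model) MLE of beta_1, equation (12)."""
    return np.linalg.solve(X1.T @ Vinv @ X1, X1.T @ Vinv @ Y)


rng = np.random.default_rng(2)
n, p1, p2 = 80, 3, 4
X1, X2 = rng.normal(size=(n, p1)), rng.normal(size=(n, p2))
# beta_2 = 0 here; errors are iid because only the algebraic identity is checked.
Y = X1 @ np.ones(p1) + rng.normal(size=n)
Vinv = chain_vinv(n, rho=0.5)

X = np.hstack([X1, X2])
beta_joint = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ Y)  # equation (6)
b_full = full_beta1(X1, X2, Y, Vinv)
b_sub = restricted_beta1(X1, Y, Vinv)
```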

4. Materials and Methods: Developing Pretest and Shrinkage Ridge Estimation Strategies

In this section, we propose a set of estimators for the SE model parameter vector $\beta_1$ in (11). Following [28], the ridge estimator of $\beta$ for the model in (4) is defined as

$$\hat{\beta}^{RF} = (X^T \hat{V}_n^{-1} X + k I_p)^{-1} X^T \hat{V}_n^{-1} Y, \tag{13}$$

where $k > 0$ is the ridge parameter. When $k = 0$, the ridge estimator reduces to the MLE of $\beta$, whereas as $k \to \infty$, $\hat{\beta}^{RF} \to 0$.
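A minimal sketch of (13), assuming $\hat{V}_n^{-1}$ is given (built from a hypothetical row-normalized 1-D chain neighbourhood) and using a deliberately near-collinear design for illustration. It checks the two limiting cases stated above: $k = 0$ returns the MLE, and the norm of $\hat{\beta}^{RF}$ decreases toward zero as $k$ grows.

```python
import numpy as np


def chain_vinv(n, rho):
    """Illustrative V_n^{-1} = (I - rho W*)^T (I - rho W*) for a row-normalized 1-D chain."""
    W = np.eye(n, k=1) + np.eye(n, k=-1)
    W /= W.sum(axis=1, keepdims=True)
    A = np.eye(n) - rho * W
    return A.T @ A


def ridge_gls(X, Y, Vinv, k):
    """Ridge estimator (13): (X^T V^{-1} X + k I)^{-1} X^T V^{-1} Y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ Vinv @ X + k * np.eye(p), X.T @ Vinv @ Y)


rng = np.random.default_rng(3)
n = 60
z = rng.normal(size=n)
# Two nearly collinear regressors plus an independent one.
X = np.column_stack([z, z + 0.01 * rng.normal(size=n), rng.normal(size=n)])
Y = X @ np.array([1.0, 1.0, 0.5]) + rng.normal(size=n)
Vinv = chain_vinv(n, rho=0.4)

mle = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ Y)
norms = [np.linalg.norm(ridge_gls(X, Y, Vinv, k)) for k in (0.0, 1.0, 10.0, 1e8)]
```

Under near collinearity the unpenalized coefficients on the first two columns are typically large and of opposite sign; even a modest $k$ pulls them toward a stable compromise.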

4.1. Full and Reduced Models Ridge Estimators

The unrestricted full model ridge estimator of $\beta_1$, denoted by $\hat{\beta}_1^{UR}$, is defined as

$$\hat{\beta}_1^{UR} = (X_1^T A_{X_2} X_1 + k_f I_{p_1})^{-1} X_1^T A_{X_2} Y, \tag{14}$$

where $k_f$ is the ridge parameter of the unrestricted full model estimator $\hat{\beta}_1^{UR}$. Assuming the null hypothesis in (10) is true, the restricted ridge estimator of $\beta_1$ for model (11), denoted by $\hat{\beta}_1^{RR}$, is given by

$$\hat{\beta}_1^{RR} = (X_1^T \hat{V}_n^{-1} X_1 + k_r I_{p_1})^{-1} X_1^T \hat{V}_n^{-1} Y, \tag{15}$$

where $k_r$ is the ridge parameter of the restricted model estimator $\hat{\beta}_1^{RR}$. When the null hypothesis in (10) is true or nearly true (i.e., when $\beta_2$ is close to zero), $\hat{\beta}_1^{RR}$ is generally more efficient than $\hat{\beta}_1^{UR}$. However, as $\beta_2$ moves away from zero, $\hat{\beta}_1^{RR}$ becomes inefficient compared with the unrestricted estimator $\hat{\beta}_1^{UR}$. Beyond the gain obtained by applying ridge estimation to the MLE of $\beta_1$, we also seek estimators that are functions of $\hat{\beta}_1^{UR}$ and $\hat{\beta}_1^{RR}$ and that reduce the risks associated with either estimator over most of the parameter space. The pretest and shrinkage estimators constructed in the following subsection serve this purpose.
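The estimators (14) and (15) can be sketched in the same way. This is a sketch under stated assumptions: $\hat{V}_n^{-1}$ is supplied (built here from a hypothetical row-normalized 1-D chain neighbourhood), and the ridge parameters $k_f$ and $k_r$ are fixed by hand rather than chosen by any data-driven rule. With $k_f = 0$, (14) reduces to the full model MLE (9), and with $k_r = 0$, (15) reduces to the restricted MLE (12).

```python
import numpy as np


def chain_vinv(n, rho):
    """Illustrative V_n^{-1} = (I - rho W*)^T (I - rho W*) for a row-normalized 1-D chain."""
    W = np.eye(n, k=1) + np.eye(n, k=-1)
    W /= W.sum(axis=1, keepdims=True)
    A = np.eye(n) - rho * W
    return A.T @ A


def annihilator(Z, Vinv):
    """A_Z = V^{-1} - V^{-1} Z (Z^T V^{-1} Z)^{-1} Z^T V^{-1}."""
    VZ = Vinv @ Z
    return Vinv - VZ @ np.linalg.solve(Z.T @ VZ, VZ.T)


def ridge_ur(X1, X2, Y, Vinv, k_f):
    """Unrestricted full model ridge estimator of beta_1, equation (14)."""
    A = annihilator(X2, Vinv)
    p1 = X1.shape[1]
    return np.linalg.solve(X1.T @ A @ X1 + k_f * np.eye(p1), X1.T @ A @ Y)


def ridge_rr(X1, Y, Vinv, k_r):
    """Restricted ridge estimator of beta_1 under H0: beta_2 = 0, equation (15)."""
    p1 = X1.shape[1]
    return np.linalg.solve(X1.T @ Vinv @ X1 + k_r * np.eye(p1), X1.T @ Vinv @ Y)


rng = np.random.default_rng(8)
n, p1, p2 = 80, 3, 4
X1, X2 = rng.normal(size=(n, p1)), rng.normal(size=(n, p2))
Y = X1 @ np.ones(p1) + X2 @ np.r_[0.2, np.zeros(p2 - 1)] + rng.normal(size=n)
Vinv = chain_vinv(n, rho=0.5)
k_f, k_r = 2.0, 2.0  # hypothetical, hand-picked ridge parameters
b_ur = ridge_ur(X1, X2, Y, Vinv, k_f)
b_rr = ridge_rr(X1, Y, Vinv, k_r)
```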

4.2. Pretest, Shrinkage, and Positive Shrinkage Ridge Estimators

Based on a test of the null hypothesis in (10), the pretest estimator selects the full model estimator $\hat{\beta}_1^{UR}$ if $H_0$ is rejected and the restricted ridge estimator $\hat{\beta}_1^{RR}$ otherwise. An appropriate test statistic for the hypothesis in (10) is

$$T_n = \frac{(\hat{\beta}_2^{UR})^T (X_2^T A_{X_1} X_2)\, \hat{\beta}_2^{UR}}{s^2},$$

where $A_{X_1}$ is defined analogously to $A_{X_2}$, $\hat{\beta}_2^{UR} = (X_2^T A_{X_1} X_2)^{-1} X_2^T A_{X_1} Y$, and $s^2 = (Y - X\hat{\beta}^{RF})^T (Y - X\hat{\beta}^{RF}) / (n - p)$ is a consistent estimator of $\sigma^2$. Under the null hypothesis, $T_n$ asymptotically follows a chi-square distribution with $p_2$ degrees of freedom. Hence, the pretest estimator, denoted by $\hat{\beta}_1^{PTR}$, is given by

$$\hat{\beta}_1^{PTR} = \hat{\beta}_1^{UR} - (\hat{\beta}_1^{UR} - \hat{\beta}_1^{RR})\, I(T_n \le \chi^2_{\alpha, p_2}), \tag{16}$$

where $I(\cdot)$ is the indicator function and $\chi^2_{\alpha, p_2}$ is the upper $\alpha$ quantile of the chi-square distribution with $p_2$ degrees of freedom. The pretest estimator depends on the significance level $\alpha$ and, through binary weights, selects $\hat{\beta}_1^{UR}$ if the null hypothesis is rejected and $\hat{\beta}_1^{RR}$ otherwise; it is therefore a discontinuous function of $T_n$. This drawback can be mitigated by using smooth weights between $\hat{\beta}_1^{UR}$ and $\hat{\beta}_1^{RR}$ instead, which yields the shrinkage estimator, denoted by $\hat{\beta}_1^{SR}$ and given by

$$\hat{\beta}_1^{SR} = \hat{\beta}_1^{RR} + (\hat{\beta}_1^{UR} - \hat{\beta}_1^{RR})\{1 - (p_2 - 2) T_n^{-1}\}, \quad p_2 \ge 3. \tag{17}$$

The shrinkage estimator may suffer from over-shrinkage: whenever $T_n < p_2 - 2$, the shrinkage factor $1 - (p_2 - 2)T_n^{-1}$ becomes negative, so $\hat{\beta}_1^{SR}$ overshoots $\hat{\beta}_1^{RR}$ in the direction opposite to $\hat{\beta}_1^{UR}$. The positive shrinkage estimator, a modified version of $\hat{\beta}_1^{SR}$, resolves this issue. It is denoted by $\hat{\beta}_1^{PSR}$ and given by

$$\hat{\beta}_1^{PSR} = \hat{\beta}_1^{RR} + (\hat{\beta}_1^{UR} - \hat{\beta}_1^{RR})\{1 - (p_2 - 2) T_n^{-1}\}^{+}, \tag{18}$$

where $x^+ = \max(x, 0)$. All of the above estimators share the general form

$$\hat{\beta}_1^{Shrinkage} = \hat{\beta}_1^{UR} - (\hat{\beta}_1^{UR} - \hat{\beta}_1^{RR})\, g(T_n). \tag{19}$$

For $\hat{\beta}_1^{PTR}$, $\hat{\beta}_1^{SR}$, and $\hat{\beta}_1^{PSR}$, the corresponding functions $g(\cdot)$ are $I(T_n \le \chi^2_{\alpha, p_2})$, $(p_2 - 2) T_n^{-1}$, and $(p_2 - 2) T_n^{-1} + \{1 - (p_2 - 2) T_n^{-1}\} I(T_n \le p_2 - 2)$, respectively.
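Given $\hat{\beta}_1^{UR}$, $\hat{\beta}_1^{RR}$, and $T_n$, the estimators (16) to (18) are simple combinations. The sketch below (hypothetical helper names and illustrative input vectors) implements them directly and through the general form (19), so the $g(\cdot)$ functions can be checked against each other. The critical value is passed in by hand: $\chi^2_{0.05,10} \approx 18.307$, which `scipy.stats.chi2.ppf(0.95, 10)` also returns. Note that $g$ for the positive shrinkage estimator simplifies to $\min\{1, (p_2 - 2)T_n^{-1}\}$.

```python
import numpy as np


def pretest(b_ur, b_rr, T, crit):
    """Equation (16): keep b_ur if H0 is rejected (T > crit), else b_rr."""
    return b_ur - (b_ur - b_rr) * float(T <= crit)


def shrinkage(b_ur, b_rr, T, p2):
    """Equation (17), defined for p2 >= 3."""
    if p2 < 3:
        raise ValueError("shrinkage estimator requires p2 >= 3")
    return b_rr + (b_ur - b_rr) * (1.0 - (p2 - 2) / T)


def positive_shrinkage(b_ur, b_rr, T, p2):
    """Equation (18): the shrinkage factor is truncated at zero."""
    if p2 < 3:
        raise ValueError("positive shrinkage estimator requires p2 >= 3")
    return b_rr + (b_ur - b_rr) * max(0.0, 1.0 - (p2 - 2) / T)


def general_form(b_ur, b_rr, g):
    """Equation (19): b_ur - (b_ur - b_rr) * g(T_n)."""
    return b_ur - (b_ur - b_rr) * g


b_ur = np.array([1.2, 0.8, 1.1])
b_rr = np.array([1.0, 1.0, 1.0])
p2, crit = 10, 18.307  # upper 5% point of chi-square with 10 df

results = {}
for T in (5.0, 25.0):  # T < p2 - 2 (over-shrinkage zone) and T > crit
    results[T] = {
        "PTR": pretest(b_ur, b_rr, T, crit),
        "SR": shrinkage(b_ur, b_rr, T, p2),
        "PSR": positive_shrinkage(b_ur, b_rr, T, p2),
        "SR_via_19": general_form(b_ur, b_rr, (p2 - 2) / T),
        "PSR_via_19": general_form(b_ur, b_rr, min(1.0, (p2 - 2) / T)),
    }
```

At $T_n = 5$ the shrinkage factor is $1 - 8/5 = -0.6$, so $\hat{\beta}_1^{SR}$ lands at $(0.88, 1.12, 0.94)$, on the far side of $\hat{\beta}_1^{RR}$; the positive part rule returns $\hat{\beta}_1^{RR}$ instead.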

5. Asymptotic Analysis

In this section, we study the asymptotic performance of all estimators through their asymptotic quadratic risks. Because our goal is to investigate the behaviour of the estimators near the null space, we consider a sequence of local alternatives given by

$$H_{(n)}: \beta_{2(n)} = \frac{\xi}{\sqrt{n}}, \quad \xi \in \Re^{p_2}, \ \xi \neq 0. \tag{20}$$

When $\xi = 0$, the local alternatives in (20) reduce to the null hypothesis in (10). For any estimator $\hat{\beta}_1^*$ of $\beta_1$, let $K(x)$ denote the limiting cumulative distribution function

$$K(x) = \lim_{n \to \infty} P_{H_{(n)}}\big(\sqrt{n}(\hat{\beta}_1^* - \beta_1) \le x\big).$$

Then, for any $(p_1 \times p_1)$ positive definite matrix $M$, the weighted quadratic loss function is defined as

$$W(\hat{\beta}_1^*, \beta_1) = n(\hat{\beta}_1^* - \beta_1)^T M (\hat{\beta}_1^* - \beta_1) = \operatorname{tr}\big[M\, n(\hat{\beta}_1^* - \beta_1)(\hat{\beta}_1^* - \beta_1)^T\big],$$

where $\operatorname{tr}(A)$ denotes the trace of a matrix $A$. Define $\vartheta_n^* = \sqrt{n}(\hat{\beta}_1^* - \beta_1)$. If $\vartheta_n^* \xrightarrow{D} \vartheta^*$, where $\xrightarrow{D}$ denotes convergence in distribution, then the asymptotic distributional quadratic risk (ADQR) of $\hat{\beta}_1^*$, denoted by $\Gamma(\hat{\beta}_1^*, M)$, is given by

$$\Gamma(\hat{\beta}_1^*, M) = E(\vartheta^{*T} M \vartheta^*) = \int x^T M x \, dK(x). \tag{21}$$

The asymptotic distributional bias (ADB) of $\hat{\beta}_1^*$ can be obtained via

$$\mathrm{ADB}(\hat{\beta}_1^*) = E\Big(\lim_{n \to \infty} \sqrt{n}(\hat{\beta}_1^* - \beta_1)\Big). \tag{22}$$
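The ADQR in (21) has a closed form whenever the limit $\vartheta^*$ is normal: if $\vartheta^* \sim N(\mu, \Sigma)$, then $E(\vartheta^{*T} M \vartheta^*) = \mu^T M \mu + \operatorname{tr}(M\Sigma)$. Applied to part (1) of Theorem 1, this standard identity gives the expression $\Gamma(\hat{\beta}_1^{UR}, M) = \eta_{11.2}^T M \eta_{11.2} + \sigma^2 \operatorname{tr}(M C_{11.2}^{-1})$ used in Theorem 3. The Monte Carlo check below uses arbitrary illustrative values of $\mu$, $\Sigma$, and $M$.

```python
import numpy as np

rng = np.random.default_rng(4)
mu = np.array([0.5, -1.0, 0.3])        # plays the role of -eta_{11.2}
B = rng.normal(size=(3, 3))
Sigma = B @ B.T + 0.5 * np.eye(3)      # plays the role of sigma^2 C_{11.2}^{-1}
M = np.diag([1.0, 2.0, 0.5])           # positive definite weight matrix

closed_form = mu @ M @ mu + np.trace(M @ Sigma)

# Monte Carlo estimate of E(x^T M x) for x ~ N(mu, Sigma).
L = np.linalg.cholesky(Sigma)
x = mu + rng.normal(size=(200_000, 3)) @ L.T
mc = np.mean(np.einsum("ij,jk,ik->i", x, M, x))
```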

For the purpose of deriving asymptotic distributional properties, in addition to the first four assumptions of [34], we set the following regularity conditions:

(A1): $\max_{1 \le i \le n} \frac{1}{n} x_i^T (X^T \hat{V}_n^{-1} X)^{-1} x_i \to 0$ as $n \to \infty$, where $x_i$ is the $i$th row of $X$.
(A2): Let $C_n = X^T \hat{V}_n^{-1} X$. Then $\lim_{n \to \infty} \frac{1}{n} C_n = C$, where $C$ is a $(p \times p)$ positive definite matrix.
(A3): Let

$$C_n^{-1} = \begin{pmatrix} X_1^T \hat{V}_n^{-1} X_1 & X_1^T \hat{V}_n^{-1} X_2 \\ X_2^T \hat{V}_n^{-1} X_1 & X_2^T \hat{V}_n^{-1} X_2 \end{pmatrix}^{-1} \quad \text{and} \quad C^{-1} = \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix}^{-1} = \begin{pmatrix} C_{11.2}^{-1} & -C_{11}^{-1} C_{12} C_{22.1}^{-1} \\ -C_{22}^{-1} C_{21} C_{11.2}^{-1} & C_{22.1}^{-1} \end{pmatrix}.$$

Then $\lim_{n \to \infty} \big(\frac{1}{n} C_n\big)^{-1} = C^{-1}$, where $C_{ii.j} = C_{ii} - C_{ij} C_{jj}^{-1} C_{ji}$ for $i, j = 1, 2$.

In the sequel, we refer to the above assumptions as the named regularity conditions (NRC).

The primary tool for deriving the asymptotic quadratic risks of the proposed estimators is the asymptotic distribution of the unrestricted full model ridge estimator $\hat{\beta}_1^{UR}$ and the restricted ridge estimator $\hat{\beta}_1^{RR}$. To this end, we use the following lemma, whose proof is provided in the Appendix.

Lemma 1.

Assume the NRC. If $k/\sqrt{n} \to k_o \ge 0$, then

$$\sqrt{n}(\hat{\beta}^{RF} - \beta) \xrightarrow{D} N_p(-k_o C^{-1}\beta, \ \sigma^2 C^{-1}),$$

where $\xrightarrow{D}$ denotes convergence in distribution. Lemma 1 yields the asymptotic distributional results for $\hat{\beta}_1^{UR}$ and $\hat{\beta}_1^{RR}$ presented in the following theorem, which are straightforward to prove; see [21] for similar results.
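The bias term $-k_o C^{-1}\beta$ in Lemma 1 comes from a one-line calculation. Treating $\hat{V}_n$ as fixed, conditional on $X$ we have $E(\hat{\beta}^{RF}) - \beta = -k(C_n + kI_p)^{-1}\beta$ with $C_n = X^T\hat{V}_n^{-1}X$, so with $k = k_o\sqrt{n}$,

$$E\{\sqrt{n}(\hat{\beta}^{RF} - \beta)\} = -k_o\Big(\tfrac{1}{n}C_n + \tfrac{k_o}{\sqrt{n}}I_p\Big)^{-1}\beta \to -k_o C^{-1}\beta.$$

The sketch below evaluates this exact expression for two sample sizes, using $\hat{V}_n = I$ and standard normal regressors (so $C = I_p$) purely for illustration; the values of $k_o$ and $\beta$ are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)
p, k_o = 3, 2.0
beta = np.array([1.0, -0.5, 2.0])


def scaled_ridge_bias(n, rng):
    """Exact E[sqrt(n)(beta_RF - beta) | X] for k = k_o sqrt(n), taking V_n = I (illustrative)."""
    X = rng.normal(size=(n, p))  # rows iid N(0, I_p), so C = I_p
    k = k_o * np.sqrt(n)
    # sqrt(n) * [-k (X^T X + k I)^{-1} beta]
    return -np.sqrt(n) * k * np.linalg.solve(X.T @ X + k * np.eye(p), beta)


limit = -k_o * beta  # -k_o C^{-1} beta with C = I_p
gaps = {n: np.linalg.norm(scaled_ridge_bias(n, rng) - limit) for n in (100, 1_000_000)}
```

The gap to the limit shrinks at roughly the $1/\sqrt{n}$ rate, since both $k_o/\sqrt{n}$ and the sampling error in $C_n/n$ are of that order.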

Theorem 1.

Let $\vartheta_n^{(1)} = \sqrt{n}(\hat{\beta}_1^{UR} - \beta_1)$, $\vartheta_n^{(2)} = \sqrt{n}(\hat{\beta}_1^{RR} - \beta_1)$, and $\vartheta_n^{(3)} = \sqrt{n}(\hat{\beta}_1^{UR} - \hat{\beta}_1^{RR})$. Assume the local alternatives in (20) and the NRC. Then, as $n \to \infty$,

(1): $\vartheta_n^{(1)} \xrightarrow{D} N_{p_1}(-\eta_{11.2}, \ \sigma^2 C_{11.2}^{-1})$
(2): $\vartheta_n^{(2)} \xrightarrow{D} N_{p_1}(\delta - \eta_{11.2}, \ \sigma^2 C_{11}^{-1})$
(3): $\vartheta_n^{(3)} \xrightarrow{D} N_{p_1}(\delta, \ \sigma^2 (C_{11.2}^{-1} - C_{11}^{-1}))$
(4): $\begin{pmatrix} \vartheta_n^{(1)} \\ \vartheta_n^{(3)} \end{pmatrix} \xrightarrow{D} N_{2p_1}\left(\begin{pmatrix} -\eta_{11.2} \\ \delta \end{pmatrix}, \ \sigma^2 \begin{pmatrix} C_{11.2}^{-1} & C_{11.2}^{-1} - C_{11}^{-1} \\ C_{11.2}^{-1} - C_{11}^{-1} & C_{11.2}^{-1} - C_{11}^{-1} \end{pmatrix}\right)$
(5): $\begin{pmatrix} \vartheta_n^{(2)} \\ \vartheta_n^{(3)} \end{pmatrix} \xrightarrow{D} N_{2p_1}\left(\begin{pmatrix} \delta - \eta_{11.2} \\ \delta \end{pmatrix}, \ \sigma^2 \begin{pmatrix} C_{11}^{-1} & 0 \\ 0 & C_{11.2}^{-1} - C_{11}^{-1} \end{pmatrix}\right)$
(6): $E[\vartheta_n^{(1)} \mid \vartheta_n^{(3)}] = -\eta_{11.2} + \vartheta_n^{(3)} - \delta$
(7): $\Pr(T_n \le x) \to H_{p_2}(x; \Delta)$, where $H_q(x; \Delta)$ denotes the cumulative distribution function of a non-central chi-square distribution with $q$ degrees of freedom and non-centrality parameter $\Delta$,

where

$$\eta = (\eta_1^T, \eta_2^T)^T = -k_o C^{-1}\beta, \qquad \eta_{11.2} = \eta_1 - C_{12} C_{22}^{-1}\big((\beta_2 - \xi) - \eta_2\big), \qquad \delta = C_{11}^{-1} C_{12}\,\xi.$$

With Lemma 1 and Theorem 1 in hand, the asymptotic distributional properties of the shrinkage estimators follow directly. The subsequent theorems give their asymptotic biases and weighted quadratic risk functions.

Theorem 2.

Under the assumptions of Lemma 1, the asymptotic distributional biases of the shrinkage estimators are given by

1. $\mathrm{ADB}(\hat{\beta}_1^{PTR}) = -\eta_{11.2} - \delta\, H_{p_2+2}(\chi^2_{\alpha, p_2}; \Delta)$,
2. $\mathrm{ADB}(\hat{\beta}_1^{SR}) = -\eta_{11.2} - (p_2 - 2)\,\delta\, E(\chi^{-2}_{p_2+2}(\Delta))$,
3. $\mathrm{ADB}(\hat{\beta}_1^{PSR}) = -\eta_{11.2} - \delta\, H_{p_2+2}(\chi^2_{\alpha, p_2}; \Delta) + (p_2 - 2)\,\delta\, E[\chi^{-2}_{p_2+2}(\Delta)\, I(\chi^2_{p_2+2}(\Delta) \le p_2 - 2)]$,

where

$$E(\chi_q^{-2i}(\Delta)) = \int_0^\infty x^{-i}\, dH_q(x; \Delta), \quad i = 1, 2.$$

For the proof, refer to the Appendix.

The following result gives the ADQR of the proposed shrinkage estimators.

Theorem 3.

Under the assumptions of Lemma 1, the asymptotic distributional quadratic risks of the shrinkage estimators are given by

1. $$\begin{aligned} \Gamma(\hat{\beta}_1^{PTR}, M) ={}& \Gamma(\hat{\beta}_1^{UR}, M) - 2\eta_{11.2}^T M \delta\, H_{p_2+2}(\chi^2_{\alpha,p_2}; \Delta) \\ & - \sigma^2 \operatorname{tr}[M(C_{11.2}^{-1} - C_{11}^{-1})]\, H_{p_2+2}(\chi^2_{\alpha,p_2}; \Delta) \\ & + \delta^T M \delta\, [2H_{p_2+2}(\chi^2_{\alpha,p_2}; \Delta) - H_{p_2+4}(\chi^2_{\alpha,p_2}; \Delta)], \end{aligned}$$

2. $$\begin{aligned} \Gamma(\hat{\beta}_1^{SR}, M) ={}& \Gamma(\hat{\beta}_1^{UR}, M) + 2(p_2-2)\,\eta_{11.2}^T M\delta\, E(\chi^{-2}_{p_2+2}(\Delta)) \\ & - (p_2-2)\,\sigma^2 \operatorname{tr}(M C_{11}^{-1} C_{12} C_{22.1}^{-1} C_{21} C_{11}^{-1})\, \{2E(\chi^{-2}_{p_2+2}(\Delta)) - (p_2-2)E(\chi^{-4}_{p_2+2}(\Delta))\} \\ & + (p_2-2)\,\delta^T M\delta \times \{2E(\chi^{-2}_{p_2+2}(\Delta)) - 2E(\chi^{-2}_{p_2+4}(\Delta)) - (p_2-2)E(\chi^{-4}_{p_2+4}(\Delta))\}, \end{aligned}$$

3. $$\begin{aligned} \Gamma(\hat{\beta}_1^{PSR}, M) ={}& \Gamma(\hat{\beta}_1^{SR}, M) - 2\eta_{11.2}^T M\delta\, A_1 \\ & + (p_2-2)\,\sigma^2 \operatorname{tr}(M C_{11}^{-1}C_{12}C_{22.1}^{-1}C_{21}C_{11}^{-1})\, A_2 \\ & - \sigma^2 \operatorname{tr}(M C_{11}^{-1}C_{12}C_{22.1}^{-1}C_{21}C_{11}^{-1})\, H_{p_2+2}(\chi^2_{\alpha,p_2}; \Delta) \\ & + \delta^T M\delta\, [2H_{p_2+2}(\chi^2_{\alpha,p_2};\Delta) - H_{p_2+4}(\chi^2_{\alpha,p_2};\Delta)] \\ & - (p_2-2)\,\delta^T M\delta\, A_3, \end{aligned}$$

where $M$ is a positive definite weight matrix,

$$\Gamma(\hat{\beta}_1^{UR}, M) = \eta_{11.2}^T M \eta_{11.2} + \sigma^2 \operatorname{tr}(M C_{11.2}^{-1}),$$

and

$$\begin{aligned} A_1 &= E\big[\{1 - (p_2-2)\chi^{-2}_{p_2+2}(\Delta)\}\, I(\chi^2_{p_2+2}(\Delta) \le p_2 - 2)\big], \\ A_2 &= 2E\{\chi^{-2}_{p_2+2}(\Delta)\, I(\chi^2_{p_2+2}(\Delta) \le p_2-2)\} - (p_2-2)E\{\chi^{-4}_{p_2+2}(\Delta)\, I(\chi^2_{p_2+2}(\Delta) \le p_2-2)\}, \\ A_3 &= 2E\{\chi^{-2}_{p_2+2}(\Delta)\, I(\chi^2_{p_2+2}(\Delta) \le p_2-2)\} - 2E\{\chi^{-2}_{p_2+4}(\Delta)\, I(\chi^2_{p_2+4}(\Delta) \le p_2-2)\} \\ &\quad + (p_2-2)E\{\chi^{-4}_{p_2+2}(\Delta)\, I(\chi^2_{p_2+2}(\Delta) \le p_2-2)\}. \end{aligned}$$

For the proof, refer to the Appendix.
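Theorem 1(3) and the risk of $\hat{\beta}_1^{PTR}$ are written with $C_{11.2}^{-1} - C_{11}^{-1}$, whereas the risks of $\hat{\beta}_1^{SR}$ and $\hat{\beta}_1^{PSR}$ use $C_{11}^{-1}C_{12}C_{22.1}^{-1}C_{21}C_{11}^{-1}$. The two matrices coincide by the Woodbury identity. Together with the partitioned inverse in (A3), this can be checked numerically on an arbitrary positive definite $C$; the random matrix below is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
p1, p2 = 2, 3
B = rng.normal(size=(p1 + p2, p1 + p2))
C = B @ B.T + np.eye(p1 + p2)  # arbitrary positive definite matrix
C11, C12 = C[:p1, :p1], C[:p1, p1:]
C21, C22 = C[p1:, :p1], C[p1:, p1:]
inv = np.linalg.inv

C11_2 = C11 - C12 @ inv(C22) @ C21  # C_{11.2}
C22_1 = C22 - C21 @ inv(C11) @ C12  # C_{22.1}

lhs = inv(C11_2) - inv(C11)
rhs = inv(C11) @ C12 @ inv(C22_1) @ C21 @ inv(C11)

# Partitioned inverse as stated in (A3).
Cinv_blocks = np.block([
    [inv(C11_2), -inv(C11) @ C12 @ inv(C22_1)],
    [-inv(C22) @ C21 @ inv(C11_2), inv(C22_1)],
])
```

Because `rhs` has the form $G^T C_{22.1}^{-1} G$ with $G = C_{21}C_{11}^{-1}$, the difference $C_{11.2}^{-1} - C_{11}^{-1}$ is also positive semidefinite, which is why the corresponding trace terms are nonnegative.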

6. Numerical Analysis

To illustrate our theoretical findings, we first conduct Monte Carlo simulation experiments and then apply the proposed estimators to a real data set. The Monte Carlo simulation compares the performance of the ridge-type estimators with that of the MLE $\hat{\beta}_1$ given in (9) through the simulated mean squared error of each estimator.

6.1. Simulation Experiments

In this section, we compare the ridge-type estimators with the MLE through Monte Carlo simulation experiments based on their simulated mean squared errors. In each experiment, we consider $(N \times N)$ square lattices with $N = 7, 10$, corresponding to sample sizes $n = N^{2} = 49, 100$, respectively. To examine the performance of the proposed estimators when multicollinearity exists, we generate the design matrix $X$ from a multivariate normal distribution with mean $0$ and a variance-covariance matrix with a first-order autoregressive structure,

$$cov(X_{i}, X_{j}) = \begin{cases} ρ_{x}^{|i - j|}, & i \neq j, \\ 1, & i = j, \end{cases}$$

with $ρ_{x} \in \{0.3, 0.6, 0.9\}$. The error term $ϵ$ is generated from another multivariate normal distribution with mean $0$ and SE variance matrix

$$V_{n} = σ^{2} (I - ρ W^{*})^{-1} (I - ρ W^{*\prime})^{-1},$$

where we set $σ^{2} = 1$. For the weight matrix $W^{*}$, we use a queen-based contiguity neighborhood, and the spatial dependence parameter takes the values $ρ \in \{0.3, 0.6, 0.9\}$. We partition the vector of coefficients as $β = (β_{1}^{T}, β_{2}^{T})^{T}$, where $β_{1} = 1_{p_{1}}$ is a $p_{1} \times 1$ vector of ones and $β_{2} = (Δ, 0_{p_{2} - 1}^{T})^{T}$, with $0_{p_{2} - 1}$ a zero vector of dimension $(p_{2} - 1) \times 1$. Here $Δ = \| β - β_{0} \|$, where $\| A \|$ denotes the Euclidean norm of $A$, and $Δ$ represents the non-centrality parameter; its values range from 0 to 2. We then fit the model in (8) using the spautolm function of the R package spdep [16], obtain the values of all estimators considered in our study, and compute the simulated mean squared error (SMSE) of each estimator as

$$SMSE(\hat{β}_{1}^{*}) = \sum_{i = 1}^{p_{1}} (\hat{β}_{1i}^{*} - β_{1i})^{2}.$$

The simulated relative efficiency (SRE) of any estimator $\hat{β}_{1}^{\circ}$ with respect to the MLE $\hat{β}_{1}$ is calculated as

$$SRE(\hat{β}_{1}^{\circ}) = \frac{SMSE(\hat{β}_{1})}{SMSE(\hat{β}_{1}^{\circ})}, \tag{23}$$

where $\hat{β}_{1}^{\circ}$ is any of the estimators $\{\hat{β}_{1}^{UR}, \hat{β}_{1}^{RR}, \hat{β}_{1}^{PTR}, \hat{β}_{1}^{SR}, \hat{β}_{1}^{PSR}\}$. An SRE greater than one indicates that the estimator outperforms the MLE of the full model, and an SRE less than one indicates the opposite. We run the simulation for $(p_{1}, p_{2}) \in \{(5, 10), (5, 20), (5, 30)\}$ and use $α = 0.05$ for testing the hypothesis in (20). Varying the spatial dependence parameter $ρ$ did not noticeably change the results, so we display only the graphs for $ρ = 0.90$. Figures 1, 2 and 3 show the SRE against various values of $Δ$. The findings support the following conclusions:

(i): Across all settings, the full-model ridge-type estimator $\hat{β}_{1}^{UR}$ outperforms the traditional MLE. Its efficiency increases with $p_{2}$ for fixed values of $ρ$ and $ρ_{x}$, and, as expected, it also increases as the multicollinearity among the explanatory variables in the design matrix becomes stronger.
(ii): The submodel ridge-type estimator $\hat{β}_{1}^{RR}$ outperforms all other estimators when $Δ = 0$, which is expected because the null hypothesis is then true. However, as $Δ$ moves away from the null space, its SRE drops sharply toward zero, making it less effective than the other estimators.
(iii): Holding the other parameters fixed, the SRE values increase as the correlation $ρ_{x}$ among the explanatory variables increases.
(iv): As the number of zero coefficients $p_{2}$ increases, the SRE of all estimators also increases.
(v): Over the whole range of $Δ$ considered, the ridge-type positive shrinkage estimator $\hat{β}_{1}^{PSR}$ uniformly outperforms the MLE.
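The simulation design above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the queen-contiguity matrix is row-standardized (the text does not say whether $W^{*}$ is standardized), the SMSE is averaged over Monte Carlo replications (the number of replications is not given in this section), and the model fitting itself, done with spautolm in R in the study, is not reproduced. All function names are hypothetical.

```python
import numpy as np


def queen_weights(N):
    """Row-standardized queen-contiguity weights on an N x N lattice (assumed standardization)."""
    n = N * N
    W = np.zeros((n, n))
    for r in range(N):
        for c in range(N):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    # Queen neighbours share an edge or a corner; skip the cell itself.
                    if (dr, dc) != (0, 0) and 0 <= rr < N and 0 <= cc < N:
                        W[r * N + c, rr * N + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)


def ar1_cov(p, rho_x):
    """AR(1) covariance: cov(X_i, X_j) = rho_x^|i-j|, with ones on the diagonal."""
    idx = np.arange(p)
    return rho_x ** np.abs(idx[:, None] - idx[None, :])


def se_cov(W, rho, sigma2=1.0):
    """SE error covariance V_n = sigma^2 (I - rho W)^{-1} (I - rho W')^{-1}."""
    A_inv = np.linalg.inv(np.eye(W.shape[0]) - rho * W)
    return sigma2 * A_inv @ A_inv.T


def simulate_once(N, p1, p2, rho_x, rho, delta, rng):
    """One simulated data set with beta = (1_{p1}, delta, 0_{p2-1})."""
    n, p = N * N, p1 + p2
    X = rng.multivariate_normal(np.zeros(p), ar1_cov(p, rho_x), size=n)
    beta = np.concatenate([np.ones(p1), [delta], np.zeros(p2 - 1)])
    W = queen_weights(N)
    eps = rng.multivariate_normal(np.zeros(n), se_cov(W, rho))
    return X, X @ beta + eps, beta, W


def smse(beta1_hats, beta1):
    """Sum of squared errors over the p1 coefficients, averaged over replications (rows)."""
    diffs = np.atleast_2d(beta1_hats) - np.asarray(beta1)
    return float(np.mean(np.sum(diffs ** 2, axis=1)))


def sre(smse_mle, smse_estimator):
    """Simulated relative efficiency (23): values above one favour the estimator."""
    return smse_mle / smse_estimator
```

For example, `simulate_once(7, 5, 10, 0.9, 0.9, 0.5, np.random.default_rng(1))` returns a design with $n = 49$ rows and $p_{1} + p_{2} = 15$ columns; each fitted estimator of $β_{1}$ would then be passed to `smse`, and two SMSE values to `sre`.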

6.2. Data Example

Harrison and Rubinfeld [26] used housing market data for census tracts in the Boston Standard Metropolitan Statistical Area in 1970. Their main objective was to relate a set of 15 variables to the median value of owner-occupied homes in Boston. Gilley and Pace [24] provided a corrected version of the data set along with new spatial data. The data set is accessible through the R package spdep. It contains 506 observations, each corresponding to a single census tract. The variables include the tract identification number (TRACT); median values of owner-occupied homes in thousands of US dollars (MEDV); corrected median values of owner-occupied homes in thousands of US dollars (CMEDV); percentages of residential land zoned for lots larger than 25,000 square feet per town (constant for all Boston tracts) (ZN); percentages of non-retail business acres per town (INDUS); the average number of rooms per dwelling (RM); the percentage of owner-occupied units built before 1940 (AGE); a dummy variable equal to 1 if the tract borders the Charles River and 0 otherwise (CHAS); the per capita crime rate (CRIM); weighted distances to the main employment centers (DIS); nitrogen oxides concentration (parts per 10 million) per town (NOX); an index of accessibility to radial highways per town (constant for all Boston tracts) (RAD); the property tax rate per USD 10,000 per town (constant for all Boston tracts) (TAX); the percentage of lower-status population (LSTAT); pupil-teacher ratios per town (constant for all Boston tracts) (PTRATIO); and the variable $1000(b - 0.63)^{2}$, where $b$ is the proportion of Black residents (B). Pace and Gilley [37] added the latitude (LAT) and longitude (LON) of each tract.

Assuming an SE model, we model the response log(CMEDV) using all available variables; we refer to this as the full SE model. Several selection techniques have been applied to these data to determine a submodel. We use the submodel obtained by [5] with the adaptive LASSO algorithm, referred to below as the SE submodel. The two models are summarized in Table 1.

Figure 4 displays a color-coded plot of the correlation coefficients between the variables. Dark colors indicate a strong linear relationship, while weak relationships appear in light colors or fade out entirely. The plot is useful for examining the strength of the linear association between the original response CMEDV and each of the other variables, and it shows that CMEDV has a strong linear relationship with several of them. The variables selected by the adaptive LASSO have strong, moderate, and weak relationships with the response. Moreover, some explanatory variables are collinear, which makes these data a natural setting for comparing the ridge-type estimators with the MLE.

To assess the effectiveness of the proposed estimators, we employ the bootstrap technique for correlated data suggested by [41] and compute the square root of the mean squared prediction error (MSPE) of each estimator as follows:

1.: Fit the full and sub SE models in Table 1 using the spautolm function, and obtain the MLEs of $β_{1}$, $σ^{2}$, the spatial dependence parameter $ρ$, and the covariance matrix $V_{n}$.
2.: Because the columns of the matrices $(X_{1}^{T} A_{X_{2}} X_{1})^{-1}$ and $(X_{1}^{T} \hat{V}_{n}^{-1} X_{1})^{-1}$ are not orthogonal and the sample size is large, we follow Boonstra et al. [17] to estimate the ridge tuning parameters of $\hat{β}_{1}^{UR}$ and $\hat{β}_{1}^{RR}$, given, respectively, by

$k_{f} = \frac{\hat{σ}^{2}\, tr (X_{1}^{T} A_{X_{2}} X_{1})^{-1}}{\hat{β}_{1}^{T} (X_{1}^{T} A_{X_{2}} X_{1})^{-1} \hat{β}_{1}}$ and $k_{r} = \frac{\hat{σ}^{2}\, tr (X_{1}^{T} \hat{V}_{n}^{-1} X_{1})^{-1}}{(\hat{β}_{1}^{S})^{T} (X_{1}^{T} \hat{V}_{n}^{-1} X_{1})^{-1} \hat{β}_{1}^{S}}$.
3.: Use the Cholesky decomposition to write $\hat{V}_{n} = U U^{T}$, where $U$ is an $(n \times n)$ lower triangular matrix.
4.: Let $\hat{ϵ} = U^{-1} (Y - X \hat{β})$, where $\hat{ϵ} = (\hat{ϵ}_{1}, \hat{ϵ}_{2}, \dots, \hat{ϵ}_{n})^{T}$, and define the centered residuals $ϵ_{i}^{c} = \hat{ϵ}_{i} - \frac{1}{n} \sum_{j = 1}^{n} \hat{ϵ}_{j}$. Then draw, with replacement, a sample of size $n$ from $(ϵ_{1}^{c}, ϵ_{2}^{c}, \dots, ϵ_{n}^{c})$ to obtain $ϵ^{★} = (ϵ_{1}^{★}, ϵ_{2}^{★}, \dots, ϵ_{n}^{★})^{T}$.
5.: Compute the bootstrap response $Y^{★} = X \hat{β} + U ϵ^{★}$, which restores the spatial correlation removed in step 4 (since $\hat{V}_{n} = U U^{T}$), then use it to refit the full and sub models and obtain bootstrap values of all estimators.
6.: Compute the predicted value of the response for each estimator as $\hat{y}_{ki}^{★} = x_{1i}^{T} \hat{β}_{1}^{*} + \hat{ρ}^{★} \sum_{j = 1}^{n} W_{ij}^{*} (y_{kj}^{★} - x_{1j}^{T} \hat{β}_{1}^{*})$, where $x_{1i}^{T}$ is the $i$th row of $X_{1}$ and $\hat{β}_{1}^{*}$ represents any of the estimators in the set $\{\hat{β}_{1}, \hat{β}_{1}^{UR}, \hat{β}_{1}^{RR}, \hat{β}_{1}^{PTR}, \hat{β}_{1}^{SR}, \hat{β}_{1}^{PSR}\}$.
7.: For the $k$th bootstrap sample, compute the square root of the mean squared prediction error (MSPE) as

$$MSPE_{k}(\hat{β}_{1}^{*}) = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} (\hat{y}_{i}^{*} - y_{i})^{2}}, \quad k = 1, 2, \dots, B, \tag{24}$$

where $B$ is the number of bootstrap samples.
8.: Calculate the relative efficiency (RE) of any estimator with respect to the MLE $\hat{β}_{1}$ as

$$RE(\hat{β}_{1}^{•}) = \frac{MSPE(\hat{β}_{1})}{MSPE(\hat{β}_{1}^{•})}, \tag{25}$$

where $\hat{β}_{1}^{•}$ is any of the proposed ridge-type estimators. We use $B = 2000$ bootstrap samples.
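Steps 2 to 8 can be sketched in Python with NumPy. This is a hedged illustration, not the authors' implementation: the model fits (spautolm in R) are replaced by whatever estimates the caller supplies, the step 6 predictor is assumed to use the bootstrap responses $y^{★}$ on the right-hand side, the bootstrap errors are recoloured with $U$ so that they carry the estimated covariance $\hat{V}_{n}$, and the per-sample values from (24) are averaged over the $B$ bootstrap samples before forming the ratio in (25). All function names are hypothetical.

```python
import numpy as np


def ridge_k(sigma2_hat, G, beta_hat):
    """Step 2: k = sigma^2 tr(G) / (beta' G beta), where G is the inverse matrix in k_f or k_r."""
    return float(sigma2_hat * np.trace(G) / (beta_hat @ G @ beta_hat))


def residual_bootstrap(y, X, beta_hat, V_hat, B, rng):
    """Steps 3-5: yield B bootstrap responses Y* = X beta_hat + U eps*."""
    U = np.linalg.cholesky(V_hat)          # lower triangular, V_hat = U U'
    fitted = X @ beta_hat
    eps = np.linalg.solve(U, y - fitted)   # decorrelated residuals U^{-1}(y - X beta_hat)
    eps_c = eps - eps.mean()               # centred residuals
    for _ in range(B):
        eps_star = rng.choice(eps_c, size=len(y), replace=True)
        yield fitted + U @ eps_star        # re-introduce the spatial correlation


def se_predict(y_star, X1, beta1_hat, W, rho_hat):
    """Step 6 (assumed form): x_i' b + rho * sum_j W_ij (y*_j - x_j' b)."""
    trend = X1 @ beta1_hat
    return trend + rho_hat * (W @ (y_star - trend))


def rmspe(y_pred, y):
    """Step 7: square root of the mean squared prediction error, as in (24)."""
    return float(np.sqrt(np.mean((np.asarray(y_pred) - np.asarray(y)) ** 2)))


def relative_efficiency(rmspe_mle, rmspe_estimator):
    """Step 8: RE as in (25), using values averaged over bootstrap samples (assumption)."""
    return float(np.mean(rmspe_mle) / np.mean(rmspe_estimator))
```

A typical loop would draw each `y_star` from `residual_bootstrap`, refit the models, call `se_predict` and `rmspe` for every estimator, and collect the results into one list per estimator for `relative_efficiency`.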

Table 2 summarizes the relative efficiencies; an RE greater than one indicates that the estimator in the denominator of (25) performs better than the MLE.

Table 2 shows that the submodel ridge-type estimator $\hat{β}_{1}^{RR}$ performs best among all estimators, followed by the pretest estimator $\hat{β}_{1}^{PTR}$, which suggests that the selected submodel is appropriate for these data. The ridge positive shrinkage estimator also performs slightly better than the ridge shrinkage estimator (RE of 2.3287 versus 2.3070). Furthermore, all ridge-type estimators outperform the MLE of $β_{1}$.

7. Conclusion

This paper studies pretest, shrinkage, and positive shrinkage ridge-type estimators of the parameter vector $β$ in the SE model when certain coefficients are suspected a priori to be insignificant and multicollinearity is present among two or more regressors. To obtain the proposed estimators of the main-effect coefficient vector $β_{1}$, we test the hypothesis $H_{0}: β_{2} = 0$. The proposed estimators were compared analytically via their asymptotic distributional quadratic risks, and numerically through simulation experiments and a real data example.

Our results show that the spatial dependence parameter $ρ$ has little effect on the results, while the performance of the ridge estimators improves as the correlation among the regressors increases. Moreover, the full-model ridge estimator $\hat{β}_{1}^{UR}$ outperformed the MLE in all simulated settings. In addition, the estimator $\hat{β}_{1}^{RR}$ dominates all other estimators under the null hypothesis $H_{0}: β_{2} = 0$ or near the null space. However, the proposed positive shrinkage ridge estimator $\hat{β}_{1}^{PSR}$ performs better than the MLE in all scenarios. Finally, we applied the estimators to a real data example and used a bootstrap technique to evaluate their performance based on the relative efficiency of the square root of the mean squared prediction error.

Author Contributions

The author initiated the research, designed the study, proposed the estimators, established the methodology, simulated the data, analyzed the findings, wrote the manuscript, carried out critical revision, and approved the final version. The coauthor carefully oversaw each stage of the research, from design to final approval, ensuring the accuracy, clarity, and coherence of the findings through an iterative process.

Funding

This study is self-funded.

Data Availability Statement

The data set is accessible through the R-Package “spdep.”

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

In this section, we give proofs of the main results.

Proof of Lemma 1: We follow the approach of Yüzbaşı et al. [44], with a slight modification. Let $W \sim N_{p}(0, σ^{2} C)$ and define

$$\begin{aligned} V_{n}(u) &= \sum_{i = 1}^{n} \left[(ϵ_{i} - u^{T} x_{i} / \sqrt{n})^{2} - ϵ_{i}^{2}\right] + k \sum_{j = 1}^{p} \left[(β_{j} + u_{j} / \sqrt{n})^{2} - β_{j}^{2}\right], \\ V(u) &= -2 u^{T} W + u^{T} C u + 2 k_{o} u^{T} β, \end{aligned}$$

where $u = (u_{1}, \dots, u_{p})^{T}$. Following [36],

$$\sum_{i = 1}^{n} \left[(ϵ_{i} - u^{T} x_{i} / \sqrt{n})^{2} - ϵ_{i}^{2}\right] \xrightarrow{D} -2 u^{T} W + u^{T} C u,$$

with finite-dimensional convergence holding trivially. Also,

$$k \sum_{j = 1}^{p} \left[(β_{j} + u_{j} / \sqrt{n})^{2} - β_{j}^{2}\right] \to 2 k_{o} \sum_{j = 1}^{p} u_{j} β_{j}.$$

Thus $V_{n}(u) \xrightarrow{D} V(u)$, with finite-dimensional convergence holding trivially. Since $V_{n}(u)$ is convex and $V(u)$ has a unique minimum, it follows that

$$\arg\min_{u} V_{n}(u) = \sqrt{n} (\hat{β}^{RF} - β) \xrightarrow{D} \arg\min_{u} V(u) = C^{-1} (W - k_{o} β) \sim N_{p}(-k_{o} C^{-1} β, σ^{2} C^{-1}).$$

Hence,

$$\sqrt{n} (\hat{β}^{RF} - β) \xrightarrow{D} N_{p}(-k_{o} C^{-1} β, σ^{2} C^{-1}).$$
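The convergence of the penalty term in the proof of Lemma 1 can be spelled out in one line of algebra. We assume here, as is standard in this type of argument (e.g., [36], [44]), that the ridge parameter satisfies $k/\sqrt{n} \to k_{o}$; this is our reading of the condition behind Lemma 1, which is stated outside this section.

```latex
\begin{aligned}
k\sum_{j=1}^{p}\Big[(β_{j} + u_{j}/\sqrt{n})^{2} - β_{j}^{2}\Big]
  &= \frac{2k}{\sqrt{n}}\sum_{j=1}^{p} u_{j}β_{j} + \frac{k}{n}\sum_{j=1}^{p} u_{j}^{2} \\
  &\to 2k_{o}\,u^{T}β + 0 \cdot \sum_{j=1}^{p} u_{j}^{2} = 2k_{o}\,u^{T}β,
\end{aligned}
```

The second term vanishes because $k/n = (k/\sqrt{n})/\sqrt{n} \to 0$, which leaves exactly the term $2 k_{o} u^{T} β$ in $V(u)$.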

Proof of Theorem 2: Because all of the proposed estimators are special cases of the general shrinkage estimator $\hat{β}_{1}^{Shrinkage}$, we derive the bias of this estimator; the result for each estimator then follows by substituting the relevant $g(\cdot)$ function. We have

$$ADB(\hat{β}_{1}^{Shrinkage}) = ADB(\hat{β}_{1}^{UR}) - \lim_{n \to \infty} \sqrt{n}\, E\left[(\hat{β}_{1}^{UR} - \hat{β}_{1}^{RR})\, g(T_{n})\right].$$

By part one of Lemma 1, $ADB(\hat{β}_{1}^{UR}) = -η_{11.2}$. Further, using part three of Lemma 1 together with Theorem 1 in Appendix B of [29], we get

$$\lim_{n \to \infty} \sqrt{n}\, E\left[(\hat{β}_{1}^{UR} - \hat{β}_{1}^{RR})\, g(T_{n})\right] = δ\, E\left[g(χ_{p_{2} + 2}^{2}(Δ))\right].$$

Therefore, the asymptotic bias of the general shrinkage estimator is

$$ADB(\hat{β}_{1}^{Shrinkage}) = -η_{11.2} - δ\, E\left[g(χ_{p_{2} + 2}^{2}(Δ))\right].$$

The proof is complete on substituting the expressions for $E[g(χ_{p_{2} + 2}^{2}(Δ))]$ given in Table A1.
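The key step above, Theorem 1 in Appendix B of [29], states that for $Z \sim N_{p}(μ, I_{p})$ and a suitable function $g$, $E[Z\, g(Z^{T} Z)] = μ\, E[g(χ_{p + 2}^{2}(Δ))]$, where in this sketch the noncentrality is taken as $Δ = μ^{T} μ$. A quick Monte Carlo check in plain Python may help show why the degrees of freedom shift from $p_{2}$ to $p_{2} + 2$. The function names, the pretest-type indicator used as $g$, and the choice of $μ$ are illustrative assumptions.

```python
import math
import random


def noncentral_chi2(df, noncentrality, rng):
    """Draw chi^2_df(noncentrality) as a sum of df squared normals whose means have squared norm = noncentrality."""
    z0 = rng.gauss(math.sqrt(noncentrality), 1.0)
    return z0 * z0 + sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df - 1))


def check_identity(mu, g, reps=100_000, seed=2023):
    """Monte Carlo estimates of E[Z_1 g(Z'Z)] and mu_1 E[g(chi^2_{p+2}(mu'mu))] for Z ~ N_p(mu, I)."""
    rng = random.Random(seed)
    p = len(mu)
    noncentrality = sum(m * m for m in mu)
    lhs = 0.0
    for _ in range(reps):
        z = [rng.gauss(m, 1.0) for m in mu]
        lhs += z[0] * g(sum(v * v for v in z))
    rhs = sum(g(noncentral_chi2(p + 2, noncentrality, rng)) for _ in range(reps))
    return lhs / reps, mu[0] * rhs / reps


# Pretest-type g: indicator that the statistic falls below a critical value.
crit = 7.815  # approximately the 95% quantile of chi^2 with 3 degrees of freedom
lhs, rhs = check_identity([1.0, 0.5, 0.0], lambda t: 1.0 if t <= crit else 0.0)
print(f"E[Z_1 g(Z'Z)] ~ {lhs:.3f}, mu_1 E[g(chi^2_5)] ~ {rhs:.3f}")
```

The two printed numbers should agree up to Monte Carlo error; the left side uses $p = 3$ degrees of freedom through $Z^{T} Z$, while the right side needs $p + 2 = 5$.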

Proof of Theorem 3: As in the proof of Theorem 2, we derive the ADQR of the general shrinkage estimator $\hat{β}_{1}^{Shrinkage}$; the result for each estimator then follows by substituting the relevant $g(\cdot)$ function. We have

$$\begin{aligned} Γ(\hat{β}_{1}^{Shrinkage}, M) &= Γ(\hat{β}_{1}^{UR}, M) - 2 \lim_{n \to \infty} n\, E\left[(\hat{β}_{1}^{UR} - β_{1})^{T} M (\hat{β}_{1}^{UR} - \hat{β}_{1}^{RR})\, g(T_{n})\right] \\ &\quad + \lim_{n \to \infty} n\, E\left[(\hat{β}_{1}^{UR} - \hat{β}_{1}^{RR})^{T} M (\hat{β}_{1}^{UR} - \hat{β}_{1}^{RR})\, g^{2}(T_{n})\right]. \end{aligned}$$

From Lemma 1, we have

$$\begin{aligned} Γ(\hat{β}_{1}^{UR}, M) &= tr\{M [cov(\hat{β}_{1}^{UR})]\} \\ &= tr\{M [\lim_{n \to \infty} n\, E(\hat{β}_{1}^{UR} - β_{1})(\hat{β}_{1}^{UR} - β_{1})^{T}]\} \\ &= tr\{M [\lim_{n \to \infty} E(ϑ_{n}^{(1)} ϑ_{n}^{(1)T})]\} \\ &= tr\{M [\lim_{n \to \infty} cov(ϑ_{n}^{(1)}) + E(ϑ_{n}^{(1)}) E(ϑ_{n}^{(1)})^{T}]\} \\ &= tr\{M [σ^{2} C_{11.2}^{-1} + η_{11.2} η_{11.2}^{T}]\} \\ &= η_{11.2}^{T} M η_{11.2} + σ^{2} tr(M C_{11.2}^{-1}). \end{aligned}$$

Also from Lemma 1,

$$\lim_{n \to \infty} n\, E\left[(\hat{β}_{1}^{UR} - β_{1})^{T} M (\hat{β}_{1}^{UR} - \hat{β}_{1}^{RR})\, g(T_{n})\right] = tr\{M [\lim_{n \to \infty} E(ϑ_{n}^{(3)} ϑ_{n}^{(1)T} g(T_{n}))]\}.$$

Using double expectation, parts three and six of Lemma 1, and Theorems 1 and 3 in Appendix B of [29], we get

$$\begin{aligned} \lim_{n \to \infty} E(ϑ_{n}^{(3)} ϑ_{n}^{(1)T} g(T_{n})) &= \lim_{n \to \infty} E\left[E(ϑ_{n}^{(3)} ϑ_{n}^{(1)T} g(T_{n}) \mid ϑ_{n}^{(3)})\right] \\ &= \lim_{n \to \infty} E\left[ϑ_{n}^{(3)} E(ϑ_{n}^{(1)T} g(T_{n}) \mid ϑ_{n}^{(3)})\right] \\ &= \lim_{n \to \infty} E\left[ϑ_{n}^{(3)} (-η_{11.2} + ϑ_{n}^{(3)} - δ)^{T} g(T_{n})\right] \\ &= -\lim_{n \to \infty} E[ϑ_{n}^{(3)} g(T_{n})]\, η_{11.2}^{T} + \lim_{n \to \infty} E[ϑ_{n}^{(3)} ϑ_{n}^{(3)T} g(T_{n})] - \lim_{n \to \infty} E[ϑ_{n}^{(3)} g(T_{n})]\, δ^{T} \\ &= -δ η_{11.2}^{T} E[g(χ_{p_{2} + 2}^{2}(Δ))] + σ^{2} (C_{11.2}^{-1} - C_{11}^{-1}) E[g(χ_{p_{2} + 2}^{2}(Δ))] \\ &\quad + δ δ^{T} E[g(χ_{p_{2} + 4}^{2}(Δ))] - δ δ^{T} E[g(χ_{p_{2} + 2}^{2}(Δ))]. \end{aligned}$$

Thus,

$$\begin{aligned} \lim_{n \to \infty} n\, E\left[(\hat{β}_{1}^{UR} - β_{1})^{T} M (\hat{β}_{1}^{UR} - \hat{β}_{1}^{RR})\, g(T_{n})\right] &= -η_{11.2}^{T} M δ\, E[g(χ_{p_{2} + 2}^{2}(Δ))] + σ^{2} tr[M (C_{11.2}^{-1} - C_{11}^{-1})]\, E[g(χ_{p_{2} + 2}^{2}(Δ))] \\ &\quad + δ^{T} M δ\, E[g(χ_{p_{2} + 4}^{2}(Δ))] - δ^{T} M δ\, E[g(χ_{p_{2} + 2}^{2}(Δ))]. \end{aligned}$$

Similarly,

$$\begin{aligned} \lim_{n \to \infty} n\, E\left[(\hat{β}_{1}^{UR} - \hat{β}_{1}^{RR})^{T} M (\hat{β}_{1}^{UR} - \hat{β}_{1}^{RR})\, g^{2}(T_{n})\right] &= tr\{M [\lim_{n \to \infty} E(ϑ_{n}^{(3)} ϑ_{n}^{(3)T} g^{2}(T_{n}))]\} \\ &= σ^{2} tr[M (C_{11.2}^{-1} - C_{11}^{-1})]\, E[g^{2}(χ_{p_{2} + 2}^{2}(Δ))] + δ^{T} M δ\, E[g^{2}(χ_{p_{2} + 4}^{2}(Δ))]. \end{aligned}$$

Gathering all the required expressions, we finally have

$$\begin{aligned} Γ(\hat{β}_{1}^{Shrinkage}, M) &= η_{11.2}^{T} M η_{11.2} + σ^{2} tr(M C_{11.2}^{-1}) \\ &\quad - 2 \Big\{-η_{11.2}^{T} M δ\, E[g(χ_{p_{2} + 2}^{2}(Δ))] + σ^{2} tr[M (C_{11.2}^{-1} - C_{11}^{-1})]\, E[g(χ_{p_{2} + 2}^{2}(Δ))] \\ &\qquad + δ^{T} M δ\, E[g(χ_{p_{2} + 4}^{2}(Δ))] - δ^{T} M δ\, E[g(χ_{p_{2} + 2}^{2}(Δ))]\Big\} \\ &\quad + σ^{2} tr[M (C_{11.2}^{-1} - C_{11}^{-1})]\, E[g^{2}(χ_{p_{2} + 2}^{2}(Δ))] + δ^{T} M δ\, E[g^{2}(χ_{p_{2} + 4}^{2}(Δ))] \\ &= η_{11.2}^{T} M η_{11.2} + σ^{2} tr(M C_{11.2}^{-1}) + 2 η_{11.2}^{T} M δ\, E[g(χ_{p_{2} + 2}^{2}(Δ))] \\ &\quad + σ^{2} tr[M (C_{11.2}^{-1} - C_{11}^{-1})] \left\{-2 E[g(χ_{p_{2} + 2}^{2}(Δ))] + E[g^{2}(χ_{p_{2} + 2}^{2}(Δ))]\right\} \\ &\quad + δ^{T} M δ \left\{-2 E[g(χ_{p_{2} + 4}^{2}(Δ))] + 2 E[g(χ_{p_{2} + 2}^{2}(Δ))] + E[g^{2}(χ_{p_{2} + 4}^{2}(Δ))]\right\}. \end{aligned}$$

The proof is complete on substituting the expressions in Table A1.
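To see how the final ADQR expression behaves, it can be evaluated numerically once its scalar ingredients are fixed. The plain-Python sketch below estimates the chi-square expectations by Monte Carlo. The argument names `a` to `e` are hypothetical shorthands for $η_{11.2}^{T} M η_{11.2}$, $σ^{2} tr(M C_{11.2}^{-1})$, $η_{11.2}^{T} M δ$, $σ^{2} tr[M (C_{11.2}^{-1} - C_{11}^{-1})]$ and $δ^{T} M δ$; the noncentral chi-square is simulated with noncentrality equal to the squared norm of its mean vector, which is an assumption about the convention for $Δ$; and the $δ^{T} M δ$ term uses $E[g^{2}(χ_{p_{2} + 4}^{2}(Δ))]$, which is what the preceding expansion yields. The numeric inputs in the example are illustrative only.

```python
import math
import random


def noncentral_chi2(df, noncentrality, rng):
    """One draw of chi^2_df(noncentrality) built from df independent normals."""
    z0 = rng.gauss(math.sqrt(noncentrality), 1.0)
    return z0 * z0 + sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df - 1))


def mc_mean(f, df, noncentrality, reps, rng):
    """Monte Carlo estimate of E[f(chi^2_df(noncentrality))]."""
    return sum(f(noncentral_chi2(df, noncentrality, rng)) for _ in range(reps)) / reps


def adqr_shrinkage(a, b, c, d, e, g, p2, noncentrality, reps=20_000, seed=1):
    """ADQR of the general shrinkage estimator (final display of the proof of Theorem 3).

    a = eta' M eta, b = sigma^2 tr(M C11.2^-1), c = eta' M delta,
    d = sigma^2 tr[M (C11.2^-1 - C11^-1)], e = delta' M delta.
    """
    rng = random.Random(seed)
    eg2 = mc_mean(g, p2 + 2, noncentrality, reps, rng)
    eg4 = mc_mean(g, p2 + 4, noncentrality, reps, rng)
    eg2_sq = mc_mean(lambda t: g(t) ** 2, p2 + 2, noncentrality, reps, rng)
    eg4_sq = mc_mean(lambda t: g(t) ** 2, p2 + 4, noncentrality, reps, rng)
    return (a + b + 2 * c * eg2
            + d * (-2 * eg2 + eg2_sq)
            + e * (-2 * eg4 + 2 * eg2 + eg4_sq))


# Stein-type g from Table A1 with p2 = 10 (illustrative inputs only).
p2 = 10
risk_sr = adqr_shrinkage(0.1, 2.0, 0.05, 1.2, 0.5, lambda t: (p2 - 2) / t, p2, 0.5)
print(round(risk_sr, 3))
```

Two limiting cases act as sanity checks: with $g \equiv 0$ the expression reduces to $a + b$, the ADQR of $\hat{β}_{1}^{UR}$; with $g \equiv 1$ it reduces to $a + 2c + e + (b - d) = (η_{11.2} + δ)^{T} M (η_{11.2} + δ) + σ^{2} tr(M C_{11}^{-1})$, which is consistent with the bias $-η_{11.2} - δ$ that Theorem 2 gives for $g \equiv 1$.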

References

[1] Ahmed SE. (2014). Penalty, Shrinkage and Pretest Strategies: Variable Selection and Estimation. Cham: Springer International Publishing; 2014.
[2] Ahmed SE, Yüzbaşı B. (2016). Big data analytics: Integrating penalty strategies. International Journal of Management Science and Engineering Management. 11(2), 105–115.
[3] Al-Momani M. (2023). Liu-type pretest and shrinkage estimation for the conditional autoregressive model. PLOS ONE. 18(4).
[4] Al-Momani M, Dawod AB. (2022). Model selection and post selection to improve the estimation of the ARCH model. Journal of Risk and Financial Management. 15(4), 174.
[5] Al-Momani M, Hussein AA, Ahmed SE. (2016). Penalty and related estimation strategies in the spatial error model. Statistica Neerlandica. 71(1), 4–30.
[6] Al-Momani M, Riaz M, Saleh MF. (2022). Pretest and shrinkage estimation of the regression parameter vector of the marginal model with multinomial responses. Statistical Papers.
[7] Arashi M, Kibria BMG, Norouzirad M, Nadarajah S. (2014). Improved preliminary test and Stein-rule Liu estimators for the ill-conditioned elliptical linear regression model. Journal of Multivariate Analysis. 126, 53–74.
[8] Arashi M, Norouzirad M, Ahmed SE, Yüzbaşı B. (2018). Rank-based Liu regression. Computational Statistics. 33(3), 1525–1561.
[9] Arashi M, Norouzirad M, Roozbeh M, Khan NM. (2021). A high-dimensional counterpart for the ridge estimator in multicollinear situations. Mathematics. 9(23), 3057.
[10] Arashi M, Roozbeh M. (2016). Some improved estimation strategies in high-dimensional semiparametric regression models with application to riboflavin production data. Statistical Papers. 60(3), 667–686.
[11] Arashi M, Roozbeh M, Hamzah NA, Gasparini M. (2021). Ridge regression and its applications in genetic studies. PLOS ONE. 16(4).
[12] Yüzbaşı B, Ahmed SE, Ahmed F. (2023). Post-Shrinkage Strategies in Statistical and Machine Learning for High Dimensional Data. CRC Press; 2023.
[13] Yüzbaşı B, Ahmed SE, Güngör M. (2017). Robust estimation approach for spatial error model. REVSTAT-Statistical Journal. 15(2), 251–276.
[14] Yüzbaşı B, Arashi M, Ahmed SE. (2020). Shrinkage estimation strategies in generalised ridge regression models: Low/high-dimension regime. International Statistical Review. 88(1), 229–251.
[15] Baltagi BH, LeSage JP, Pace RK. (2017). Spatial Econometrics: Qualitative and Limited Dependent Variables. Bingley, UK: Emerald Group Publishing Limited; 2017.
[16] Bivand R. (2022). R packages for analyzing spatial data: A comparative case study with areal data. Geographical Analysis. 54(3), 488–518.
[17] Boonstra PS, Mukherjee B, Taylor JM. (2015). A small-sample choice of the tuning parameter in ridge regression. Statistica Sinica.
[18] Cressie N. (1993). Statistics for Spatial Data. New York: John Wiley & Sons; 1993.
[19] Cressie N, Wikle CK. (2011). Statistics for Spatio-Temporal Data. Chichester, England: Wiley-Blackwell; 2011.
[20] Dai X, Li E, Tian M. (2019). Quantile regression for varying coefficient spatial error models. Communications in Statistics - Theory and Methods. 50(10), 2382–2397.
[21] Saleh AKMdE. (2006). Theory of Preliminary Test and Stein-Type Estimation with Applications. Hoboken, NJ: Wiley-Interscience; 2006.
[22] Saleh AKMdE, Arashi M, Kibria BMG. (2019). Rank-Based Methods for Shrinkage and Selection: With Application to Machine Learning. Hoboken, NJ: John Wiley & Sons; 2019.
[23] Saleh AKMdE, Arashi M, Saleh RA, Norouzirad M. (2022). Theory of Ridge Regression Estimation with Applications. Hoboken, NJ: John Wiley & Sons; 2019.
[24] Gilley OW, Pace RK. (1996). On the Harrison and Rubinfeld data. Journal of Environmental Economics and Management. 31(3), 403–405.
[25] Haining R. (2003). Spatial Data Analysis: Theory and Practice. Cambridge: Cambridge University Press; 2003.
[26] Harrison D, Rubinfeld DL. (1978). Hedonic housing prices and the demand for clean air. Journal of Environmental Economics and Management. 5(1), 81–102.
[27] Higazi SF, Abdel-Hady DH, Al-Oulfi SA. (2013). Application of spatial regression models to income poverty ratios in Middle Delta contiguous counties in Egypt. Pakistan Journal of Statistics and Operation Research. 9(1), 93.
[28] Hoerl AE, Kennard RW. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics. 12(1), 55–67.
[29] Judge GG, Bock ME. (1978). The Statistical Implications of Pre-Test and Stein-Rule Estimators in Econometrics. Amsterdam: North-Holland; 1978.
[30] Liu K. (1993). A new class of biased estimate in linear regression. Communications in Statistics - Theory and Methods. 22(2), 393–402.
[31] Li Y, Yang H. (2010). A new Liu-type estimator in linear regression model. Statistical Papers. 53(2), 427–437.
[32] Lisawadi S, Ahmed SE, Reangsephet O. (2020). Post estimation and prediction strategies in negative binomial regression model. International Journal of Modelling and Simulation. 41(6), 463–477.
[33] Liu R, Yu C, Liu C, Jiang J, Xu J. (2018). Impacts of haze on housing prices: An empirical analysis based on data from Chengdu (China). International Journal of Environmental Research and Public Health. 15(6), 1161.
[34] Mardia KV, Marshall RJ. (1984). Maximum likelihood estimation of models for residual covariance in spatial regression. Biometrika. 71(1), 135–146.
[35] Nkurunziza S, Al-Momani M, Lin EY. (2016). Shrinkage and lasso strategies in high-dimensional heteroscedastic models. Communications in Statistics - Theory and Methods. 45(15), 4454–4470.
[36] Knight K, Fu W. (2000). Asymptotics for lasso-type estimators. The Annals of Statistics. 28(5), 1356–1378.
[37] Pace RK, Gilley OW. (1997). Using the spatial configuration of the data to improve estimation. Journal of Real Estate Finance and Economics. 14, 333–340.
[38] Piscitelli A. (2019). Spatial regression of juvenile delinquency: Revisiting Shaw and McKay. International Journal of Criminal Justice Sciences. 14(2), 132–147.
[39] Roozbeh M, Babaie-Kafaki S, Aminifard Z. (2021). Two penalized mixed-integer nonlinear programming approaches to tackle multicollinearity and outliers effects in linear regression models. Journal of Industrial and Management Optimization. 17(6), 3475.
[40] Roozbeh M, Babaie-Kafaki S, Manavi M. (2022). A heuristic algorithm to combat outliers and multicollinearity in regression model analysis. Iranian Journal of Numerical Analysis and Optimization. 12(1), 173–186.
[41] Solow AR. (1985). Bootstrapping correlated data. Journal of the International Association for Mathematical Geology. 17(7), 769–775.
[42] Waller LA, Gotway CA. (2004). Applied Spatial Statistics for Public Health Data. Hoboken, NJ: Wiley-Interscience; 2004.
[43] Yildirim V, Kantar YM. (2020). Robust estimation approach for spatial error model. Journal of Statistical Computation and Simulation. 90(9), 1618–1638.
[44] Yüzbaşı B, Arashi M, Ahmed SE. (2020). Shrinkage estimation strategies in generalized ridge regression models under low/high-dimension regime. International Statistical Review. 88(1), 229–251.

Figure 1. SRE of the suggested estimators with respect to the MLE $\hat{β}_{1}$ for $n = 49, 100$, $ρ_{x} \in \{0.3, 0.6, 0.9\}$, $ρ = 0.90$, and $(p_{1}, p_{2}) = (5, 10)$.


Figure 2. SRE of the suggested estimators with respect to the MLE $\hat{β}_{1}$ for $n = 49, 100$, $ρ_{x} \in \{0.3, 0.6, 0.9\}$, $ρ = 0.90$, and $(p_{1}, p_{2}) = (5, 20)$.


Figure 3. SRE of the suggested estimators with respect to the MLE $\hat{β}_{1}$ for $n = 49, 100$, $ρ_{x} \in \{0.3, 0.6, 0.9\}$, $ρ = 0.90$, and $(p_{1}, p_{2}) = (5, 30)$.


Figure 4. Correlation Matrix for the Boston Housing Data.

Table 1. Full and submodel specifications.

| Model | Specification |
|---|---|
| Full | `log(CMEDV) = log(LSTAT) + I(RM^2) + TAX + B + log(RAD) + CHAS + CRIM + PTRATIO + AGE + LAT + LON + I(NOX^2) + log(DIS) + ZN + INDUS` |
| Submodel | `log(CMEDV) = log(LSTAT) + I(RM^2) + TAX + B + CRIM + PTRATIO` |

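Table 2. Relative efficiency (RE) of the proposed estimators with respect to the MLE $\hat{β}_{1}$.

| Estimator | $\hat{β}_{1}^{UR}$ | $\hat{β}_{1}^{RR}$ | $\hat{β}_{1}^{PTR}$ | $\hat{β}_{1}^{SR}$ | $\hat{β}_{1}^{PSR}$ |
|---|---|---|---|---|---|
| RE | 1.0198 | 2.9468 | 2.8624 | 2.3070 | 2.3287 |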

Table A1. Expressions for the corresponding $g(\cdot)$ functions in the proposed shrinkage estimators.

| Shrinkage estimator | $g(\cdot)$ function | $E[g(χ_{p_{2} + 2}^{2}(Δ))]$ |
|---|---|---|
| $\hat{β}_{1}^{PTR}$ | $I(T_{n} \leq χ_{α, p_{2}}^{2})$ | $H_{p_{2} + 2}(χ_{α, p_{2}}^{2}; Δ)$ |
| $\hat{β}_{1}^{SR}$ | $(p_{2} - 2) T_{n}^{-1}$ | $(p_{2} - 2) E(χ_{p_{2} + 2}^{-2}(Δ))$ |
| $\hat{β}_{1}^{PSR}$ | $(1 - (p_{2} - 2) T_{n}^{-1})\, I(T_{n} \leq χ_{α, p_{2}}^{2})$ | $H_{p_{2} + 2}(χ_{α, p_{2}}^{2}; Δ) - (p_{2} - 2) E[χ_{p_{2} + 2}^{-2}(Δ)\, I(χ_{p_{2} + 2}^{2}(Δ) \leq p_{2} - 2)]$ |
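The shrinkage estimators in Table A1 share the form $\hat{β}_{1}^{UR} - (\hat{β}_{1}^{UR} - \hat{β}_{1}^{RR})\, g(T_{n})$, which is the representation used in the proofs of Theorems 2 and 3. A minimal plain-Python sketch follows, with coefficient vectors as lists. The pretest and Stein-type weights follow Table A1; for the positive shrinkage estimator we use the textbook positive-part rule $\hat{β}_{1}^{RR} + \max\{0, 1 - (p_{2} - 2) T_{n}^{-1}\} (\hat{β}_{1}^{UR} - \hat{β}_{1}^{RR})$ (see, e.g., [1]), which is an assumption about the exact definition used in Section 4.2. The function names and the numeric values are illustrative.

```python
def g_pretest(t_n, crit):
    """Pretest weight I(T_n <= crit), with crit the chi-square critical value at level alpha."""
    return 1.0 if t_n <= crit else 0.0


def g_stein(t_n, p2):
    """Stein-type weight (p2 - 2) / T_n."""
    return (p2 - 2) / t_n


def shrink(beta_ur, beta_rr, weight):
    """General form beta_UR - (beta_UR - beta_RR) * g(T_n), applied coordinatewise."""
    return [u - (u - r) * weight for u, r in zip(beta_ur, beta_rr)]


def positive_shrink(beta_ur, beta_rr, t_n, p2):
    """Assumed positive-part rule: beta_RR + max(0, 1 - (p2 - 2)/T_n) (beta_UR - beta_RR)."""
    w = max(0.0, 1.0 - (p2 - 2) / t_n)
    return [r + w * (u - r) for u, r in zip(beta_ur, beta_rr)]


# Illustrative values: p2 = 10 and the 5% critical value of chi^2 with 10 df (about 18.307).
beta_ur, beta_rr, p2, crit = [1.2, 0.8], [1.0, 1.0], 10, 18.307
for t_n in (4.0, 16.0, 25.0):
    print(t_n,
          shrink(beta_ur, beta_rr, g_pretest(t_n, crit)),
          shrink(beta_ur, beta_rr, g_stein(t_n, p2)),
          positive_shrink(beta_ur, beta_rr, t_n, p2))
```

At $T_{n} = 4$ the Stein weight $(p_{2} - 2)/T_{n} = 2$ exceeds one, so the Stein-type estimate moves past $\hat{β}_{1}^{RR}$ to the other side; the positive-part rule truncates the weight and returns $\hat{β}_{1}^{RR}$ instead, which is the motivation for the positive shrinkage estimator.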

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
