1. Introduction
Measurement error data arise inevitably in applications and have raised significant concerns in various fields including biology, medicine, epidemiology, economics, finance, and remote sensing. A wealth of research has accumulated on classical low-dimensional measurement error regression models under various assumptions. Numerous studies focus on parameter estimation for low-dimensional measurement error regression models, with the primary techniques listed below: (1) corrected regression estimation methods [1]; (2) simulation-extrapolation (SIMEX) estimation methods [2,3]; (3) deconvolution methods [4]; (4) corrected empirical likelihood methods [5,6]. For more detailed discussions of other estimation and hypothesis testing methods for classical low-dimensional measurement error models, please refer to [7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29], as well as the monographs [30,31,32,33,34,35].
As one of the most popular research fields in statistics, high-dimensional regression has been widely used in areas including genetics, economics, medical imaging, meteorology, and sensor networks. Over the past two decades, many high-dimensional regression methods have been proposed, such as the Lasso [36], the smoothly clipped absolute deviation (SCAD) penalty [37], the Elastic Net [38], the Adaptive Lasso [39], the Dantzig Selector [40], the smooth integration of counting and absolute deviation (SICA) penalty [41], and the minimax concave penalty (MCP) [42], among many others. These methods estimate the regression coefficients while simultaneously achieving variable selection by adding penalties to the objective function; please refer to the reviews [43,44,45], as well as the monographs [46,47,48].
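As a concrete illustration of this penalized estimation idea, the following minimal Python sketch fits a Lasso to simulated sparse data. The use of scikit-learn here is our own choice for illustration and is not tied to any of the works cited above.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, s = 100, 200, 5            # high-dimensional setting: p > n, sparse truth
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:s] = 1.0              # only the first s coefficients are nonzero
y = X @ beta_true + 0.5 * rng.standard_normal(n)

fit = Lasso(alpha=0.1).fit(X, y)        # alpha is the regularization parameter
selected = np.flatnonzero(fit.coef_)    # indices with nonzero estimated coefficients
print("selected variables:", selected)
```

Estimation and variable selection happen in one step: coefficients shrunk exactly to zero by the penalty are the de-selected variables.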
For variable screening in ultrahigh-dimensional regression models, where the dimension $p$ and the sample size $n$ satisfy $\log p = O(n^{\xi})$ for some $\xi \in (0,1)$, Fan and Lv [49] proposed the sure independence screening (SIS) method, a pioneering method in this field. For estimation and variable selection in ultrahigh-dimensional regression models, it is suggested to apply the SIS method for variable screening first. Then, based on the variables retained in the first step, regularization methods with penalties can be used to estimate the regression coefficients and identify the significant variables simultaneously. Owing to the operability and effectiveness of the SIS method in applications, numerous works have extended it; see [50,51,52,53,54,55,56,57,58,59].
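To make the screening idea concrete, here is a minimal sketch of SIS-style marginal screening in the spirit of [49] (our own illustration, not the authors' implementation): rank the covariates by absolute marginal correlation with the response and retain the top $d$.

```python
import numpy as np

def sis_screen(X, y, d):
    """Rank covariates by |marginal correlation| with y; keep the top d."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # componentwise marginal correlations between each column of X and y
    corr = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(np.abs(corr))[::-1][:d]

# A typical choice in [49] is d of order n / log(n), e.g.:
# keep = sis_screen(X, y, d=int(n / np.log(n)))
```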
However, most of the aforementioned theories and applications for high-dimensional regression models focus on clean data. In the era of big data, researchers frequently collect high-dimensional data with measurement errors; typical instances include gene expression data [61] and sensor network data [60]. The imprecise measurements result from poorly managed and defective data collection processes, as well as from imprecise measuring instruments. It is well known that ignoring the influence of measurement errors leads to biased estimators and erroneous conclusions. Therefore, developing statistical inference methods for high-dimensional measurement error regression models has drawn a lot of interest.
Based on the types of measurement errors, research on high-dimensional measurement error regression models can be divided into three categories: covariates containing measurement errors; response variables containing measurement errors; and both covariates and response variables containing measurement errors. In this paper, we mainly focus on the category in which the covariates contain measurement errors. When the dimension $p$ is larger than the sample size $n$, parameter estimation is challenging because the bias correction renders the penalized objective function nonconvex, which in turn makes it impossible to guarantee the optimal solution of the optimization problem. We utilize the following linear regression model to illustrate this problem:
$$y = X\beta^{*} + \varepsilon, \qquad (1)$$
where $y \in \mathbb{R}^{n}$ is the response vector, $X \in \mathbb{R}^{n \times p}$ is the fixed design matrix with $p \gg n$, $\beta^{*} \in \mathbb{R}^{p}$ is the sparse regression coefficient vector with only $s$ nonzero components, and the model error vector $\varepsilon \in \mathbb{R}^{n}$ is assumed to be independent of $X$. In order to obtain a sparse estimator of the true regression coefficient vector $\beta^{*}$, we can minimize the following penalized least squares objective function
$$\frac{1}{2n}\|y - X\beta\|_{2}^{2} + \sum_{j=1}^{p} p_{\lambda}(|\beta_{j}|), \qquad (2)$$
which is equivalent to minimizing
$$\frac{1}{2}\beta^{\top}\widehat{\Sigma}\beta - \widehat{\rho}^{\top}\beta + \sum_{j=1}^{p} p_{\lambda}(|\beta_{j}|), \qquad (3)$$
where $\widehat{\Sigma} = X^{\top}X/n$, $\widehat{\rho} = X^{\top}y/n$, and $p_{\lambda}(\cdot)$ is a penalty function with regularization parameter $\lambda$. If the covariate matrix $X$ can be precisely measured, the penalized objective functions (2) and (3) are convex for a convex penalty such as the $\ell_{1}$ norm. Thus, we can obtain a sparse estimator of $\beta^{*}$ by minimizing the penalized objective function (2) or (3).
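The equivalence of (2) and (3) holds because the two objectives differ only by the constant $\|y\|_{2}^{2}/(2n)$, which does not depend on $\beta$. The following minimal sketch (our own illustration, using the Lasso penalty $p_{\lambda}(|\beta_{j}|) = \lambda|\beta_{j}|$) verifies this numerically:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, lam = 50, 20, 0.1
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
beta = rng.standard_normal(p)    # an arbitrary candidate coefficient vector

Sigma_hat = X.T @ X / n          # \hat{Sigma} in (3)
rho_hat = X.T @ y / n            # \hat{rho} in (3)
penalty = lam * np.abs(beta).sum()

obj2 = 0.5 / n * np.sum((y - X @ beta) ** 2) + penalty           # objective (2)
obj3 = 0.5 * beta @ Sigma_hat @ beta - rho_hat @ beta + penalty  # objective (3)
assert np.isclose(obj2 - obj3, 0.5 / n * np.sum(y ** 2))         # differ by a constant
```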
However, it is common in practice that the covariate matrix $X$ cannot be accurately observed. Let $W$ be the observed covariate matrix with additive measurement errors satisfying $W = X + A$, where $A$ is the matrix of measurement errors; each row of $A$ follows a sub-Gaussian distribution with mean zero and covariance matrix $\Sigma_{A}$, and $A$ is assumed to be independent of $X$ and $\varepsilon$. To reduce the influence of the measurement errors, Loh and Wainwright [62] proposed to replace $\widehat{\Sigma}$ and $\widehat{\rho}$ in the penalized objective function (3) by their consistent estimators $\widehat{\Sigma}_{W} = W^{\top}W/n - \Sigma_{A}$ and $\widehat{\rho}_{W} = W^{\top}y/n$, respectively. Then the sparse estimator of $\beta^{*}$ is obtained by minimizing the following penalized objective function
$$\frac{1}{2}\beta^{\top}\widehat{\Sigma}_{W}\beta - \widehat{\rho}_{W}^{\top}\beta + \sum_{j=1}^{p} p_{\lambda}(|\beta_{j}|). \qquad (4)$$
Note that when the dimension $p$ is fixed or smaller than the sample size $n$, it can be guaranteed that $\widehat{\Sigma}_{W}$ is a positive definite or positive semi-definite matrix, which ensures that the penalized objective function (4) remains convex. Thus, the global optimal solution of $\beta^{*}$ can be obtained by minimizing the penalized objective function (4).
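The plug-in construction in (4) is straightforward to code. Below is a minimal sketch (our illustration of the surrogate idea in [62]; we assume, as is common in this literature, that the measurement error covariance $\Sigma_{A}$ is known):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, sigma_a = 200, 50, 0.3       # here p < n, so the surrogate stays well behaved
X = rng.standard_normal((n, p))
y = X[:, 0] + rng.standard_normal(n)
Sigma_A = sigma_a ** 2 * np.eye(p)             # known measurement error covariance
W = X + sigma_a * rng.standard_normal((n, p))  # observed error-prone covariates

Sigma_hat = W.T @ W / n - Sigma_A  # corrected surrogate for X'X / n
rho_hat = W.T @ y / n              # surrogate for X'y / n (errors independent of y)

print(np.linalg.eigvalsh(Sigma_hat).min())  # typically > 0 when p < n, so (4) is convex
```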
However, for high-dimensional or ultrahigh-dimensional regression models, i.e., $p > n$ or $\log p = O(n^{\xi})$, there are two key problems: (i) the penalized objective function (4) is no longer convex and is unbounded from below, because the corrected estimator $\widehat{\Sigma}_{W}$ is no longer positive semi-definite (since $W^{\top}W/n$ has rank at most $n < p$); this makes it impossible to obtain an estimator of $\beta^{*}$ by directly minimizing the penalized objective function (4); (ii) in order to construct an objective function similar to that of the standard Lasso and solve the corresponding optimization problem using the R packages "glmnet" or "lars", it is necessary to factor a positive semi-definite surrogate of $\widehat{\Sigma}_{W}$ by the Cholesky decomposition and obtain substitutes for the response vector and the covariate matrix. However, this process causes error accumulation and makes it challenging to guarantee valid theoretical results; see the detailed discussions in [63,64].
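The following minimal sketch (our own demonstration) makes problem (i) explicit: the smallest eigenvalue of $\widehat{\Sigma}_{W}$ is negative when $p > n$, and moving along its eigenvector drives the quadratic part of (4) to $-\infty$, so no minimizer exists.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, sigma_a = 50, 200, 0.3                  # now p > n
W = rng.standard_normal((n, p))               # stands in for the observed matrix
Sigma_hat = W.T @ W / n - sigma_a ** 2 * np.eye(p)

vals, vecs = np.linalg.eigh(Sigma_hat)
print(vals.min())                             # negative: rank(W'W) <= n < p

v = vecs[:, 0]                                # eigenvector of the smallest eigenvalue
for t in (1.0, 10.0, 100.0):
    beta = t * v
    print(0.5 * beta @ Sigma_hat @ beta)      # 0.5 * t^2 * vals.min(), decreasing without bound
```

The linear and penalty terms in (4) grow at most linearly in $t$, so they cannot offset the negatively quadratic term.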
For problem (i), Loh and Wainwright [62] converted the unconstrained optimization problem into a constrained one by adding an $\ell_{1}$-norm restriction on $\beta$. They suggested applying the projected gradient descent algorithm to solve the restricted optimization problem and acquire the global optimal solution for the true regression coefficient vector $\beta^{*}$. Nevertheless, the penalized objective function of the optimization problem is still nonconvex. To address this issue, Datta and Zou [63] suggested substituting $\widehat{\Sigma}_{W}$ by its positive semi-definite projection matrix $(\widehat{\Sigma}_{W})_{+}$, and they proposed the convex conditioned Lasso (CoCoLasso). Further, Zheng et al. [64] introduced a balanced estimation that prevents overfitting while maintaining estimation accuracy by combining $(\widehat{\Sigma}_{W})_{+}$ with a concave penalty. Tao et al. [65] constructed a modified least squares loss function using a positive semi-definite projection of the estimated covariance matrix and proposed the calibrated zero-norm regularized least squares (CaZnRLS) estimator of the regression coefficients. Rosenbaum and Tsybakov [66,67] proposed the matrix uncertainty (MU) selector and its improved version, the compensated MU selector, for high-dimensional linear models with additive measurement errors in the covariates. Sørensen et al. [68] extended the MU selector to generalized linear models and developed the generalized matrix uncertainty (GMU) selector. Sørensen et al. [69] established theoretical results for the relevant variable selection methods. Based on the MU selector, Belloni et al. [70] introduced an estimator that achieves the minimax efficiency bound; they proved that the corresponding optimization problem can be converted into a second-order cone programming problem, which is solvable in polynomial time. Romeo and Thoresen [71] evaluated the performance of the MU selector in [66], the nonconvex Lasso in [62], and the CoCoLasso in [63] using simulation studies. Brown et al. [72] proposed a path-following iterative algorithm called Measurement Error Boosting (MEBoost), a computationally efficient method for variable selection in high-dimensional measurement error regression models. Nghiem and Potgieter [73] introduced a new estimation method called simulation-selection-extrapolation (SIMSELEX), which uses the Lasso in the simulation step and the group Lasso in the selection step. Jiang and Ma [74] drew on the idea of the nonconvex Lasso in [62] and proposed an estimator of the regression coefficients for high-dimensional Poisson models with measurement errors. Byrd and McGee [75] developed an iterative estimation method for high-dimensional generalized linear models with additive measurement errors based on the imputation-regularized optimization (IRO) algorithm in [76]. However, the error accumulation issue mentioned in problem (ii) has not been addressed in this literature.
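To illustrate the repair underlying the CoCoLasso-type methods, the sketch below replaces the nonconvex surrogate with a positive semi-definite projection. Note that Datta and Zou [63] project under the elementwise maximum norm, which requires an iterative algorithm; for brevity, this sketch uses the simpler Frobenius-norm projection (truncating negative eigenvalues at zero), which conveys the same idea but is not their exact estimator.

```python
import numpy as np

def psd_project(S):
    """Nearest positive semi-definite matrix in Frobenius norm:
    symmetrize, then clip negative eigenvalues at zero."""
    S = (S + S.T) / 2
    vals, vecs = np.linalg.eigh(S)
    return (vecs * np.clip(vals, 0.0, None)) @ vecs.T

# Sigma_plus = psd_project(Sigma_hat)   # convex surrogate: replaces Sigma_hat in (4)
```

With the projected matrix in place of $\widehat{\Sigma}_{W}$, the objective (4) becomes convex again and standard Lasso solvers apply.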
The aforementioned works place more emphasis on estimation and variable selection than on hypothesis testing. For high-dimensional regression models with clean data, research on hypothesis testing has made significant progress under various settings [77,78,79,80,81,82,83,84]. For high-dimensional measurement error models, hypothesis testing methods are equally crucial. However, the bias and instability caused by measurement errors make hypothesis testing extremely difficult. Recently, some progress has been achieved in statistical inference methods. Based on the multiplier bootstrap, Belloni [85] constructed simultaneous confidence intervals for the target parameters in high-dimensional linear measurement error models. Focusing on the case where a fixed number of covariates contain measurement errors, Li et al. [86] proposed a corrected decorrelated score test for the parameters corresponding to the error-prone covariates and constructed asymptotic confidence intervals for them. Huang et al. [87] proposed a new variable selection method based on the debiased CoCoLasso and proved that it achieves false discovery rate (FDR) control. Jiang et al. [88] developed Wald and score tests for high-dimensional Poisson measurement error models.
Compared with the estimation and hypothesis testing methods above, screening techniques for ultrahigh-dimensional measurement error models are relatively few. Nghiem et al. [89] introduced two screening methods, named corrected penalized marginal screening (PMSc) and corrected sure independence screening (SISc), for ultrahigh-dimensional linear measurement error models.
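As a rough illustration of the correction idea (our own hedged sketch, not the exact PMSc/SISc estimators of [89]): since the measurement errors are independent of the response, the marginal covariance between $w_{j}$ and $y$ remains unbiased for the covariance between $x_{j}$ and $y$, but the marginal variance must be corrected by subtracting the error variance before forming a marginal coefficient.

```python
import numpy as np

def corrected_marginal_screen(W, y, sigma_a2, d):
    """Screen by |corrected marginal slope|: cov(w_j, y) / (var(w_j) - sigma_a2).
    A hedged sketch of the correction idea, not the exact SISc estimator."""
    Wc = W - W.mean(axis=0)
    yc = y - y.mean()
    cov_wy = Wc.T @ yc / len(y)                 # unbiased for cov(x_j, y)
    var_x = Wc.var(axis=0) - sigma_a2           # corrected variance of the true x_j
    slopes = cov_wy / np.maximum(var_x, 1e-8)   # guard against tiny denominators
    return np.argsort(np.abs(slopes))[::-1][:d]
```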
This paper gives an overview of the estimation and hypothesis testing methods for high-dimensional measurement error regression models, as well as the variable screening methods for ultrahigh-dimensional measurement error models. The rest of this paper is organized as follows. In
Section 2, we review some estimation methods for linear models. We survey the estimation methods for generalized linear models in
Section 3.
Section 4 presents the recent advances in hypothesis testing methods for high-dimensional measurement error models.
Section 5 introduces the variable screening techniques for ultrahigh-dimensional linear measurement error models. We conclude the paper with some discussions in
Section 6.
Notations. Let $\mathbb{S}^{p}$ be the set of all $p \times p$ real symmetric matrices and $\mathbb{S}_{+}^{p}$ be the subset of $\mathbb{S}^{p}$ containing all positive semi-definite matrices in $\mathbb{S}^{p}$. We use $|S|$ to denote the cardinality of a set $S$. Let $S^{*} = \{j : \beta_{j}^{*} \neq 0\}$ be the index set of the nonzero parameters. For a vector $v = (v_{1}, \ldots, v_{p})^{\top}$, let $\|v\|_{q} = (\sum_{j=1}^{p} |v_{j}|^{q})^{1/q}$ denote its $\ell_{q}$ norm for $1 \le q < \infty$, and write $\|v\|_{\infty} = \max_{1 \le j \le p} |v_{j}|$. Denote by $v_{S}$ the subvector of $v$ with index set $S$. Denote by $\mathbf{1}$ the vector of all ones. For a matrix $M = (m_{ij})$, let $\|M\|_{\max} = \max_{i,j} |m_{ij}|$ and $\|M\|_{\infty} = \max_{i} \sum_{j} |m_{ij}|$. For constants $a$ and $b$, define $a \vee b = \max\{a, b\}$ and $a \wedge b = \min\{a, b\}$. We use $c$ and $C$ to denote positive constants that may vary throughout the paper. Finally, let $\xrightarrow{d}$ denote convergence in distribution.