2.1. Quantile Regression
Consider a dataset of $n$ independent subjects. For the $i$th subject, let $y_i$ be the response and $\mathbf{x}_i=(x_{i1},\ldots,x_{ir})^{\top}$ be the corresponding $r$-dimensional predictor vector. A simple linear regression model is defined as follows:
$$y_i=\mathbf{x}_i^{\top}\boldsymbol{\beta}+\varepsilon_i,\qquad i=1,\ldots,n, \tag{1}$$
where $\boldsymbol{\beta}$ is the regression coefficient vector, with $\beta_0$ corresponding to the intercept term, and $\varepsilon_i$ represents an error term with unknown distribution. It is usual to assume that the $\tau$th quantile of the random error term is 0, that is, $F_{\varepsilon_i}^{-1}(\tau)=0$ for $i=1,\ldots,n$. According to this assumption, the $\tau$th quantile regression form of model (1) is specified as follows:
$$Q_{y_i}(\tau\mid\mathbf{x}_i)=F_{y_i}^{-1}(\tau\mid\mathbf{x}_i)=\mathbf{x}_i^{\top}\boldsymbol{\beta}, \tag{2}$$
where $F_{y_i}^{-1}(\tau\mid\mathbf{x}_i)$ is the inverse cumulative distribution function of $y_i$ given $\mathbf{x}_i$ evaluated at $\tau$. The estimate of the regression coefficient vector $\boldsymbol{\beta}$ in equation (2) is
$$\hat{\boldsymbol{\beta}}=\arg\min_{\boldsymbol{\beta}}\sum_{i=1}^{n}\rho_{\tau}\left(y_i-\mathbf{x}_i^{\top}\boldsymbol{\beta}\right), \tag{3}$$
where the loss function $\rho_{\tau}(u)=u\{\tau-I(u<0)\}$ with the indicator function $I(\cdot)$.
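To make the check loss and the minimization in equation (3) concrete, the following Python sketch evaluates $\rho_{\tau}$ and the objective for a candidate coefficient vector; the data and variable names are illustrative only and are not taken from the paper.

```python
import numpy as np

def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (u < 0).astype(float))

def qr_objective(beta, X, y, tau):
    """Objective of equation (3): sum of check losses of the residuals."""
    residuals = y - X @ beta
    return np.sum(check_loss(residuals, tau))

# Toy illustration with simulated data (sizes and values are arbitrary).
rng = np.random.default_rng(0)
n, r = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, r))])  # intercept + r predictors
beta_true = np.array([1.0, 2.0, 0.0, -1.5])
y = X @ beta_true + rng.normal(size=n)

print(qr_objective(beta_true, X, y, tau=0.5))
```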
In light of [24] and [5], minimizing equation (3) is equivalent to maximizing the likelihood of $n$ independent individuals with the $i$th one distributed as the asymmetric Laplace distribution (ALD) specified as
$$f(y_i\mid\mu_i,\sigma,\tau)=\frac{\tau(1-\tau)}{\sigma}\exp\left\{-\rho_{\tau}\left(\frac{y_i-\mu_i}{\sigma}\right)\right\}, \tag{4}$$
where the location parameter $\mu_i=\mathbf{x}_i^{\top}\boldsymbol{\beta}$, the scale parameter $\sigma>0$, and the skewness parameter $\tau$ is between 0 and 1; obviously, the ALD reduces to the symmetric Laplace distribution when $\tau=0.5$. However, it is computationally infeasible to carry out statistical inference based directly on equation (4), which involves a non-differentiable point. Following [25], equation (4) can be rewritten in the following hierarchical fashion:
$$y_i=\mathbf{x}_i^{\top}\boldsymbol{\beta}+k_1 e_i+k_2\sqrt{\sigma e_i}\,z_i,\qquad z_i\sim N(0,1),\qquad e_i\sim\operatorname{Exp}(\sigma), \tag{5}$$
where $k_1=\frac{1-2\tau}{\tau(1-\tau)}$, $k_2^{2}=\frac{2}{\tau(1-\tau)}$, and $\operatorname{Exp}(\sigma)$ denotes the exponential distribution with mean $\sigma$, whose density function is $f(e_i)=\sigma^{-1}\exp(-e_i/\sigma)$. Equation (5) illustrates that an asymmetric Laplace distribution can also be represented as a mixture of exponential and standard normal distributions, which allows us to express the quantile regression model as a normal regression model in which the response has the following conditional distribution:
$$y_i\mid\boldsymbol{\beta},e_i,\sigma\sim N\left(\mathbf{x}_i^{\top}\boldsymbol{\beta}+k_1 e_i,\;k_2^{2}\sigma e_i\right).$$
For the above-defined quantile regression model with a high-dimensional covariate vector (i.e., $r$ is sufficiently large), it is of major interest to estimate the parameter vector $\boldsymbol{\beta}$ and to identify the important covariates. To this end, we consider Bayesian quantile regression based on the spike-and-slab Lasso as follows.
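As a sanity check on the exponential-normal mixture behind (5), the sketch below simulates errors from the hierarchy and verifies that roughly a proportion $\tau$ of them falls below zero; the constants $k_1$ and $k_2$ are the standard location-scale mixture constants assumed in the reconstruction above, not a quotation of the paper's notation.

```python
import numpy as np

rng = np.random.default_rng(1)
n, tau, sigma = 200_000, 0.25, 1.0

# Standard location-scale mixture constants for the ALD (assumed parametrization).
k1 = (1 - 2 * tau) / (tau * (1 - tau))
k2 = np.sqrt(2 / (tau * (1 - tau)))

e = rng.exponential(scale=sigma, size=n)        # e_i ~ Exp with mean sigma
z = rng.standard_normal(n)                      # z_i ~ N(0, 1)
eps = k1 * e + k2 * np.sqrt(sigma * e) * z      # ALD(0, sigma, tau) errors

# The tau-th quantile of the error should be (approximately) zero.
print(np.mean(eps < 0))          # close to tau
print(np.quantile(eps, tau))     # close to 0
```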
2.2. Bayesian Quantile Regression based on Spike-and-Slab Lasso
As early as 2016, Xi et al. [23] applied a spike-and-slab prior to Bayesian quantile regression, but their proposed prior was a mixture of a point mass at zero and a normal distribution with large variance, and the posterior density was estimated through a Gibbs sampler. To provide new theoretical insights into a class of continuous spike-and-slab priors, Rockova (2018) [26] introduced a novel family of spike-and-slab priors, each a mixture of a spike density and a slab density with corresponding mixing probabilities. In this paper, we adopt the spike-and-slab Lasso prior, a mixture of two Laplace distributions with small and large variance, respectively [26], which facilitates the variational Bayesian technique for approximating the posterior density of the parameters and improves the efficiency of the algorithm.
In light of ref. [26], given the indicator $\gamma_j=0$ or 1, the prior of $\beta_j$ in the Bayesian quantile regression model (5) can be written as
$$\pi(\beta_j\mid\gamma_j)=\gamma_j\,\psi_1(\beta_j\mid\lambda_1)+(1-\gamma_j)\,\psi_0(\beta_j\mid\lambda_0),$$
where the Laplace densities $\psi_1(\beta_j\mid\lambda_1)=\frac{\lambda_1}{2}\exp(-\lambda_1|\beta_j|)$ and $\psi_0(\beta_j\mid\lambda_0)=\frac{\lambda_0}{2}\exp(-\lambda_0|\beta_j|)$ have precision parameters $\lambda_1$ and $\lambda_0$ satisfying $\lambda_0\gg\lambda_1$, the indicator variable set is $\boldsymbol{\gamma}=(\gamma_1,\ldots,\gamma_r)^{\top}$, and the $j$th variable is active when $\gamma_j=1$ and inactive otherwise. Similar to [27], the Laplace distribution for the regression coefficient $\beta_j$ can be represented as a mixture of a normal distribution and an exponential distribution; specifically, the distribution of $\beta_j$ can be expressed in the following hierarchical structure:
$$\beta_j\mid s_j\sim N(0,s_j),\qquad s_j\mid\gamma_j,\lambda_0,\lambda_1\sim\operatorname{Exp}\!\left(\frac{2}{\gamma_j\lambda_1^{2}+(1-\gamma_j)\lambda_0^{2}}\right),\qquad \gamma_j\mid\theta\sim\operatorname{Bern}(\theta), \tag{7}$$
where $\operatorname{Bern}(\theta)$ denotes the Bernoulli distribution with $\theta$ being the probability that the indicator variable $\gamma_j$ equals one for $j=1,\ldots,r$, and we specify the prior of $\theta$ as the Beta distribution $\operatorname{Beta}(a_{\theta},b_{\theta})$ with hyperparameters $a_{\theta}$ and $b_{\theta}$. The parameters $\lambda_0$ and $\lambda_1$ are regularization parameters used to identify important variables, for which we consider the following conjugate priors:
$$\lambda_0^{2}\sim\operatorname{Gamma}(a_0,b_0),\qquad \lambda_1^{2}\sim\operatorname{Gamma}(a_1,b_1),$$
where $\operatorname{Gamma}(a,b)$ denotes the gamma distribution with shape parameter $a$ and scale parameter $b$. As mentioned above, $\lambda_0$ and $\lambda_1$ should satisfy $\lambda_0\gg\lambda_1$; to this end, we select the hyperparameters $(a_0,b_0)$ and $(a_1,b_1)$ so that the prior of $\lambda_0$ concentrates on much larger values than that of $\lambda_1$.
The prior of the scale parameter $\sigma$ in (5) is the inverse gamma distribution $\operatorname{IG}(a_{\sigma},b_{\sigma})$; the hyperparameters $a_{\sigma}$ and $b_{\sigma}$ are chosen in this paper so as to lead to an almost non-informative prior.
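To illustrate the prior specification, a minimal sketch that draws coefficients from the spike-and-slab Lasso hierarchy via its normal-exponential representation is given below; the hyperparameter values are arbitrary illustrations, and the exponential distribution is parameterized by its mean, matching the convention used above.

```python
import numpy as np

rng = np.random.default_rng(2)
r = 1000                       # number of candidate covariates
theta = 0.05                   # prior inclusion probability (illustrative value)
lam1, lam0 = 0.5, 50.0         # slab (small) and spike (large) precision parameters

gamma = rng.binomial(1, theta, size=r)              # gamma_j ~ Bern(theta)
lam = np.where(gamma == 1, lam1, lam0)              # active -> slab, inactive -> spike
s = rng.exponential(scale=2.0 / lam**2, size=r)     # s_j ~ Exp with mean 2 / lambda^2
beta = rng.normal(loc=0.0, scale=np.sqrt(s))        # beta_j | s_j ~ N(0, s_j)

# Marginally, beta_j | gamma_j follows a Laplace(lambda) distribution: spike
# coefficients are tightly concentrated near zero, slab coefficients are diffuse.
print(np.abs(beta[gamma == 0]).max(), np.abs(beta[gamma == 1]).max())
```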
Under the Bayesian paradigm, based on the above priors and the likelihood of quantile regression, it is straightforward to derive the posterior distribution $\pi(\boldsymbol{\Theta}\mid\mathcal{D})\propto p(\mathcal{D},\boldsymbol{\Theta})$, where $\boldsymbol{\Theta}=\{\boldsymbol{\beta},\mathbf{e},\mathbf{s},\boldsymbol{\gamma},\theta,\lambda_0^{2},\lambda_1^{2},\sigma\}$ collects all parameters, the latent variable sets are $\mathbf{e}=(e_1,\ldots,e_n)^{\top}$, $\mathbf{s}=(s_1,\ldots,s_r)^{\top}$ and $\boldsymbol{\gamma}=(\gamma_1,\ldots,\gamma_r)^{\top}$, and the observed data set is $\mathcal{D}=\{\mathbf{y},\mathbf{X}\}$ with the response set $\mathbf{y}=(y_1,\ldots,y_n)^{\top}$ and covariate set $\mathbf{X}=(\mathbf{x}_1,\ldots,\mathbf{x}_n)^{\top}$. Based on the hierarchical structure (5) of the quantile regression likelihood and the hierarchical structure (7) of the spike-and-slab prior for the regression coefficient vector $\boldsymbol{\beta}$, we derive the joint density
$$p(\mathcal{D},\boldsymbol{\Theta})=\prod_{i=1}^{n}N\!\left(y_i\mid\mathbf{x}_i^{\top}\boldsymbol{\beta}+k_1 e_i,\;k_2^{2}\sigma e_i\right)\operatorname{Exp}(e_i\mid\sigma)\;\prod_{j=1}^{r}N(\beta_j\mid 0,s_j)\,\pi(s_j\mid\gamma_j)\,\theta^{\gamma_j}(1-\theta)^{1-\gamma_j}\;\pi(\theta)\,\pi(\lambda_0^{2})\,\pi(\lambda_1^{2})\,\pi(\sigma).$$
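For readers who wish to check the derivations numerically, the following sketch assembles the log joint density implied by the hierarchy above; it follows the parametrizations assumed in this reconstruction (exponential by mean, gamma by shape and scale, inverse gamma for $\sigma$), and all function and variable names are illustrative rather than the authors' code.

```python
import numpy as np
from scipy import stats

def log_joint(y, X, beta, e, s, gamma, theta, lam0_sq, lam1_sq, sigma,
              tau, a_t, b_t, a0, b0, a1, b1, a_s, b_s):
    """Sum of the log densities of all factors in the hierarchical model.
    For simplicity every coefficient (including any intercept) gets the
    spike-and-slab prior; adapt as needed."""
    k1 = (1 - 2 * tau) / (tau * (1 - tau))
    k2_sq = 2 / (tau * (1 - tau))

    mean_y = X @ beta + k1 * e
    var_y = k2_sq * sigma * e
    lp = stats.norm.logpdf(y, mean_y, np.sqrt(var_y)).sum()    # conditional likelihood
    lp += stats.expon.logpdf(e, scale=sigma).sum()             # e_i ~ Exp with mean sigma

    lam_sq = np.where(gamma == 1, lam1_sq, lam0_sq)
    lp += stats.norm.logpdf(beta, 0.0, np.sqrt(s)).sum()       # beta_j | s_j
    lp += stats.expon.logpdf(s, scale=2.0 / lam_sq).sum()      # s_j | gamma_j
    lp += stats.bernoulli.logpmf(gamma, theta).sum()           # gamma_j | theta
    lp += stats.beta.logpdf(theta, a_t, b_t)                   # theta
    lp += stats.gamma.logpdf(lam0_sq, a0, scale=b0)            # lambda_0^2
    lp += stats.gamma.logpdf(lam1_sq, a1, scale=b1)            # lambda_1^2
    lp += stats.invgamma.logpdf(sigma, a_s, scale=b_s)         # sigma
    return lp
```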
Although it is straightforward to sample from the above posterior (e.g., via a Gibbs sampler), doing so is very time-consuming for sufficiently large $r$. To address this issue, we investigate a fast yet efficient alternative, namely the variational Bayesian method.
2.3. Quantile Regression with Spike-and-Slab Lasso Penalty Based on Variational Bayesian
At present, the most commonly used method for variational Bayesian approximation of posterior distributions is mean-field approximation [28], which is the most efficient among variational methods, especially for parameters or parameter blocks with conjugate priors. Bayesian quantile regression must take into account that the variance of each observation differs and that each $y_i$ corresponds to a latent variable $e_i$, which makes its algorithms less efficient than those for ordinary mean regression. Therefore, in this paper we use the mean-field variational Bayesian algorithm to derive the quantile regression model with the spike-and-slab Lasso penalty.
According to the principle of variational inference, we need to specify a variational family $\mathcal{Q}$ of densities for the random variables $\boldsymbol{\Theta}$ having the same support as the posterior density $\pi(\boldsymbol{\Theta}\mid\mathcal{D})$. Let $q(\boldsymbol{\Theta})\in\mathcal{Q}$ be any variational density approximating the posterior density $\pi(\boldsymbol{\Theta}\mid\mathcal{D})$. The main purpose of variational Bayes is to find the best approximation to $\pi(\boldsymbol{\Theta}\mid\mathcal{D})$ via the Kullback-Leibler (KL) divergence between $q(\boldsymbol{\Theta})$ and $\pi(\boldsymbol{\Theta}\mid\mathcal{D})$, which is equivalent to solving the following optimization problem:
$$q^{*}(\boldsymbol{\Theta})=\arg\min_{q\in\mathcal{Q}}\operatorname{KL}\{q(\boldsymbol{\Theta})\,\|\,\pi(\boldsymbol{\Theta}\mid\mathcal{D})\},$$
where $\operatorname{KL}\{q\,\|\,\pi\}=E_{q}\{\log q(\boldsymbol{\Theta})-\log\pi(\boldsymbol{\Theta}\mid\mathcal{D})\}$, which is not less than zero and equals zero if and only if $q(\boldsymbol{\Theta})=\pi(\boldsymbol{\Theta}\mid\mathcal{D})$. The posterior density is $\pi(\boldsymbol{\Theta}\mid\mathcal{D})=p(\boldsymbol{\Theta},\mathcal{D})/p(\mathcal{D})$, with the joint distribution $p(\boldsymbol{\Theta},\mathcal{D})$ of the parameters $\boldsymbol{\Theta}$ and data $\mathcal{D}$, and the marginal distribution $p(\mathcal{D})$ of the data $\mathcal{D}$. Since $p(\mathcal{D})=\int p(\boldsymbol{\Theta},\mathcal{D})\,d\boldsymbol{\Theta}$ has no analytic expression for our considered model, it is rather difficult to implement the optimization problem presented above. It is easy to deduce that
$$\log p(\mathcal{D})=\operatorname{ELBO}(q)+\operatorname{KL}\{q(\boldsymbol{\Theta})\,\|\,\pi(\boldsymbol{\Theta}\mid\mathcal{D})\},$$
in which the evidence lower bound (ELBO) is $\operatorname{ELBO}(q)=E_{q}\{\log p(\boldsymbol{\Theta},\mathcal{D})-\log q(\boldsymbol{\Theta})\}$, with $E_{q}$ representing the expectation taken with respect to the variational density $q(\boldsymbol{\Theta})$. Thus, minimizing $\operatorname{KL}\{q\,\|\,\pi\}$ is equivalent to maximizing $\operatorname{ELBO}(q)$ because $\log p(\mathcal{D})$ does not depend on $q$. That is,
$$q^{*}(\boldsymbol{\Theta})=\arg\max_{q\in\mathcal{Q}}\operatorname{ELBO}(q),$$
which implies that the problem of finding the best approximation to $\pi(\boldsymbol{\Theta}\mid\mathcal{D})$ is transformed into maximizing $\operatorname{ELBO}(q)$ over the variational family $\mathcal{Q}$. The complexity of the optimization procedure heavily depends on that of the variational family $\mathcal{Q}$. Hence, it is quite desirable to specify a relatively simple variational family $\mathcal{Q}$ over which to optimize the objective function $\operatorname{ELBO}(q)$ with respect to $q$.
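The decomposition $\log p(\mathcal{D})=\operatorname{ELBO}(q)+\operatorname{KL}\{q\,\|\,\pi\}$ can be verified numerically on a toy conjugate model in which every term is available in closed form; the one-observation normal-mean example below is purely illustrative and unrelated to the quantile regression model.

```python
import numpy as np

# Toy model: theta ~ N(0, 1), y | theta ~ N(theta, 1), single observation y.
y = 1.3
post_mean, post_var = y / 2.0, 0.5            # exact posterior is N(y/2, 1/2)

def log_evidence(y):
    # Marginally y ~ N(0, 2).
    return -0.5 * np.log(2 * np.pi * 2.0) - y**2 / 4.0

def elbo(m, v, y):
    # E_q[log p(y|theta)] + E_q[log p(theta)] - E_q[log q(theta)] for q = N(m, v).
    e_loglik = -0.5 * np.log(2 * np.pi) - 0.5 * ((y - m) ** 2 + v)
    e_logprior = -0.5 * np.log(2 * np.pi) - 0.5 * (m ** 2 + v)
    entropy = 0.5 * np.log(2 * np.pi * np.e * v)
    return e_loglik + e_logprior + entropy

def kl_to_posterior(m, v):
    # KL(N(m, v) || N(post_mean, post_var)) in closed form.
    return 0.5 * (np.log(post_var / v) + (v + (m - post_mean) ** 2) / post_var - 1.0)

for m, v in [(0.0, 1.0), (0.5, 0.3), (post_mean, post_var)]:
    print(elbo(m, v, y) + kl_to_posterior(m, v), log_evidence(y))  # equal for every q
```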
Following the commonly used approach to specifying a simple variational family $\mathcal{Q}$ in the variational literature, we consider the well-known mean-field variational family, in which the blocks of $\boldsymbol{\Theta}$ are mutually independent and each is governed by a distinct factor in the variational density. That is, the variational density $q(\boldsymbol{\Theta})$ is assumed to be factorized across the components of $\boldsymbol{\Theta}$:
$$q(\boldsymbol{\Theta})=\prod_{m=1}^{M}q_m(\theta_m), \tag{8}$$
where the individual variational densities $q_m(\theta_m)$ are unspecified, but the above factorization across components is pre-given. Moreover, the optimal solutions of the $q_m(\theta_m)$'s are determined by maximizing $\operatorname{ELBO}(q)$ with respect to the densities $q_m(\theta_m)$ via the coordinate ascent method, where each block $\theta_m$ of $\boldsymbol{\Theta}=\{\theta_1,\ldots,\theta_M\}$ can be either a scalar or a vector. This means that when the correlation between several unknown parameters or latent variables cannot be ignored, they should be put in the same block and merged into one $\theta_m$.
Following the idea of the coordinate ascent method given in ref. [29], when fixing the other variational factors $q_l(\theta_l)$ for $l\neq m$, the optimal density $q_m^{*}(\theta_m)$ that maximizes $\operatorname{ELBO}(q)$ with respect to $q_m(\theta_m)$ is shown to take the form
$$q_m^{*}(\theta_m)\propto\exp\left\{E_{-m}\left[\log p(\boldsymbol{\Theta},\mathcal{D})\right]\right\}, \tag{9}$$
where $\log p(\boldsymbol{\Theta},\mathcal{D})$ is the logarithm of the joint density function and $E_{-m}$ is the expectation taken with respect to the densities $q_l(\theta_l)$ for all $l\neq m$.
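Before applying (9) to the quantile regression model, it may help to see coordinate ascent at work on a deliberately simple example; the normal model with unknown mean and precision below is the standard textbook illustration of mean-field updates and is not part of the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(3)
y = rng.normal(loc=2.0, scale=1.5, size=100)
n, ybar = y.size, y.mean()

# Priors: tau ~ Gamma(a0, rate=b0), mu | tau ~ N(mu0, 1 / (kappa0 * tau)).
mu0, kappa0, a0, b0 = 0.0, 1.0, 1.0, 1.0

# Mean-field factorization q(mu, tau) = q(mu) q(tau); initialize E[tau].
E_tau = 1.0
for it in range(50):
    # Update q(mu) = N(m, v), obtained from the optimal-density formula (9).
    m = (kappa0 * mu0 + n * ybar) / (kappa0 + n)
    v = 1.0 / ((kappa0 + n) * E_tau)
    # Update q(tau) = Gamma(a_n, rate=b_n), again from (9).
    a_n = a0 + (n + 1) / 2.0
    b_n = b0 + 0.5 * (np.sum((y - m) ** 2) + n * v + kappa0 * ((m - mu0) ** 2 + v))
    E_tau = a_n / b_n

print(m, np.sqrt(v))          # approximate posterior mean and sd of mu
print(a_n / b_n)              # approximate posterior mean of the precision tau
```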
According to equations (8) and (9), we can derive the variational posterior of each parameter or latent variable in closed form (see Appendix A for the details), where $\operatorname{GIG}(\cdot,\cdot,\cdot)$ denotes the generalized inverse Gaussian distribution. In these variational posteriors, the remaining variational parameters are defined analogously, and a subscript $-j$ denotes a vector with its $j$th component deleted.
In the derivations above, we have obtained the variational posterior of each parameter. Using the idea of coordinate ascent optimization, we can update each variational distribution iteratively until convergence. For this reason, we summarize the variational Bayesian spike-and-slab Lasso quantile regression (VBSSLQR) procedure in the following Algorithm 1:
In Algorithm 1 above, $\psi(\cdot)$ is the digamma function, and certain expectations (such as that of $e_i^{-1}$) are taken with respect to a generalized inverse Gaussian (GIG) distribution. Assume that $x\sim\operatorname{GIG}(\nu,a,b)$ with density proportional to $x^{\nu-1}\exp\{-(ax+b/x)/2\}$; then
$$E(x)=\frac{\sqrt{b}\,K_{\nu+1}(\sqrt{ab})}{\sqrt{a}\,K_{\nu}(\sqrt{ab})},\qquad E(x^{-1})=\frac{\sqrt{a}\,K_{\nu+1}(\sqrt{ab})}{\sqrt{b}\,K_{\nu}(\sqrt{ab})}-\frac{2\nu}{b},$$
where $K_{\nu}(\cdot)$ represents the modified Bessel function of the second kind. Note that there is no analytic expression for the derivative of the modified Bessel function. Therefore, we approximate the required expectation by a second-order Taylor expansion.
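Assuming the common $\operatorname{GIG}(\nu,a,b)$ parametrization stated above, the required moments can be computed directly from the modified Bessel function of the second kind; the following sketch (using scipy) is illustrative only and checks the formulas by Monte Carlo.

```python
import numpy as np
from scipy.special import kv   # modified Bessel function of the second kind K_nu

def gig_moments(nu, a, b):
    """E[x] and E[1/x] for x ~ GIG(nu, a, b), density proportional to
    x**(nu - 1) * exp(-(a * x + b / x) / 2)."""
    w = np.sqrt(a * b)
    ratio = kv(nu + 1, w) / kv(nu, w)
    e_x = np.sqrt(b / a) * ratio
    e_inv_x = np.sqrt(a / b) * ratio - 2.0 * nu / b
    return e_x, e_inv_x

# Monte Carlo check: scipy's geninvgauss(p, c) has density proportional to
# x**(p - 1) * exp(-c * (x + 1 / x) / 2), i.e. a GIG with a = b = c.
from scipy.stats import geninvgauss
nu, c = 0.5, 2.0
samples = geninvgauss.rvs(nu, c, size=200_000, random_state=0)
print(gig_moments(nu, c, c))                       # analytic E[x], E[1/x]
print(samples.mean(), (1.0 / samples).mean())      # Monte Carlo estimates
```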
This paper lists the expectations of some functions of the parameters, taken with respect to the variational posteriors involved in the algorithm of Section 2.3; see Appendix B for details. Based on our proposed VBSSLQR algorithm, in the next section we randomly generate high-dimensional data, conduct simulation studies, and compare its performance with that of other methods.