Submitted: 31 January 2024
Posted: 31 January 2024
Abstract
Keywords:
1. Introduction
2. Model description
2.1. Two-part latent variable model
2.2. Bayesian feature selection
3. Bayesian inference
3.1. Prior specification and MCMC sampling
3.2. MCMC sampling
- draw from ,
- draw from ,
- draw from ,
- draw from , and
- draw from .
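As a generic illustration of cycling through full conditional draws, the sketch below runs a Gibbs sampler on a toy normal model with unknown mean and variance. It is not the two-part latent variable model; the priors (a normal prior on the mean, an inverse-gamma prior on the variance), the hyperparameter values, and the function name `gibbs_normal` are all assumptions made for the example.

```python
import numpy as np

def gibbs_normal(y, n_iter=2000, burn_in=500,
                 m0=0.0, s0_sq=100.0, a0=2.0, b0=1.0, seed=0):
    """Toy Gibbs sampler for y_i ~ N(mu, sigma2).

    Assumed priors (illustrative only):
      mu     ~ N(m0, s0_sq)
      sigma2 ~ Inverse-Gamma(a0, b0)
    Returns the post-burn-in draws as an array with columns (mu, sigma2).
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n = y.size
    mu, sigma2 = y.mean(), y.var() + 1e-8  # starting values
    draws = np.empty((n_iter, 2))
    for t in range(n_iter):
        # Step 1: draw mu from its normal full conditional given sigma2.
        prec = 1.0 / s0_sq + n / sigma2
        mean = (m0 / s0_sq + y.sum() / sigma2) / prec
        mu = rng.normal(mean, np.sqrt(1.0 / prec))
        # Step 2: draw sigma2 from its inverse-gamma full conditional given mu,
        # obtained as the reciprocal of a gamma draw.
        shape = a0 + 0.5 * n
        rate = b0 + 0.5 * np.sum((y - mu) ** 2)
        sigma2 = 1.0 / rng.gamma(shape, 1.0 / rate)
        draws[t] = mu, sigma2
    return draws[burn_in:]
```

Each iteration updates one block at a time, conditioning on the most recent values of the others; a sampler for a richer model would follow the same pattern with more blocks.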
4. Simulation study
5. China Household Finance Survey data
6. Discussion
Funding
Acknowledgments
Conflicts of Interest
Abbreviations
| TPM | Two-part model |
| TPLVM | Two-part latent variable model |
| SS | Spike and slab bimodal prior |
| BaLsso | Bayesian lasso |
| MCMC | Markov chain Monte Carlo |
| CHFS | China Household Finance Survey |
Appendix A




| | SS | | | BaLsso | | |
|---|---|---|---|---|---|---|
| PAR | BIAS | RMS | SD | BIAS | RMS | SD |
| | -0.015 | 0.097 | 0.129 | 0.028 | 0.150 | 0.134 |
| | -0.056 | 0.143 | 0.142 | -0.152 | 0.217 | 0.136 |
| | -0.001 | 0.021 | 0.061 | -0.019 | 0.042 | 0.079 |
| | -0.144 | 0.216 | 0.145 | -0.122 | 0.251 | 0.148 |
| | 0.005 | 0.030 | 0.064 | -0.008 | 0.040 | 0.078 |
| | -0.091 | 0.147 | 0.137 | -0.045 | 0.135 | 0.137 |
| | 0.017 | 0.028 | 0.075 | 0.026 | 0.055 | 0.096 |
| | -0.187 | 0.237 | 0.184 | -0.126 | 0.209 | 0.184 |
| | 0.010 | 0.079 | 0.084 | 0.008 | 0.063 | 0.085 |
| | -0.035 | 0.079 | 0.077 | -0.011 | 0.065 | 0.074 |
| | 0.005 | 0.032 | 0.051 | -0.018 | 0.031 | 0.054 |
| | -0.007 | 0.061 | 0.070 | -0.021 | 0.085 | 0.069 |
| | -0.007 | 0.029 | 0.049 | -0.003 | 0.031 | 0.053 |
| | -0.070 | 0.093 | 0.077 | -0.018 | 0.082 | 0.075 |
| | -0.040 | 0.086 | 0.089 | -0.020 | 0.069 | 0.088 |
| | -0.011 | 0.033 | 0.062 | 0.014 | 0.036 | 0.069 |
| | 0.085 | 0.129 | 0.117 | 0.038 | 0.082 | 0.111 |
| | 0.042 | 0.078 | 0.073 | 0.058 | 0.098 | 0.071 |
| | 0.030 | 0.072 | 0.071 | 0.034 | 0.063 | 0.072 |
| | 0.058 | 0.079 | 0.072 | 0.052 | 0.090 | 0.073 |
| | 0.031 | 0.060 | 0.072 | 0.037 | 0.064 | 0.073 |
| | 0.014 | 0.041 | 0.074 | 0.018 | 0.058 | 0.076 |
| Total | - | 1.870 | 1.975 | - | 2.016 | 2.035 |

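The BIAS and RMS columns can be read as summaries over simulation replications, and the Total row is the column sum over parameters. The sketch below shows one common way to compute such summaries and checks the SS RMS total against the values in the table above. The definitions used (bias as mean error, RMS as root mean squared error) are the usual ones and are an assumption here, as is the helper name `summarize`; the SD column is not reproduced because its definition is not stated.

```python
import numpy as np

def summarize(estimates, truth):
    """Bias and root mean square error per parameter.

    estimates: (replications, parameters) array of point estimates
    truth:     (parameters,) array of true values
    Assumed definitions: bias = mean(est - truth),
    rms = sqrt(mean((est - truth)^2)), both taken over replications.
    """
    err = np.asarray(estimates, dtype=float) - np.asarray(truth, dtype=float)
    bias = err.mean(axis=0)
    rms = np.sqrt((err ** 2).mean(axis=0))
    return bias, rms

# RMS values for SS, copied row by row from the table above.
rms_ss = [0.097, 0.143, 0.021, 0.216, 0.030, 0.147, 0.028, 0.237,
          0.079, 0.079, 0.032, 0.061, 0.029, 0.093, 0.086, 0.033,
          0.129, 0.078, 0.072, 0.079, 0.060, 0.041]

# The Total row is the sum of the column; rounding absorbs float error.
total_rms_ss = round(sum(rms_ss), 3)
```

Summing the 22 RMS entries reproduces the reported SS total of 1.870, which confirms that the Total row aggregates across parameters rather than averaging.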
| | SS | | | BaLsso | | |
|---|---|---|---|---|---|---|
| PAR | BIAS | RMS | SD | BIAS | RMS | SD |
| | 0.052 | 0.096 | 0.087 | 0.009 | 0.092 | 0.087 |
| | 0.005 | 0.069 | 0.089 | 0.055 | 0.117 | 0.090 |
| | 0.003 | 0.048 | 0.058 | 0.032 | 0.052 | 0.060 |
| | 0.007 | 0.086 | 0.093 | -0.045 | 0.076 | 0.091 |
| | 0.004 | 0.015 | 0.049 | -0.020 | 0.043 | 0.060 |
| | 0.010 | 0.071 | 0.086 | 0.013 | 0.074 | 0.085 |
| | -0.003 | 0.029 | 0.059 | 0.032 | 0.064 | 0.077 |
| | 0.002 | 0.102 | 0.120 | -0.042 | 0.108 | 0.114 |
| | 0.017 | 0.042 | 0.053 | 0.030 | 0.056 | 0.054 |
| | -0.023 | 0.038 | 0.046 | -0.016 | 0.039 | 0.047 |
| | -0.007 | 0.019 | 0.033 | -0.005 | 0.018 | 0.037 |
| | -0.028 | 0.060 | 0.042 | -0.014 | 0.026 | 0.043 |
| | -0.007 | 0.023 | 0.033 | 0.000 | 0.018 | 0.036 |
| | -0.005 | 0.035 | 0.046 | 0.003 | 0.043 | 0.047 |
| | -0.031 | 0.058 | 0.053 | -0.039 | 0.063 | 0.054 |
| | -0.001 | 0.031 | 0.045 | -0.025 | 0.081 | 0.053 |
| | 0.018 | 0.049 | 0.068 | 0.041 | 0.053 | 0.071 |
| | 0.021 | 0.041 | 0.045 | 0.033 | 0.038 | 0.045 |
| | 0.016 | 0.049 | 0.045 | 0.028 | 0.038 | 0.045 |
| | 0.032 | 0.049 | 0.045 | 0.054 | 0.057 | 0.045 |
| | 0.043 | 0.059 | 0.046 | 0.043 | 0.054 | 0.046 |
| | 0.016 | 0.043 | 0.049 | 0.005 | 0.037 | 0.048 |
| Total | - | 1.112 | 1.290 | - | 1.247 | 1.335 |

| | SS | | | BaLsso | | |
|---|---|---|---|---|---|---|
| PAR | | | | | | |
| | 100 | 100 | 100 | 100 | 100 | 100 |
| | 98 | 96 | 85 | 88 | 86 | 76 |
| | 100 | 100 | 100 | 100 | 100 | 100 |
| | 96 | 95 | 86 | 93 | 93 | 85 |
| | 100 | 100 | 100 | 100 | 100 | 100 |
| | 96 | 94 | 93 | 97 | 92 | 87 |
| | 100 | 100 | 100 | 100 | 100 | 100 |
| | 99 | 100 | 100 | 100 | 100 | 100 |
| | 100 | 99 | 95 | 100 | 98 | 93 |
| | 100 | 100 | 100 | 100 | 100 | 100 |
| | 100 | 100 | 97 | 98 | 100 | 91 |
| | 100 | 100 | 100 | 100 | 100 | 100 |
| | 100 | 100 | 100 | 100 | 100 | 100 |
| | 100 | 98 | 97 | 97 | 96 | 96 |

| Variable | Description | Mean | Max | Min | SD |
|---|---|---|---|---|---|
| Gender | =1, male; =0, otherwise | 0.756 | 1 | 0 | 0.430 |
| Age | | 51.81 | 91 | 19 | 14.931 |
| Marital status | =1, married; =0, otherwise | 0.863 | 1 | 0 | 0.344 |
| Health condition | =1, good; =0, otherwise | 0.833 | 1 | 0 | 0.373 |
| Education degree | =1, high school or above; =0, otherwise | 0.352 | 1 | 0 | 0.478 |
| Employment | =1, yes; =0, otherwise | 0.092 | 1 | 0 | 0.290 |
| No. of adults | | 3.002 | 3 | 0 | 1.301 |
| Annual income (CNY) | 0 |

| | SS | | BaLsso | | | SS | | BaLsso | |
|---|---|---|---|---|---|---|---|---|---|
| Par | Est. | SD | Est. | SD | Par | Est. | SD | Est. | SD |
| | -0.835 | 0.078 | -0.838 | 0.080 | | 9.782 | 0.152 | 9.670 | 0.125 |
| | 0.050 | 0.063 | 0.076 | 0.070 | | -0.137 | 0.103 | -0.107 | 0.088 |
| | -0.750 | 0.099 | -0.757 | 0.102 | | -0.147 | 0.141 | -0.015 | 0.081 |
| | 0.107 | 0.085 | 0.147 | 0.088 | | -0.022 | 0.065 | -0.006 | 0.075 |
| | 0.428 | 0.062 | 0.072 | 0.070 | | -0.019 | 0.060 | -0.029 | 0.069 |
| | 0.577 | 0.070 | 0.082 | 0.081 | | 0.259 | 0.123 | 0.322 | 0.107 |
| | 0.004 | 0.040 | 0.005 | 0.052 | | 0.035 | 0.058 | 0.053 | 0.067 |
| | 0.118 | 0.079 | 0.130 | 0.079 | | 0.043 | 0.072 | 0.281 | 0.113 |
| | 0.747 | 0.073 | 0.092 | 0.077 | | 0.384 | 0.132 | 0.188 | 0.118 |
| | -0.059 | 0.112 | -0.039 | 0.092 | | 1.205 | 0.106 | 1.910 | 0.104 |
| | 0.312 | 0.150 | 0.300 | 0.152 | | | | | |
| | -0.791 | 0.062 | -0.714 | 0.057 | | | | | |
| | -0.865 | 0.067 | -0.625 | 0.068 | | | | | |

| | Part one | | Part two | |
|---|---|---|---|---|
| VAR | SS | BaLsso | SS | BaLsso |
| Gender | 0 | 0 | 1 | 1 |
| Age | 1 | 1 | 1 | 0 |
| Marital status | 1 | 1 | 0 | 0 |
| Health condition | 1 | 0 | 0 | 0 |
| Education | 1 | 0 | 1 | 1 |
| Employment | 0 | 0 | 0 | 0 |
| No. of adults | 1 | 1 | 0 | 1 |
| Income | 1 | 0 | 1 | 1 |
| Family culture | 0 | 0 | 1 | 1 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).