Bayesian Joint Modeling Analysis of Longitudinal Proportion and Survival Data

Wenting Liu; Huiqiong Li; Anming Tang; Zixin Cui

doi:10.20944/preprints202307.0492.v1

Submitted:

06 July 2023

Posted:

10 July 2023

You are already at the latest version

Abstract

This paper focuses on a joint model to analyse longitudinal proportion and survival data. After transforming the longitudinal proportional data by a logit function, we adopt a partially linear mixed-effect model for it, in which nonlinear function of time is fitted via using B-splines technique and a centered Dirichlet Process Mixture Model (CDPMM) is specified for a general distribution of random effects. The survival data is assumed to a Cox proportional hazard model, the sharing random effects joint model is developed for the two types of data. Combining the Gibbs sampler and the Metropolis-Hastings algorithm, we propose a Bayesian Lasso (BLasso) method to simultaneously estimate unknown parameters and select important covariates. Simulation studies are conducted to investigate the finite sample performance of the proposed methods. An example from the MA.5 research experiment is used to illustrate the proposed methodologies.

Keywords:

longitudinal proportional data

;

survival data

;

joint model

;

Bayesian variable selection

;

B-splines

;

CDPMM metnod

Subject:

Computer Science and Mathematics - Probability and Statistics

1. Introduction

Joint analysis of longitudinal and survival data have been widely applied to cancer and HIV/AIDS clinical studies where time to event outcomes, such as disease free and overall survivals, are usually the primary endpoints. In particular, after Faucett and Thomas [1] and Wulfsohn and Tsiatis [2] introduced what is the standard joint model, the joint model has been extensively explored. The advantages of joint models have been discussed by researchers [3,4,5,6,7,8]. However, in clinical trials, some patients with poor quality of life may drop out of the study due to disease recurrence or death. In this case, the absence of quality of life measures resulting from the withdrawal of these patients is informative. In order to provide strong evidence for the tradeoff between intensive treatment and the associated poor quality of life for patients, we conducted joint modeling of longitudinal life measures and survival data to investigate the relationship between them. For longitudinal quality of life and survival data, Henderson et al. [9] and Zeng and Cai [10] considered the use of shared normal distribution random effects to jointly analyze the relationship between longitudinal quality of life and survival time. Tang et al. [11] considered a novel semiparametric joint model for multivariate longitudinal and survival data to analyze data from the International Breast Cancer Study. Longitudinal life quality measurement data can be linearly converted into longitudinal proportion data whose value range is in the unit interval (0,1) [12]. For the longitudinal component, there are two methods to deal with it. The first method applied the classic linear mixed model to the longitudinal scale data after logit transformation, and the second method directly used the simplex distribution to model the longitudinal scale data. The models established by the two methods both used the EM algorithm and the Laplace approximation to estimate the unknown parameters. In order to be more flexible and practical, this paper will use a partial linear mixed effect model to the logit transformed longitudinal scale data, and use the B-splines method to model the nonlinear function in the model. Meanwhile, to promote the feasibility of our proposed model, we use CDPMM method to model random effects.

In addition, variables selection in the joint model is also considered. In traditional regression models, methods of variable selection include the forward selection method, backward elimination method, stepwise selection method or some information criterion such as the Akaike information criterion. But these approaches are computationally expensive and unstable for the complicated models with a large number of covariates. Therefore, some penalized likelihood methods are gradually proposed, especially Lasso of Tibshirani [13] estimating linear regression coefficients by

L_{1}

-constrained least squares. Tibshirani [13] proposed that Lasso estimates can be interpreted as posterior norm estimates when the regression parameters have independent and identical Laplacian priors (ie, double exponential priors). Motivated by this connection, under the Bayesian framework, Park and Casella [14] proposed the BLasso variable selection method by imposing the double exponential prior on the regression coefficients and the gamma distribution on the shrinkage parameter. The BLasso method has been applied to linear regression model [15], semi-parametric structural equation model [16] and joint models of longitudinal and survival data [17]. Therefore, this paper extended that BLasso variable selection method to the joint model of longitudinal proportion data and survival data and a BLasso approach is developed to simultaneously estimate unknown parameters and identify the significant effect of the important covariates.

The rest of this paper is organized as follows. In Section 2, the joint model of longitudinal proportion and survival data is introduced. In Section 3, the Bayesian estimations of joint model are proposed. In Section 4, three numerical simulations are presented to evaluate the performance of the proposed methods. In Section 5, we apply the proposed approach to the MA.5 research experiment data and some concluding remarks are given in Section 6. Technical details are given in Appendix A.

2. Model and Notation

Consider a data set from n individuals. Let

y_{i j}

be a longitudinal proportional measurement for the i-th individual (

i = 1, 2, \dots, n

) at observation time point

t_{i j}

for

j = 1, 2, \dots, n_{i}

, and

y_{i j} \in (0, 1)

. We assume

y_{i j}^{*}

is the logit-transformation of

y_{i j}

, and

y_{i j}^{*} = logit (y_{i j}) = log (\frac{1 - y_{i j}}{y_{i j}})

. Further,

T_{i}^{*}

and

C_{i}

are the true survival time and censoring time, respectively. Let

T_{i} = min (T_{i}^{*}, C_{i})

denotes the corresponding observed event time, and

δ_{i} = 1 (T_{i}^{*} \leq C_{i})

denotes the failure indicator, where

1 (\cdot)

is a indicator function.

Denote

y^{*} = \{y_{1}^{*}, y_{2}^{*}, \dots, y_{n}^{*}\}

,

y_{i}^{*} = \{y_{i 1}^{*}, y_{i 2}^{*}, \dots, y_{i n_{i}}^{*}\}

,

T = \{T_{1}, T_{2}, \dots, T_{n}\}

,

Δ = \{δ_{1}, δ_{2}, \dots, δ_{n}\}

. Let

b = {b_{1}, b_{2}, \dots, b_{n}}

be time-independent random effects underlying both the longitudinal and survival processes for the i-th individual. Given the random effects

b_{i}

, assume that

y_{i j}^{*}

follows partially linear mixed-effect model

\begin{matrix} y_{i j}^{*} | b_{i} = X_{i j}^{⊤} β + g (t_{i j}) + Z_{i j}^{⊤} b_{i} + ε_{i j}, \end{matrix}

(1)

where

X_{i j}

and

Z_{i j}

represent the time-independent design vectors of fixed and random effects associated with

y_{i j}^{*}

, respectively. And

β

is a

p_{1} \times 1

fixed effects regression parameters,

b_{i}

is a

q \times 1

random effects vector,

g (t)

is a twice continuous differentiable unknown function, and

ε_{i j}

is a white noise process with variance

σ^{2}

. Also, we assume that

ε_{i j}

’s are independent of

b_{i}

. To facilitate the feasibility of our proposed model, instead of the traditional normality assumption, which may be violated in some applications [18], it is specified the random effects by Dirichlet process (DP) mixture of normals.

For event time

T_{i}

, given random effects

b_{i}

, we assume that

T_{i}

follows the hazard model

\begin{matrix} λ_{i} (t | b_{i}) = λ_{0} (t) exp (W_{i}^{⊤} γ + ϕ^{⊤} b_{i}), \end{matrix}

(2)

where

W_{i}

is the known fixed effects design matrix linking the unknown

p_{2} \times 1

parameter vector

γ

to

λ_{i} (t | b_{i})

,

ϕ

is an unknown

q \times 1

parameter vector linking

b_{i}

to

λ_{i} (t | b_{i})

, and

λ_{0} (t)

is an unknown basic hazard function.

From the above discussion, it is suggested to link models (1) and (2) through shared random effects, called a shared random effects joint model (JMSRE). Parameter

ϕ

in model JMSRE reflects the correlation between transformed longitudinal proportional data and survival data, given random effects. When

ϕ = 0_{q \times 1}

, it means that the longitudinal index is not necessarily related to the event time, that is, longitudinal proportional data and survival data can be modeled separately. So in this case joint modeling is not necessary and longitudinal indicators can be ignored for modeling survival data.

Further, to make Bayesian inference on

β

based on model (1), we approximate

g (t)

through a B-splines method

\begin{matrix} g (t) \approx B_{1} (t) φ_{1} + B_{2} (t) φ_{2} + \dots + B_{L} (t) φ_{L} = B^{⊤} (t) φ, \end{matrix}

where

L = d + K + 1

, d is the degree of B-splines, K is the number of knots,

φ = {(φ_{1}, φ_{2}, \dots, φ_{L})}^{⊤}

is a

L \times 1

unknown coefficient vector, and

B (t) = {(B_{1} (t), B_{2} (t), \dots, B_{L} (t))}^{⊤}

.

Denote

θ_{y} = \{β, φ, σ^{2}\}

be unknown parameters associated with model (1), and

θ_{T} = \{γ, ϕ\}

be unknown parameters associated with model (2). Thus, given

(θ_{y}, θ_{T}, b)

, a joint likelihood function of

(y^{*}, T, Δ)

can be written as

p (y^{*}, T, Δ | θ_{y}, θ_{T}, b) = \prod_{i = 1}^{n} p (y_{i}^{*} | b_{i}; θ_{y}) p (T, Δ | b_{i}; θ_{T}),

(3)

where

p (y_{i}^{*} | b_{i}; θ_{y}) = \prod_{j = 1}^{n_{i}} \frac{1}{\sqrt{2 π σ^{2}}} exp \{- \frac{{(y_{i j}^{*} - X_{i j}^{⊤} β - B^{⊤} (t_{i j}) φ - Z_{i j}^{⊤} b_{i})}^{2}}{2 σ^{2}}\},

p (T, Δ | b_{i}; θ_{T}) = \prod_{i = 1}^{n} {[\frac{exp (W_{i}^{⊤} γ + ϕ^{⊤} b_{i})}{\sum_{j \in R_{i}} exp (W_{j}^{⊤} γ + ϕ^{⊤} b_{j})}]}^{δ_{i}}, R_{i} = \{j : T_{j} \geq T_{i}\} .

3. Bayesian estimation of joint model

3.1. Prior specification

To develop Bayesian inference on the considered models, we need specifying the prior distributions for

σ^{2}, φ, β, γ and ϕ

. For conjugation, we consider the following priors for

σ^{2}, φ

:

σ^{- 2} \sim Γ (a_{0}, b_{0}), φ \sim N_{L} (0, H_{φ}^{0}),

(4)

where

a_{0}, b_{0}

, and

H_{φ}^{0}

are pre-given hyperparameters,

Γ (a_{0}, b_{0})

denotes the Gamma distribution with parameter

a_{0}

and the shape parameter

b_{0}

.

Following Tang et al. [17], we specify the DP mixture of normals for the random effects

b_{i}

, that is,

b_{i} \overset{i . i . d .}{\sim} \sum_{g = 1}^{\infty} π_{g} N_{q} (μ_{g}, Ω_{g})

with

(μ_{g}, Ω_{g}) \sim P

, where

P

is a random probability with an unknown form. one common approach is to specify DP prior to approximate

P

, i.e.,

P \sim DP (τ F_{0})

in which

F_{0}

is a base distribution that serves as a starting-point for constructing the nonparametric distribution, and

τ

is a weight that indicates the researcher’s certainty of

F_{0}

as the distribution of

P

. To gain the form of

F

, motivated by Ohlssen et al. [18] and Yang et al. [19], we consider the following truncated approximate centered Dirichlet Process Mixture Model (CDPMM), which can specify the prior distribution of

b_{i}

as follows:

b_{i} \overset{i . i . d .}{\sim} \sum_{g = 1}^{G} π_{g} N_{q} (μ_{g}, Ω_{g}) with μ_{g} = μ_{g}^{*} - \sum_{g = 1}^{G} π_{g} μ_{g}^{*} and (μ_{g}^{*}, Ω_{g}) \overset{i . i . d .}{\sim} F_{0},

where

1 \leq G < \infty

, and

π_{g}

is a random probability weight chosen to be independent of

(μ_{g}^{*}, Ω_{g})

such that

0 \leq π_{g} \leq 1

and

\sum_{g = 1}^{\infty} π_{g} = 1

. In order to ease the computational intensity, according to take moderate value proposed by Ishwaran and Zarepour (2000) [20], we consider

G = 25

, and

π_{g}

is given by the following stick-breaking procedure:

π_{1} = ϑ_{1} and π_{g} = ϑ_{g} \prod_{ι = 1}^{g - 1} (1 - ϑ_{ι}) for g = 2, \dots, G,

(5)

where

ϑ_{g} \overset{i . i . d .}{\sim} Beta (1, τ)

for

g = 1, 2, \dots, G - 1

, and

ϑ_{G} = 1

so that

\sum_{g = 1}^{G} π_{g} = 1

. The prior distribution for the unknown parameter

τ

is given by

τ \sim Γ (a_{1}, a_{2})

with prespecified hyperparameters

a_{1}

and

a_{2}

[21]. Here, we took the hyperparameters

a_{1}

and

a_{2}

to be 25 and 5, respectively.

Sampling from posterior distributions of

b_{i}

with the above specified DP prior is quite difficult and inefficient. An efficient and flexible method for solving the above question is to represent

b_{i}

in terms of a latent variable

L_{i} \in {1, 2, \dots, G}

, which records each

b_{i}

’s cluster membership and conveys its parametric value to the distribution of

b_{i}

. Let

L = {L_{1}, L_{n}, \dots, L_{n}}

,

π = {π_{1}, π_{2}, \dots, π_{G}}

,

μ^{*} = {μ_{1}^{*}, μ_{2}^{*}, \dots, μ_{G}^{*}}

and

Ω = {Ω_{1}, Ω_{2}, \dots, Ω_{G}}

in which

Ω_{g} = diag (ω_{g 1}, ω_{g 2}, \dots, ω_{g q})

, and they can be reformulated by

L_{i} | π \overset{i . i . d}{\sim} \sum_{g = 1}^{G} π_{g} δ_{g} (\cdot) and (π, μ^{*}, Ω) \sim f_{1} (π) f_{2} (μ^{*}) f_{3} (Ω),

where

δ_{g} (\cdot)

denotes a discrete probability measure concentrated at g,

f_{1} (π)

is specified by the stick-breaking prior as given in Equation (5),

f_{2} (μ^{*}) = \prod_{g = 1}^{G} f_{2} (μ_{g}^{*})

and

f_{3} (Ω) = \prod_{g = 1}^{G} \prod_{j = 1}^{q} f_{3} (ω_{g_{j}})

. Here,

μ_{g}^{*}

and

ω_{g_{j}}

can be specified by

μ_{g}^{*} | ξ, Ψ \overset{i . i . d}{\sim} N_{q} (ξ, Ψ), ξ | ξ^{0}, Ψ^{0} \sim N_{q} (ξ^{0}, Ψ^{0}), ψ_{j}^{- 1} | c_{1}, c_{2} \sim Γ (c_{1}, c_{2}) for j = 1, 2, \dots, q,

(6)

ω_{g_{j}}^{- 1} | ω_{j}^{a}, ϖ_{j} \sim Γ (ω_{j}^{a}, ϖ_{j}) and ϖ_{j} | ϖ_{j}^{a}, ϖ_{j}^{b} \sim Γ (ϖ_{j}^{a}, ϖ_{j}^{b}),

respectively, where

Ψ = diag (ψ_{1}, ψ_{2}, \dots, ψ_{q})

,

Γ (c_{1}, c_{2})

denotes the Gamma distribution with parameters

c_{1}

and

c_{2}

, and

ξ^{0}, Ψ^{0}, c_{1}, c_{2}, ω_{j}^{a}, ϖ_{j}^{a}

and

ϖ_{j}^{b}

are prespecified hyperparameters [18]. We consider

ξ^{0} = 0_{q \times 1}

,

Ψ^{0} = I_{q}

,

c_{1} = 11

,

c_{2} = 2.5

,

ω_{j}^{a} = 3

,

ϖ_{j}^{a} = n

and

ϖ_{j}^{b} = 10

in the paper. Given the values of

L_{i}

,

μ^{*}

and

Ω

, we can sample

b_{i}

from

N_{q} (μ_{L_{i}}, Ω_{L_{i}})

with

μ_{L_{i}} = μ_{L_{i}}^{*} - Σ_{g = 1}^{G} μ_{g}^{*}

.

Hereafter we mainly introduce the variable selection principle of BLasso method [14,16] for the proposed joint model JMSRE. We need to identify not only the important variables in models (1) and (2). In particular, we also need to identify whether the parameter

ϕ

is

0_{q \times 1}

, and our proposed BLasso method does just that. In general, the prior distribution of the regression parameters is set to a multivariate normal distribution. Following the idea of Bayesian Lasso inference [14], we consider the following hierarchical priors for

β

,

γ

and

ϕ

:

\begin{matrix} β | H_{β} \sim N_{p_{1}} (0, H_{β}), & with H_{β} = d i a g (h_{β_{1}}^{2}, h_{β_{2}}^{2}, \dots, h_{β_{p_{1}}}^{2}), \\ f (h_{β_{1}}^{2}, h_{β_{2}}^{2}, \dots, & h_{β_{p_{1}}}^{2}) = \prod_{j = 1}^{p_{1}} \frac{ϑ_{β_{j}}^{2}}{2} exp (- \frac{ϑ_{β_{j}}^{2}}{2} h_{β_{j}}^{2}), \end{matrix}

(7)

\begin{matrix} γ | H_{γ} \sim N_{p_{2}} (0, H_{γ}), & with H_{γ} = d i a g (h_{γ_{1}}^{2}, h_{γ_{2}}^{2}, \dots, h_{γ_{p_{2}}}^{2}), \\ f (h_{γ_{1}}^{2}, h_{γ_{2}}^{2}, \dots, & h_{γ_{p_{2}}}^{2}) = \prod_{j = 1}^{p_{2}} \frac{ϑ_{γ_{j}}^{2}}{2} exp (- \frac{ϑ_{γ_{j}}^{2}}{2} h_{γ_{j}}^{2}), \end{matrix}

(8)

\begin{matrix} ϕ | H_{ϕ} \sim N_{q} (0, H_{ϕ}), & with H_{ϕ} = d i a g (h_{ϕ_{1}}^{2}, h_{ϕ_{2}}^{2}, \dots, h_{ϕ_{q}}^{2}), \\ f (h_{ϕ_{1}}^{2}, h_{ϕ_{2}}^{2}, \dots, & h_{ϕ_{q}}^{2}) = \prod_{j = 1}^{q} \frac{ϑ_{ϕ_{j}}^{2}}{2} exp (- \frac{ϑ_{ϕ_{j}}^{2}}{2} h_{ϕ_{j}}^{2}), \end{matrix}

(9)

where

ϑ_{β} = \{ϑ_{β_{1}}, ϑ_{β_{2}}, \dots, ϑ_{β_{p_{1}}}\}

,

ϑ_{γ} = \{ϑ_{γ_{1}}, ϑ_{γ_{2}}, \dots, ϑ_{γ_{p_{2}}}\}

and

ϑ_{ϕ} = \{ϑ_{ϕ_{1}}, ϑ_{ϕ_{2}}, \dots, ϑ_{ϕ_{q}}\}

are the regularization parameters that control the tail decay. In particular, in order to better control the effect of tail decay, this paper sets different regularization parameters for different components of the same parameter. Inspired by Park and Casella [14], we further consider the following super-priorities for these tuning parameters:

\begin{matrix} ϑ_{β_{j}}^{2} & \sim Γ (a_{ϑ_{β}}, b_{ϑ_{β}}), j = 1, 2, \dots, p_{1}, \end{matrix}

(10)

\begin{matrix} ϑ_{γ_{j}}^{2} & \sim Γ (a_{ϑ_{γ}}, b_{ϑ_{γ}}), j = 1, 2, \dots, p_{2}, \end{matrix}

(11)

\begin{matrix} ϑ_{ϕ_{j}}^{2} & \sim Γ (a_{ϑ_{ϕ}}, b_{ϑ_{ϕ}}), j = 1, 2, \dots, q . \end{matrix}

(12)

3.2. Bayesian analysis of joint model

To obtain Bayesian estimates of unknown parameters

β, φ, σ^{2}, b, γ

and

ϕ

, a hybrid algorithm combining the block Gibbs sampler and the Metropolis-Hastings algorithm is employed to iteratively draw

{β, φ, σ^{2}, b, γ, ϕ}

as follows.

(A) Conditional distribution of $β$

It follows from Equation (3) and Equation (7) that the conditional posterior distribution

p (β | φ, σ^{2}, b, y^{*})

is expressed as

\begin{matrix} p (β | φ, σ^{2}, b, y^{*}) \propto exp \{- \frac{1}{2} [\sum_{i = 1}^{n} \sum_{j = 1}^{n_{i}} \frac{1}{σ^{2}} (y_{i j}^{*} - X_{i j}^{⊤} β - B^{⊤} (t_{i j}) φ - Z_{i j}^{⊤} b_{i}) + β^{⊤} H_{β}^{- 1} β]\}, \end{matrix}

which yields

\begin{matrix} β | φ, σ^{2}, b, y^{*} \sim N_{p_{1}} (A_{β}, V_{β}), \end{matrix}

where

V_{β} = \sum_{i = 1}^{n} \sum_{j = 1}^{n_{i}} σ^{- 2} X_{i j}^{⊤} X_{i j} + Σ_{β}^{- 1}

,

A_{β} = V_{β} (\sum_{i = 1}^{n} \sum_{j = 1}^{n_{i}} X_{i j} (y_{i j}^{*} - Z_{i j}^{⊤} b_{i} - B^{⊤} (t_{i j}) φ) σ^{- 2})

.

(B) Conditional distribution of $φ$ According to Equation (3) and the prior of

φ

in Equation (4), the conditional distribution

p (φ | β, σ^{2}, b, y^{*})

is given by

\begin{matrix} p (φ | β, σ^{2}, b, y^{*}) \propto exp \{- \frac{1}{2} [\sum_{i = 1}^{n} \sum_{j = 1}^{n_{i}} \frac{1}{σ^{2}} (y_{i j}^{*} - X_{i j}^{⊤} β - B^{⊤} (t_{i j}) φ - Z_{i j}^{⊤} b_{i}) + φ^{⊤} {(H_{φ}^{0})}^{- 1} φ]\}, \end{matrix}

which yields

\begin{matrix} φ | β, σ^{2}, b, y^{*} \sim N_{L} (A_{φ}, V φ), \end{matrix}

where

V φ = \sum_{i = 1}^{n} \sum_{j = 1}^{n_{i}} σ^{- 2} B {(t_{i j})}^{⊤} B (t_{i j}) + Σ_{φ}^{- 1}

,

A φ = V φ (\sum_{i = 1}^{n} \sum_{j = 1}^{n_{i}} B (t_{i j}) (y_{i j}^{*} - X_{i j}^{⊤} β - Z_{i j}^{⊤} b_{i}) σ^{- 2})

.

(C) Conditional distribution of $σ^{- 2}$

It follows from Equation (3) and the prior of

σ^{2}

in Equation (4) that the conditional distribution

p (σ^{- 2} | β, φ, b, y^{*})

is given by

\begin{matrix} p (σ^{- 2} | β, φ, b, y^{*}) & \propto exp \{- (\frac{1}{2} \sum_{i = 1}^{n} \sum_{j = 1}^{n_{i}} (y_{i j}^{*} - X_{i j}^{⊤} β - B^{⊤} (t_{i j}) φ - Z_{i j}^{⊤} b_{i}) + b_{0}) σ^{- 2}\} \\ \times {(σ^{- 2})}^{\frac{1}{2} \sum_{i = 1}^{n} n_{i} + a_{0} - 1}, \end{matrix}

which yields

\begin{matrix} σ^{- 2} | β, φ, b, y^{*} \sim Γ (\frac{1}{2} \sum_{i = 1}^{n} n_{i} + a_{0}, \frac{1}{2} (y_{i j}^{*} - X_{i j}^{⊤} β - B^{⊤} (t_{i j}) φ - Z_{i j}^{⊤} b_{i}) + b_{0}) . \end{matrix}

(D) Conditional distribution of $b_{i}$

For reasons of space, the sampling of

b_{i}, i = 1, 2, \dots, n

follows the steps in Appendix A, which can also be seen in Tang et al. [24]

(E) Conditional distribution of $γ$

It follows from Equation (3) and Equation (8) that the conditional distribution

p (γ | b, ϕ, T, Δ)

is proportional to

\begin{matrix} \prod_{i = 1}^{n} {[\frac{exp (W_{i}^{⊤} γ + ϕ^{⊤} b_{i})}{\sum_{j \in R_{i}} exp (W_{j}^{⊤} γ + ϕ^{⊤} b_{j})}]}^{δ_{i}} e x p \{- \frac{1}{2} γ^{T} H_{γ}^{- 1} γ\}, \end{matrix}

which is not a familiar distribution. Therefore, the well-known Metropolis-Hastings (MH) algorithm is adopted to simulate observations from the above given conditional distribution, which is implemented as follows. Given the current value

γ^{(m)}

, new candidates

γ

are generated from

N_{p_{2}} (γ^{(m)}, σ_{γ}^{2} Σ_{γ})

, where

Σ_{γ} = {(- \partial^{2} {(ln (p (γ | b, ϕ, T, Δ)) / \partial γ \partial γ^{⊤} |}_{γ = γ^{(m)}})}^{- 1}

. The new

γ^{(m)}

is accepted with probability

\begin{matrix} m i n \{1, \frac{p (γ | b, ϕ, T, Δ)}{p (γ^{(m)} | b, ϕ, T, Δ)}\}, \end{matrix}

where

\begin{matrix} Σ_{γ} = {(\sum_{i = 1}^{n} \frac{\sum_{j \in R_{i}} exp (\cdot) W_{j} W_{j}^{⊤} \sum_{j \in R_{i}} exp (\cdot) + \sum_{j \in R_{i}} exp (\cdot) W_{j} \sum_{j \in R_{i}} exp (\cdot) W_{j}^{⊤}}{{(\sum_{j \in R_{i}} exp (\cdot))}^{2}} + H_{γ}^{- 1})}^{- 1} . \end{matrix}

with

exp (\cdot) = exp (W_{j}^{⊤} γ^{(m)} + ϕ^{⊤} b_{j}) .

The variance coefficient

σ_{γ}^{2}

can be chosen such that the average acceptance rate are approximately 0.25 or more.

(F) Conditional distribution of $ϕ$

From Equation (3) and Equation (9), the conditional distribution

p (ϕ | γ, b, T, Δ)

is proportional to

\begin{matrix} \prod_{i = 1}^{n} {[\frac{exp (W_{i}^{⊤} γ + ϕ^{⊤} b_{i})}{\sum_{j \in R_{i}} exp (W_{j}^{⊤} γ + ϕ^{⊤} b_{j})}]}^{δ_{i}} e x p \{- \frac{1}{2} γ^{T} H_{ϕ}^{- 1} γ\}, \end{matrix}

which is not a familiar distribution. Similar to above (E), given the current value

ϕ^{(m)}

, new candidates

ϕ

are generated from

N_{q} (ϕ^{(m)}, σ_{ϕ}^{2} Σ_{ϕ})

, where

Σ_{ϕ} = {(- \partial^{2} {(ln (p (ϕ | b, γ, T, Δ)) / \partial ϕ \partial ϕ^{⊤} |}_{ϕ = ϕ^{(m)}})}^{- 1}

. The new

ϕ^{(m)}

is accepted with probability

\begin{matrix} m i n \{1, \frac{p (ϕ | γ, b, T, Δ)}{p (ϕ^{(m)} | γ, b, T, Δ)}\}, \end{matrix}

where

\begin{matrix} Σ_{ϕ} = {(\sum_{i = 1}^{n} \frac{\sum_{j \in R_{i}} exp (\cdot) b_{j} b_{j}^{⊤} \sum_{j \in R_{i}} exp (\cdot) + \sum_{j \in R_{i}} exp (\cdot) b_{j} \sum_{j \in R_{i}} exp (\cdot) b_{j}^{⊤}}{{(\sum_{j \in R_{i}} exp (\cdot))}^{2}} + H_{ϕ}^{- 1})}^{- 1}, \end{matrix}

and

exp (\cdot) = exp (W_{j}^{⊤} γ + {ϕ^{(m)}}^{⊤} b_{j}) .

The variance coefficient

σ_{ϕ}^{2}

can be chosen such that the average acceptance rate are approximately 0.25 or more.

By the above iterative process, we can obtain a series of sample

{(β^{(m)}, φ^{(m)}, {σ^{2}}^{(m)}, b_{i}^{(m)}, γ^{(m)}, ϕ^{(m)}) : m = 1, 2, \dots, M}

. Then, Bayesian estimates of

β

,

φ

,

σ^{2}

,

b_{i}

,

γ

and

ϕ

can be obtained by

\begin{matrix} \hat{β} = \frac{1}{M} \sum_{m = 1}^{M} β^{(m)}, \hat{φ} = \frac{1}{M} \sum_{m = 1}^{M} φ^{(m)}, \hat{σ^{2}} = \frac{1}{M} \sum_{m = 1}^{M} {σ^{2}}^{(m)}, \end{matrix}

\begin{matrix} {\hat{b}}_{i} = \frac{1}{M} \sum_{m = 1}^{M} b_{i}^{(m)} \hat{γ} = \frac{1}{M} \sum_{m = 1}^{M} γ^{(m)}, \hat{ϕ} = \frac{1}{M} \sum_{m = 1}^{M} ϕ^{(m)} . \end{matrix}

Similarly, the consistent estimates of the posterior covariance matrices of

var (β | y^{*}, X, Z)

,

var (φ | y^{*}, X, Z)

,

var (σ^{2} | y^{*}, X, Z)

,

var (γ | W, T, Δ)

, and

var (ϕ | y^{*}, X, Z, W, T, Δ)

can be obtained via the sample covariance matrices. For example

\begin{matrix} \hat{var} (β | y^{*}, X, Z) = \frac{1}{M - 1} \sum_{m = 1}^{M} (β^{(m)} - \hat{β}) {(β^{(m)} - \hat{β})}^{⊤} . \end{matrix}

Therefore, the variance of the corresponding parameter can be obtained through the diagonal elements of the sample covariance matrix of the random sample sequence.

4. Simulation Studies

In this section, we conducted three simulation studies to investigate the finite performance of the above proposed methods.

We considered the model defined in models (1) and (2) with 200 individuals. Here, the specific model is as follows:

y_{i j}^{*} | b_{i} = X_{1 i j} β_{1} + X_{2 i j} β_{2} + X_{3 i j} β_{3} + X_{4 i j} β_{4} + X_{5 i j} β_{5} + X_{6 i j} β_{6} + sin (2 π t_{i j}) + b_{i} + ε_{i j},

λ_{i} (t | b_{i}) = λ_{0} (t) exp (W_{1 i} γ_{1} + W_{2 i} γ_{2} + W_{3 i} γ_{3} + W_{4 i} γ_{4} + ϕ b_{i}) .

The data were generated as follows: observation time

t_{i j}

was generated randomly between 0 and 3, the covariates

X_{1 i j}

and

X_{6 i j}

followed the Bernoulli distribution with the success probability 0.5 and 0.3, respectively. The covariates

X_{2 i j}, X_{3 i j}, X_{4 i j}

and

X_{5 i j}

were generated form the multivariate normal distribution

N_{4} (0, Σ)

with mean vector 0 and covariance matrix

Σ

. And the covariance matrix

Σ

is a symmetric positive definite matrix whose diagonal elements are 1 and the rest are

0.5

. The random error

ε_{i j}

is generated from the normal distribution with mean 0 and variance

σ^{2} = 0 . 6^{2}

. And

W_{i} = {(W_{i 1}, W_{i 2}, W_{i 3}, W_{i 4})}^{⊤} = {(X_{3 i 1}, X_{4 i 1}, X_{5 i 1}, X_{6 i 1})}^{⊤}

. The baseline hazard function

λ_{0} (t) = 0.7

and

ϕ = 0.6

. The censoring time

C_{i}

was generated from the Uniform distribution

U [0, 3]

, and

T_{i}^{*}

was generated from the Exponential distribution with mean

1 / λ_{i} (t | b_{i})

,

T_{i} = min (T_{i}^{*}, C_{i})

. Our main purpose is to use the proposed approaches to identify the unimportant covariates and estimate nonzero coefficients. Bayesian results were obtained from 200 replications.

To show that our proposed method above can both accurately estimate parameters of interest and identify unimportant variables, but also flexibly capture the feature of nonlinear function

g (t)

and random effects

b_{i}

, we consider three different simulation studies as follows.

Simulation I.

\begin{matrix} β = {(β_{1}, \dots, β_{6})}^{⊤} = {(1, 0, 0, - 0.5, 0.5, - 1)}^{⊤}, γ = {(γ_{1}, \dots, γ_{4})}^{⊤} = {(0, 1, - 0.5, 0)}^{⊤}, \end{matrix}

\begin{matrix} g (t) = sin (\frac{3}{4} π t), b_{i} \overset{i . i . d}{\sim} 0.6 N (- 0.8, 0 . 1^{2}) + 0.4 N (1.2, 0 . 5^{2}), \end{matrix}

Simulation II.

\begin{matrix} β = {(β_{1}, \dots, β_{6})}^{⊤} = {(1, 0, 0, - 0.5, 0.5, - 1)}^{⊤}, γ = {(γ_{1}, \dots, γ_{4})}^{⊤} = {(0, 1, - 0.5, 0)}^{⊤}, \end{matrix}

\begin{matrix} g (t) = t, b_{i} \overset{i . i . d}{\sim} 0.4 N (0, 0 . 3^{2}) + 0.3 N (- 1.5, 0 . 1^{2}) + 0.3 N (1.5, 0 . 1^{2}), \end{matrix}

Simulation III.

\begin{matrix} β = {(β_{1}, \dots, β_{6})}^{⊤} = {(1, 0.5, - 0.5, - 0.5, 0.5, - 1)}^{⊤}, γ = {(γ_{1}, \dots, γ_{4})}^{⊤} = {(- 0.5, 1, - 0.5, 1)}^{⊤}, \end{matrix}

\begin{matrix} g (t) = t^{2}, b_{i} \overset{i . i . d}{\sim} N (0, 0 . 8^{2}) . \end{matrix}

which the mean censoring rates of the survival times in the three simulation studies are 44%, 45%, and 37%, respectively.

For each of the above generated data sets, the proposed semiparametric Bayesian procedure was used to simultaneously evaluate Bayesian estimates of unknown parameters and select the important covariates. The prior hyperparameters

a_{ϑ_{β}} = a_{ϑ_{γ}} = a_{ϑ_{ϕ}} = 1, b_{ϑ_{γ}} = b_{ϑ_{β}} = b_{ϑ_{ϕ}} = 0.1

, corresponding to the hyperparameters of the hyper priors for the adjustment coefficients in Equations (10)-().

a_{0} = 1, b_{0} = 1

and

H_{φ}^{0} = 100 I_{4}

, corresponding to the prior parameters of

σ^{2}, φ

. We set the degree of B-splines

d = 3

, the number of konts

K = 4

, and

G = 25

. To investigate the convergence of the proposed algorithm, we calculated the estimated potential scale reduction (EPSR) values of parameters. In addition, we also need to test whether the nonlinear function fitted by the B-splines method is convergent. It can be seen from Figure 1 that the EPSR values were less than 1.2 after about 3000 iterations in three simulation studies. Thus, 3000 (

M = 3000

) observations were collected to calculated Bayesian estimates of parameters after 3000 iterations in producing Bayesian results for each of 200 replications. Results obtained under three simulation studies were reported in Table 1, where “Bias” is the difference between the true value and the mean of the estimates based on 200 replications, “RMS” is the root mean square between the estimates based on 200 replications and its true value, and “F0” is the proportion that parameter is identified to be zero in 200 replications in terms of the criterion that a parameter is identified to be 0 if its

95 %

confidence interval contains 0.

Results of Table 1 indicated that (i) Bayesian estimates of parameters are reasonably accurate because their absolute biases are less than 0.10 and the root mean square of their RMS values are less than 0.20; (ii) BLasso could identify the correct models in most cases regardless of prior inputs of parameters because the F0 values corresponding to the important covariates were less than

10 %

, but the F0 values corresponding to unimportant covariates were more than

90 %

. The performance of the proposed approach to recover the nonlinear function

g (t)

can be measured by the root mean square error

\begin{matrix} RASE (g^{(r)}) = \sqrt{\frac{1}{300} \sum_{l = 1}^{300} (g (u_{l}) - {\hat{g}}^{(r)} (u_{l}))}, r = 1, 2, \dots, 200, \end{matrix}

where

{\hat{g}}^{(r)} (t) = B^{⊤} (t) {\hat{φ}}^{(r)}

,

{\hat{φ}}^{(r)}

represents the Bayesian estimated value of the parameter vector

φ

in the r-th replication. Similar to the root mean square error of nonlinear function, we also calculate the root mean square error of random effects. Figure 2 plots the estimated curve and estimated density of the nonlinear function

\hat{g} (t)

and the random effects

b_{i}

of the replication in which the mean of the RASE of the nonlinear function and the random effects is in the middle of the 200 replications, and against true curves and true density in three simulation studies, respectively. Inspection of Figure 2 showed that (i) the Bayesian B-splines method proposed in this paper can be flexible enough fit the true curve of the nonlinear function

g (t)

; (ii) the CDPMM method proposed is flexible enough to capture the general shapes of our considered three distribution assumptions for

b_{i}

; (iii) the estimated 95% confidence region of the nonlinear function can cover its true curve with a fairly narrow area. From Table 2, based on the results of 200 replications in three simulation studies, the estimated means and standard deviation (SD) of random effects

b_{i}

are enough near to their corresponding true values, and 25%, 50% and 75% quantile of the mean of the RASE of the nonlinear function and the random effects are enough small and close to illustrate that it is robust to apply CDPMM to estimate random effects. All these findings indicated that (i) our proposed Bayesian B-splines can be flexible enough fit the true curve of the nonlinear; (ii) Bayesian procedure could well capture the true information of

b_{i}

regardless of their true distributions and forms; (iii) BLasso could identify the true model with a high probability.

5. An Example

In this section, we apply the method proposed in the previous sections to the MA.5 research experiment conducted by the Clinical Trial Group of the National Cancer Institute of Canada. The data contains 716 women with early-stage breast cancer before menopause. 356 patients were randomly selected to receive cyclophosphamide, epirubicin, and fluorouracil (CEF) adjuvant chemotherapy as the experimental group. The remaining 360 patients received cyclophosphamide, methotrexate, and fluorouracil (CMF) adjuvant chemotherapy as the control group of the trial. In clinical trials, visits were made before the start of treatment, each of the 6 treatment cycles, and every three months after treatment. At each visit, medical history and physical examination are accepted, and BCQ (Breast Cancer Questionnaire) is used to assess the patient’s quality of life. The dataset has a total of 7807 observations. By the end of the study, a total of 366 patients had died, and the censorship rate was about

49 %

. For a detailed study of this data, please refer to the literature of Song et al. [22] and Levine et al. [24]. We linearly convert the evaluated BCQ score into a unit interval

(0, 1)

, and the longitudinal data constrained on the interval

(0, 1)

is the longitudinal proportion data that we are interested in. The survival time of interest in the trial is the recurrence-free survival time (RFS), which is defined as the time from randomization to disease recurrence. Different treatment options, age, and the number of tumor-positive lymph nodes may directly affect RFS and the quality of life of patients. Similar to Tang et al. [17], we fitted the MA.5 research experiment data set to the following model:

\begin{matrix} y_{i j}^{*} | b_{i} = β_{1} {EM}_{i} + β_{2} {{NODE}_{_} POS}_{i} + β_{3} {AGE}_{i} + g (t_{i j}) + b_{i} + ε_{i j}, \end{matrix}

\begin{matrix} λ_{i} (t | b_{i}) = λ_{0} (t) exp (γ_{1} {EM}_{i} + γ_{2} {{NODE}_{_} POS}_{i} + γ_{3} {AGE}_{i} + ϕ b_{i}), \end{matrix}

where

y_{i j}^{*}

represents the BCQ score after logit function transformation,

{EM}_{i}

is a two-class treatment index,

{EM}_{i} = 1

indicates that the i-th patient underwent CEF treatment,

{EM}_{i} = 0

indicates that the i-th patient underwent CMF treatment. Age and the number of lymph node metastases are binary variables. Patients who are less than or equal to 40 years old belong to the younger group, that is,

AGE = 1

. Patients who are older than 40 years old belong to the elderly group, that is,

AGE = 0

. When the number of lymph node metastases is 0-3,

NODE_POS = 0

, otherwise it is 1. The nonlinear

g (t)

is estimated by using cubic B-splines function, and the domain of cubic B-splines function is

[min (t_{i j}), max (t_{i j})]

. The prior distributions and the values of all hyperparameters in the case study are the same as those set in the simulation study above. Based on the above settings, we calculated EPSR values for all parameters and are shown in the left panel in Figure 3, which shows that after about 3000 iterations, all EPSR values are less than 1.2. Therefore, we use the 3000 iterations after the 3000th time to calculate the Bayesian estimation. The example analysis results are shown in Table 3.

Inspection of Figure 3 showed that (i) the estimated density of random effects is approximately normally distributed. The distribution settings for random effects in our Simulation III correspond to. This shows that our CDPMM method is effective and reasonable; (ii) the estimated nonlinear function first decreases, then increases and finally stabilizes with time t. Although there are some fluctuations in the tail, this is reasonable and can be ignored. From Table 3, we can see that (i) the risk ratio of randomly receiving CEF and CMF treatment is

HR = exp (γ_{1}) = 71.605 %

, which means patients who randomly receive CEF chemotherapy have a lower risk; (ii) the credible interval

(0.171, 0.321)

of

β_{1}

does not contain 0, indicating that different adjuvant chemotherapy regimens have a significant impact on the QOL of patients and CEF chemotherapy is more toxic than CMF chemotherapy; (iii) the risk ratio of the number of lymph node metastases greater than or equal to 4 to less than 4 is

HR = exp (γ_{2}) = 205.238 %

, which means that the number of lymph node metastases is greater than or equal to 4. The risk of breast cancer recurrence is greater in patients, and the RFS is shorter; (iv)the regression coefficient

β_{2}

of the patient’s lymph node metastasis number greater than or equal to 4 is

0.309

, and its credible interval does not contain 0, which is highly significant. This suggests that patients with more lymph nodes have a lower quality of life, which is also consistent with clinical experience; (v) the risk ratio between the young group and the old group is

HR = exp (γ_{3}) = 182.212 %

, which means the risk of breast cancer recurrence is higher and the RFS is shorter in the young group; (vii) the credible interval

(0.168, 0.344)

of

β_{3} = 0.263

does not contain 0, indicating that different ages have a significant impact on the QOL of patients. It indicates that the quality of life of the olderly group is better than that of the young group. (viii)

ϕ = 0.259

and the credible interval

(0.003, 0.542)

of

ϕ

does not contain 0, which means that

ϕ

is significantly different from 0. This shows that there is a significant correlation between the longitudinal proportional data and survival data, and the JMSRE model proposed in this paper is applicable and reasonable for the analysis of the MA.5 research experiment data.

6. Concluding Remarks

In this paper, a semiparametric joint model for longitudinal proportional data and survival data is proposed by relaxing the normality assumption of random effects and without specifying a nonlinear function that affects longitudinal responses. The advantages of the proposed model include: (i) the proposed method improves the flexibility of jointly modeling longitudinal proportional data and survival data; (ii) the proposed B-splines method can flexibly capture different nonlinear functions; (iii) the proposed CDPMM method can flexibly capture the unimodal, bimodal and multimodal features of random effects; (iv) the computational burden is not heavy, e.g. it takes about 4 minutes to run the replication in the above simulation study and about 2 hours to run the breast cancer dataset.

Our simulation studies and example analysis show that the Bayesian estimation proposed based on the joint model are quite accurate and relatively robust. The nonlinear function curve estimated by Bayesian B-splines is more flexible and can capture the true characteristics of the nonlinear function. And CDPMM method could also well capture the true information of

b_{i}

. BLasso method could identify the true model with a high probability. Compared with the method proposed by Song et al. [22] to jointly model longitudinal proportional data and survival data, the joint model proposed in this paper is more flexible.

The joint model of longitudinal proportional data and survival data proposed in this paper still has many unsolved problems and we need to solve the following problems in the future: (i) does not make any constraints on the form of the basic hazard function; (ii) the distribution of random effects does not depend on the normality assumption and still has robust inferences; (iii) consider more complex spline models, for example nodes can be automatically selected to improve the performance of the proposed model; (iv) consider a joint model for the variable longitudinal proportional outcome and the multivariate survival outcome.

Appendix A

Conditional distribution of $b_{i}$

Let

θ_{b_{i}}

denotes all unknown parameters associated with distribution of

b_{i}

, for

i = 1, 2, \dots, n

.

θ_{b_{i}}

can be iteratively draw by using the following steps:

Step (a) Conditional distribution of

ξ

given

(μ^{*}, Ψ, b)

is given by

ξ | μ^{*}, Ψ, b \sim N_{q} (A, B),

where

B = {(G Ψ^{- 1} + {(Ψ^{0})}^{- 1})}^{- 1}

and

A = B ({(Ψ^{0})}^{- 1} ξ^{0} + Ψ^{- 1} \sum_{g = 1}^{G} μ_{g}^{*})

.

Step (b) For

j = 1, 2, \dots, q

, the diagonal elements of

Ψ

is conditionally distributed as

ψ_{j}^{- 1} | μ^{*}, ξ \sim Γ (c_{1} + \frac{G}{2}, c_{2} + \frac{1}{2} \sum_{g = 1}^{G} {(μ_{g_{j}}^{*} - ξ_{j})}^{2}),

where

μ_{g_{j}}^{*}

is the jth element of

μ_{g}^{*}

and

ξ_{j}

is the jth element of

ξ

.

Step (c) For

j = 1, 2, \dots, q

,

ϖ_{j} | Ω

is conditionally distributed as

ϖ_{j} | Ω \sim Γ (ϖ_{j}^{a}, ϖ_{j}^{b} + \sum_{g = 1}^{G} ω_{g_{j}}^{- 1}),

where

ω_{g_{j}}

is the j th diagonal element of

Ω_{g}

.

Step (d) Following Ishwaran and Zarepour [20], given

π

, the conditional distribution of

τ

can be given by

τ | π \sim Γ (a_{1} + G - 1, a_{2} - \sum_{g = 1}^{G - 1} log (1 - ν_{g}^{*})),

where

ν_{g}^{*}

is a random weight sampled from the the following beta distribution.

Step (e) Given L and

τ

, the conditional distribution of

π

can be obtained by following generalized Dirichlet distribution:

π | L, τ \sim Dir (a_{1}^{*}, b_{1}^{*}, a_{2}^{*}, b_{2}^{*}, \dots, a_{G - 1}^{*}, b_{G - 1}^{*}) .

where

a_{g}^{*} = 1 + d_{g}, b_{g}^{*} = τ + \sum_{ι = g + 1}^{G} d_{ι}

for

g = 1, 2, \dots, G - 1

, and

d_{g}

is the number of

L_{i}^{'} s

whose value equals to g.

ν_{g}^{*}

is independently generated from a Beta distribution

(a_{g}^{*}, b_{g}^{*})

. Then,

π_{1}, π_{2}, \dots, π_{G}

are obtained from the following formula:

π_{1} = ν_{1}^{*}, π_{G} = 1 - \sum_{g = 1}^{G - 1} π_{g}, and π_{g} = \prod_{ι = 1}^{g - 1} (1 - ν_{ι}^{*}) ν_{g}^{*}, for g \neq 1 or G .

Step (f) Let

L_{1}^{*}, L_{2}^{*}, \dots, L_{d}^{*}

be the d unique values of

{L_{1}, L_{2}, \dots, L_{n}}

(i.e., unique number of “clusters”), for

g = 1, 2 \dots, G

,

μ_{g}^{*}

is conditionally distributed as follows:

μ_{g}^{*} | ξ, Ψ \sim N_{q} (ξ, Ψ) for g \notin {L_{1}^{*}, L_{2}^{*}, \dots, L_{d}^{*}},

μ_{g}^{*} | ξ, Ψ, Ω, L, b \sim N_{q} (E_{g}, F_{g}) for g \in {L_{1}^{*}, L_{2}^{*}, \dots, L_{d}^{*}},

where

F_{g} = {(Ψ^{- 1} + Σ_{{i : L_{i} = g}} Ω_{i}^{- 1})}^{- 1}

and

E_{g} = F_{g} (Ψ^{- 1} ξ + Σ_{{i : L_{i} = g}} Ω_{i}^{- 1} b_{i})

for

g \in {L_{1}^{*}, L_{2}^{*}, \dots, L_{d}^{*}}

. Given

μ_{g}^{*}

,

μ_{g} = μ_{g}^{*} - Σ_{g = 1}^{G} π_{g} μ_{g}^{*}

,

μ^{*} = {μ_{1}^{*}, μ_{2}, \dots, μ_{G}^{*}}

and

μ = {μ_{1}, μ_{2}, \dots, μ_{G}}

.

Step (g) Given g, for

j = 1, 2, \dots, q

, the jth diagonal element of

Ω_{g}

is conditionally distributed as

ω_{g_{j}} \sim Γ (ω_{j}^{a}, ϖ_{j}) for g \notin {L_{1}^{*}, L_{2}^{*}, \dots, L_{d}^{*}},

ω_{g_{j}} \sim Γ (\frac{d_{g}}{2} + ω_{j}^{a}, ϖ_{j} + \sum_{{i : L_{i} = g}} \frac{1}{2} {(b_{i_{j}} - μ_{g_{j}})}^{2}) for g \in {L_{1}^{*}, L_{2}^{*}, \dots, L_{d}^{*}},

where

b_{i_{j}}

is the jth element of

b_{i}

and

μ_{g_{j}}

is the jth element of

μ_{g}

. Given

ω_{g_{j}}

,

Ω_{g} = diag (ω_{g_{1}}, ω_{g_{2}}, \dots, ω_{g_{q}})

and

Ω = {Ω_{1}, Ω_{2}, \dots, Ω_{G}}

.

Step (h) Given

π, μ, Ω, b

, the conditional distribution of

L_{i}

is obtained by

L_{i} | π, μ, Ω, b \overset{i . i . d}{\sim} Multinomial (π_{i g}^{*}, g = 1, 2, \dots, G),

where

π_{i g}^{*}

is proportional to

(π_{g} p (b_{i} | μ_{g}, Ω_{g})

with

b_{i} | μ_{g}, Ω_{g} \sim N_{q} (μ_{g}, Ω_{g})

, and

π_{g} (g = 1, 2, \dots, G)

are sampled from step (e). Given

L_{i}

,

μ

and

Ω

, the prior of

b_{i}

is distributed as

N_{q} (μ_{L_{i}}, Ω_{L_{i}})

with

μ_{L_{i}}

and

Ω_{L_{i}}

being the

L_{i}

elements of sets

μ

and

Ω

, respectively.

Step (i) The conditional distribution

p (b_{i} | β, φ, σ^{2}, γ, ϕ, Y^{*}, T, Δ, W)

is non-standard and cannot be derived directly via Gibbs sampling for

i = 1, 2, \dots, n

. Specially,

p (b_{i} | β, φ, σ^{2}, γ, ϕ, Y^{*}, T, Δ) \propto p (b_{i} | μ_{L_{i}}, Ω_{L_{i}}) p (Y_{i}^{*} | b_{i}; θ_{Y}) p (T, Δ | b; θ_{T}) .

The Metropolis-Hastings algorithm used to sample

b_{i}

is implemented as follows. At the mth iteration with a current value

b_{i}^{(m)}

, a new candidate

b_{i}

is drawn from the normal distribution

N_{q} (b_{i}^{(m)}, σ_{b}^{2} Σ_{b_{i}})

, where

Σ_{b_{i}} = {(Ω_{L_{i}}^{- 1} + Ξ_{i})}^{- 1}

and

Ξ_{i} = - \partial^{2} {(ln (p (Y_{i}^{*} | b_{i}; θ_{Y}) p (T, Δ | b; θ_{T})) / \partial b_{i} \partial b_{i}^{⊤} |}_{b_{i} = b_{i}^{(m)}}

. The new

b_{i}

is accepted with probability

min \{1, \frac{p (b_{i} | μ_{L_{i}}, Ω_{L_{i}}) p (Y_{i}^{*} | b_{i}; θ_{Y}) p (T, Δ | b; θ_{T})}{p (b_{i}^{(m)} | μ_{L_{i}}, Ω_{L_{i}}) p (Y_{i}^{*} | b_{i}^{(m)}; θ_{Y}) p (T, Δ | b_{i}^{(m)}, b_{- i}; θ_{T})}\},

where

b_{- i}

represents the remaining random effects except the random effects of the ith individual. The variance

σ_{b}^{2}

can be chosen such that the average acceptance rate is approximately 0.25 or more.

References

Faucett, C.L.; Thomas, D.C. Simultaneously modelling censored survival data and repeatedly measured covariates: a gibbs sampling approach. Stat. Med. 1996, 15, 1663–1685. [Google Scholar] [CrossRef]
Wulfsohn, M.S.; Tsiatis, A.A. A Joint model for survival and longitudinal data measured with error. Biometrics 1997, 53, 330–339. [Google Scholar] [CrossRef] [PubMed]
Schluchter, M.D. Methods for the analysis of informatively censored longitudinal data. Stat. Med. 1992, 11, 1861–1870. [Google Scholar] [CrossRef] [PubMed]
Tsiatis, A.A.; Degruttola, V.; Wulfsohn, M.S. Modeling the relationship of survival to longitudinal data measured with error: applications to survival and CD4 counts in patients with AIDS. J. Am. Stat. Assoc. 1995, 90, 27–37. [Google Scholar] [CrossRef]
Tsiatis, A.A.; Davidian, M. Joint modeling of longitudinal and time-to-event data: an overview. Stat. Sin. 2004, 14, 809–834. [Google Scholar] [CrossRef]
Yu, M.; Law, N.J.; Taylor, J.M.G.; Sandler, H.M. Joint longitudinal-survival-cure models and their application to prostate cancer. Stat. Sin. 2004, 14, 835–862. [Google Scholar] [CrossRef]
Diggle, P.J.; Sousa, I.; Chetwynd, A.G. Joint modelling of repeated measurements and time-to-event outcomes: the fourth armitage lecture. Stat. Med. 2008, 27, 2981–2998. [Google Scholar] [CrossRef] [PubMed]
Lawrence, A.; Gould, *!!! REPLACE !!!*; et al. Joint modeling of survival and longitudinal non-survival data: current methods and issues. Stat. Med. 2015, 34, 2181–2195. [Google Scholar] [CrossRef]
Henderson, R.; Diggle, P.; Dobson, A. Joint modelling of longitudinal measurements and event time data. Biostatistics 2000, 1, 465–480. [Google Scholar] [CrossRef]
Zeng, D.; Cai, J. Simultaneous modelling of survival and longitudinal data with an application to repeated quality of life measures. Lifetime Data Analysis 2005, 11, 151–174. [Google Scholar] [CrossRef]
Tang, A.; Zhao, X.; Tang, N. Bayesian variable selection and estimation in semiparametric joint models of multivariate longitudinal and survival data. Biom. J. 2017, 59, 57–78. [Google Scholar] [CrossRef] [PubMed]
Song, P.X.; Tan, M. Marginal models for longitudinal continuous proportional data. Biometrics 2000, 56, 496–502. [Google Scholar] [CrossRef] [PubMed]
Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B. 1996, 58, 267–288. [Google Scholar] [CrossRef]
Park, T.; Casella, G. The Bayesian Lasso. J. Am. Stat. Assoc. 2008, 103, 681–686. [Google Scholar] [CrossRef]
Hans, C. Bayesian lasso regression. Biometrika 2009, 96, 835–845. [Google Scholar] [CrossRef]
Guo, R.; Zhu, H.; Chow, S.M.; Ibrahim, J.G. Bayesian Lasso for semiparametric sructural equation models. Biometrics 2012, 68, 567–577. [Google Scholar] [CrossRef]
Tang, A.; Zhao, X.; Tang, N. Bayesian variable selection and estimation in semiparametric joint models of multivariate longitudinal and survival data. Biom. J. 2017, 59, 57–78. [Google Scholar] [CrossRef]
Ohlssen, D.I.; Sharples, L.D.; Spiegelhalter, D.J. Flexible random effects models using Bayesian semiparametric models: applications to institutional comparisons. Stat. Med. 2007, 26, 2088–2112. [Google Scholar] [CrossRef]
Yang, M.; Dunson, D.B.; Baird, D. Semiparametric bayes hierarchical models with mean and variance constraints. Comput. Stat. Data. Anal. 2010, 54, 2172–2186. [Google Scholar] [CrossRef]
Ishwaran, H.; Zarepour, M. Markov chain monte carlo in approximate dirichlet and beta two-parameter process hierarchical models. Biometrika 2000, 87, 371–390. [Google Scholar] [CrossRef]
Chow, S.; Tang, N.; Yuan, Y.; Song, X.; Zhu, H. Bayesian estimation of semiparametric nonlinear dynamic factor analysis models using the Dirichlet process prior. Br. J. Math. Stat. Psychol. 2011, 64, 69–106. [Google Scholar] [CrossRef] [PubMed]
Song, H.; Peng, Y.; Tu, D. Jointly modeling longitudinal proportional data and survival times with an application to the quality of life data in a breast cancer trial. Lifetime Data Analysis 2017, 23, 183–206. [Google Scholar] [CrossRef] [PubMed]
Levine, M.; Bramwell, V.; Pritchard, K.; et al. Randomized trial of intensive cyclophosphamide, epirubicin, and fluorouracil chemotherapy compared with cyclophosphamide, methotrexate, and fluorouracil in premenopausal women with node-positive breast cancer. J. Clini. Oncol. 1998, 16, 2651–2658. [Google Scholar] [CrossRef] [PubMed]
Tang, A.; Duan, X.; Zhao, Y. Bayesian variable selection and estimation in semiparametric simplex mixed-effects models with longitudinal proportional data. Entropy 2022, 24, 1466. [Google Scholar] [CrossRef]

Figure 1. EPSR values of all parameters against iteration numbers for a randomly selected replication in Simulation I (left panel), Simulation II (middel panel) and Simulation III (right panel).

Figure 2. Estimated versus trues of nonlinear function

g (t)

(upper panels,) and estimated versus true denisties for random effects

b_{i}

(lower panels) in Simulation I (left panel), Simulation II (middel panel) and Simulation III (right panel).

Figure 2. Estimated versus trues of nonlinear function

g (t)

(upper panels,) and estimated versus true denisties for random effects

b_{i}

(lower panels) in Simulation I (left panel), Simulation II (middel panel) and Simulation III (right panel).

Figure 3. EPSR values of all parameters (left panel), estimated curve for nonlinear function

g (t)

(middle panel) and estimated density for random effects

b_{i}

(right panel) in the the MA.5 research experiment study.

Figure 3. EPSR values of all parameters (left panel), estimated curve for nonlinear function

g (t)

(middle panel) and estimated density for random effects

b_{i}

(right panel) in the the MA.5 research experiment study.

Table 1. Bayesian estimates of parameters in the simulation studies.

Pra.	True	Simulation I			Simulation II			Simulation III
		Bias	RMS	F0(%)	Bias	RMS	F0(%)	True	Bias	RMS	F0(%)
$β_{1}$	1.00	-0.005	0.058	0.0	-0.080	0.112	0.0	1.00	-0.015	0.104	0.0
$β_{2}$	0.00	0.002	0.036	99.0	0.001	0.049	98.0	0.50	-0.009	0.077	0.0
$β_{3}$	0.00	-0.002	0.033	99.0	0.005	0.042	99.5	-0.50	0.011	0.085	0.0
$β_{4}$	-0.50	0.005	0.041	0.0	0.003	0.056	0.0	-0.50	0.001	0.079	0.0
$β_{5}$	0.50	-0.001	0.037	0.0	-0.009	0.063	0.0	0.50	0.000	0.083	0.0
$β_{6}$	-1.00	0.013	0.066	0.0	0.006	0.091	0.0	-1.00	0.028	0.136	0.0
$γ_{1}$	0.00	0.010	0.120	95.5	-0.004	0.105	98.0	-0.50	0.002	0.125	0.5
$γ_{2}$	1.00	0.001	0.148	0.0	-0.052	0.144	0.0	1.00	-0.008	0.141	0.0
$γ_{3}$	-0.50	0.018	0.138	5.0	0.039	0.136	6.5	-0.50	0.008	0.127	2.5
$γ_{4}$	0.00	-0.002	0.197	97.0	0.010	0.178	97.5	1.00	-0.007	0.191	0.0
$ϕ$	0.60	0.006	0.106	0.0	-0.019	0.088	0.0	0.60	-0.002	0.145	1.0
$σ^{2}$	0.36	-0.002	0.018	–	-0.001	0.019	–	0.36	0.005	0.020	–

Table 2. Estimated mean and standard deviation for random effects, and quantiles of mean RASE for nonlinear functions and random effects.

	Est of random effects				Quantile of RASE
	Mean	Est mean	SD	Est SD	25%	50%	75%
Simulation I	-0.011	-0.007	1.004	0.961	0.091	0.115	0.138
Simulation II	-0.023	0.014	1.268	1.202	0.135	0.181	0.239
Simulation III	-0.001	-0.001	0.879	0.797	0.109	0.152	0.227

Table 3. Bayesian estimatations of parameters in the MA.5 research experiment study.

Par.	Est	SD	CI
$β_{1}$	0.241	0.038	( 0.171, 0.321)
$β_{2}$	0.309	0.050	( 0.214, 0.414)
$β_{3}$	0.263	0.046	( 0.168, 0.344)
$γ_{1}$	-0.334	0.142	(-0.608,-0.057)
$γ_{2}$	0.719	0.143	( 0.452, 0.999)
$γ_{3}$	0.600	0.145	( 0.315, 0.880)
$ϕ$	0.259	0.135	( 0.003, 0.542)
$σ_{2}$	0.180	0.003	( 0.174, 0.186)

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permit the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.

Bayesian Joint Modeling Analysis of Longitudinal Proportion and Survival Data

Abstract

Keywords:

Subject:

1. Introduction

2. Model and Notation

3. Bayesian estimation of joint model

3.1. Prior specification

3.2. Bayesian analysis of joint model

4. Simulation Studies

5. An Example

6. Concluding Remarks

Appendix A

References

MDPI Initiatives

Important Links

Subscribe