1. Introduction
When comparing the survival times of two independent groups, the Mann-Whitney parameter plays an important role in the two-sample problem ([1]). The Mann-Whitney parameter, say p, is defined as the probability that a random subject from one group (with survival time $X$ in group 1) survives longer than an independent random subject from the other group (with survival time $Y$ in group 2), plus one-half the probability that the two subjects survive for the same length of time:
$$p = P(X > Y) + \frac{1}{2} P(X = Y).$$
The Mann-Whitney effect is related to important statistical ideas such as the Mann-Whitney test ([2]), hazard ratios, and the win ratio ([3]). The Mann-Whitney test examines the null hypothesis $H_0: p = 1/2$ vs. $H_1: p \neq 1/2$. The hazard ratio is the main effect measure of the Cox proportional hazards model, a standard model in survival analysis. The win ratio w is the odds of p; that is, $w = p/(1-p)$. Thus, $p > 1/2$, or equivalently $w > 1$, implies a protective survival effect for group 1.
The problem of estimating the parameter p plays an important part in survival analysis. The basic idea was first studied in [4], which illustrated an attractive relationship between the Mann-Whitney statistic and the stress-strength model. [1] first proposed a nonparametric estimator for p under independent censoring. Since then, this topic has been investigated by several researchers; in the following, we refer to some recent studies in the field of survival analysis. [5] modified Efron’s estimator for p for small sample sizes. [6] proposed a copula-graphic estimator for p and suggested the Mann-Whitney test to compare two survival distributions in the presence of dependent censoring. [7] introduced Bayesian estimation of p for the log-Lindley distribution. [8] proposed estimating the Mann-Whitney effects in factorial clustered data. [9] developed methodologies for constructing fixed-accuracy confidence intervals of p when $X$ and $Y$ follow geometric and exponential distributions, respectively. [10] studied an estimation procedure for the stress-strength model with two independent unit-half-normal distributions with different shape parameters. [11] investigated the effect of dependence between the variables on p in the stress-strength model with exponential margins. [12] proposed a group sequential method for estimating the Mann-Whitney parameter. [13] studied point, interval, and Bayesian estimation of p when the stress variables follow geometric and Lindley distributions. All of these methods assumed that $X$ and $Y$ are independent.
When $X$ and $Y$ are independent of each other and continuous, one can compute p from the marginal distributions using the integral
$$p = -\int_0^\infty S_X(t)\, dS_Y(t),$$
where $S_X(t) = P(X > t)$ and $S_Y(t) = P(Y > t)$. That is, one can estimate p by estimating the two marginal survival functions $S_X$ and $S_Y$. However, this is not the case when $X$ and $Y$ are dependent; the resulting phenomenon is sometimes called “Hand’s paradox” ([14]). That study showed that the paradox arises when $X$ and $Y$ are regarded as potential outcomes in the framework of causal inference. Therefore, the p given by the integral cannot be interpreted as the true treatment effect. In addition, dependence between observations is well known in factorial designs and cross-over designs ([15]).
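The independence formula can be made concrete with a numerical check. For independent continuous $X$ and $Y$, $p = -\int_0^\infty S_X(t)\,dS_Y(t) = \int_0^\infty S_X(t) f_Y(t)\,dt$, and a one-dimensional quadrature evaluates this directly. The sketch below makes two assumptions of our own:

- $X$ is Weibull with $S_X(t) = \exp(-t^{0.5})$.
- $Y$ is exponential with rate 2.

These margins appear to match the Weibull settings of Table 1. The sketch compares Simpson's rule with a closed form, obtained by substituting $t = u^2$ and completing the square. Both give $p \approx 0.562$, the value listed in the independence rows (Gumbel $\theta = 0$, FGM $\theta = 0$) of the Weibull block of Table 1. The function names are hypothetical.

```python
import math


def simpson(f, a, b, n=20_000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def p_independent(surv_x, dens_y, upper=50.0):
    """p = -int S_X dS_Y = int_0^inf S_X(t) f_Y(t) dt for independent continuous X, Y.

    The infinite range is truncated at `upper`, which must make S_Y(upper) negligible.
    """
    return simpson(lambda t: surv_x(t) * dens_y(t), 0.0, upper)


# Illustrative margins: X ~ Weibull with S_X(t) = exp(-t**0.5), Y ~ exponential with rate 2
surv_x = lambda t: math.exp(-math.sqrt(t))
dens_y = lambda t: 2.0 * math.exp(-2.0 * t)

p_num = p_independent(surv_x, dens_y)
# Closed form for this particular pair (t = u**2, then complete the square in u)
p_exact = 1.0 - math.exp(1 / 8) * math.sqrt(math.pi) / (2 * math.sqrt(2)) * math.erfc(math.sqrt(2) / 4)
print(round(p_num, 4), round(p_exact, 4))
```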
Since p is not identifiable solely from independently sampled data, [16] suggested a bound for p under all possible dependence structures for $X$ and $Y$. Alternatively, [17] reformulated p such that it can be identified from randomized treatment assignments. However, to estimate the true p, we must model the bivariate survival function of $X$ and $Y$. Copulas are often used to model joint distributions of dependent survival times ([18, 19]).
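The sketch below evaluates p for one illustrative copula model. It assumes a survival-copula form for the bivariate survival function, $S(x, y) = C\{S_X(x), S_Y(y)\}$. Conditioning on $Y$ then gives a one-dimensional integral, $P(X > Y) = \int_0^\infty C_2\{S_X(t), S_Y(t)\} f_Y(t)\,dt$, with $C_2(u, v) = \partial C(u, v)/\partial v$.

- It uses the Clayton copula with exponential margins (rates 1 and 2).
- It checks the integral against a simulation based on conditional inversion.

With $\theta = 1$ it gives $p \approx 0.744$, which coincides with the exponential/Clayton/$\theta = 1$ entry of Table 1. The paper's own theorem and copula parameterizations are those of Section 2, and the function names here are hypothetical.

```python
import math
import random


def clayton_c2(u, v, theta):
    """dC(u, v)/dv for the Clayton copula C(u, v) = (u^-theta + v^-theta - 1)^(-1/theta), theta > 0.

    Evaluated on the log scale so that u^-theta and v^-theta cannot overflow.
    """
    a, b = -theta * math.log(u), -theta * math.log(v)
    m = max(a, b)
    log_s = m + math.log(math.exp(a - m) + math.exp(b - m) - math.exp(-m))
    return math.exp((1.0 + theta) / theta * (b - log_s))


def simpson(f, a, b, n=4000):
    """Composite Simpson's rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b) + sum((4 if k % 2 else 2) * f(a + k * h) for k in range(1, n))
    return total * h / 3


def p_clayton_exponential(rate_x, rate_y, theta, upper=30.0):
    """p = int_0^inf C_2(S_X(t), S_Y(t)) f_Y(t) dt for exponential margins (truncated at `upper`)."""
    def integrand(t):
        u, v = math.exp(-rate_x * t), math.exp(-rate_y * t)
        return clayton_c2(u, v, theta) * rate_y * v  # f_Y(t) = rate_y * S_Y(t)
    return simpson(integrand, 0.0, upper)


def p_monte_carlo(rate_x, rate_y, theta, n=100_000, seed=7):
    """Simulate (U, V) from the Clayton copula by conditional inversion; X = S_X^{-1}(U), Y = S_Y^{-1}(V)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n):
        v, w = 1.0 - rng.random(), 1.0 - rng.random()  # uniforms on (0, 1]
        u = (v ** -theta * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
        wins += (-math.log(u) / rate_x) > (-math.log(v) / rate_y)
    return wins / n


if __name__ == "__main__":
    print(round(p_clayton_exponential(1.0, 2.0, 1.0), 3))  # 0.744
    print(round(p_monte_carlo(1.0, 2.0, 1.0), 3))
```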
In this article, we propose a model for the bivariate survival function using parametric copulas and parametric marginal distributions. We then derive a new formula for computing p as a one-dimensional integral. We also propose a new formula for p under restricted follow-up. To make the proposed computation easy for users to perform, we develop a Shiny-based web app. Furthermore, we validate the accuracy of the proposed computation method and the Shiny web app through simulations. Finally, we illustrate the proposed method using two real datasets.
The rest of the paper is organized as follows. In Section 2, we review copula-based models and introduce several well-known copula families. In this section, we show that one can compute p by a theorem in [20], and we extend the theorem to compute p when the follow-up time is restricted up to time $\tau$. In Section 3, we introduce a Shiny web app in R that can compute p via simple commands. In Section 4, we describe a simulation study that verifies the correctness of the proposed calculator for p. In Section 5, we illustrate a meaningful application of the proposed method using survival data.
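The sketch below computes one version of the restricted effect. It rests on our own assumption that the restricted effect compares $\min(X, \tau)$ with $\min(Y, \tau)$ and gives ties half credit. With a joint survival function of the form $S(x, y) = C\{S_X(x), S_Y(y)\}$, this gives
$$p_\tau = \int_0^\tau C_2\{S_X(t), S_Y(t)\} f_Y(t)\,dt + \frac{1}{2} C\{S_X(\tau), S_Y(\tau)\},$$
where the last term is half the probability that both subjects are still alive at $\tau$. With exponential rates 1 and 2 and $\tau = 0.5$, the sketch returns 0.629 under independence and 0.645 under the Clayton copula with $\theta = 1$. Both values agree with the corresponding entries of Table 1. The code is not a transcription of the paper's Theorem 1, and all names are hypothetical.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b) + sum((4 if k % 2 else 2) * f(a + k * h) for k in range(1, n))
    return total * h / 3


def clayton_log_s(u, v, theta):
    """log(u^-theta + v^-theta - 1), evaluated stably for theta > 0."""
    a, b = -theta * math.log(u), -theta * math.log(v)
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m) - math.exp(-m))


def clayton_c(u, v, theta):
    """Clayton copula C(u, v) = (u^-theta + v^-theta - 1)^(-1/theta)."""
    return math.exp(-clayton_log_s(u, v, theta) / theta)


def clayton_c2(u, v, theta):
    """Partial derivative dC(u, v)/dv of the Clayton copula."""
    return math.exp((1 + theta) / theta * (-theta * math.log(v) - clayton_log_s(u, v, theta)))


def p_tau(c, c2, surv_x, surv_y, dens_y, tau):
    """p_tau = int_0^tau C_2(S_X(t), S_Y(t)) f_Y(t) dt + 0.5 * C(S_X(tau), S_Y(tau)).

    The first term is P(X > Y, Y <= tau); the second gives half credit to pairs
    still alive at tau, i.e. ties of min(X, tau) and min(Y, tau).
    """
    head = simpson(lambda t: c2(surv_x(t), surv_y(t)) * dens_y(t), 0.0, tau)
    return head + 0.5 * c(surv_x(tau), surv_y(tau))


# Illustrative exponential margins with rates 1 (group 1) and 2 (group 2)
sx = lambda t: math.exp(-t)
sy = lambda t: math.exp(-2.0 * t)
fy = lambda t: 2.0 * math.exp(-2.0 * t)
independent = (lambda u, v: u * v, lambda u, v: u)
clayton1 = (lambda u, v: clayton_c(u, v, 1.0), lambda u, v: clayton_c2(u, v, 1.0))

for name, (c, c2) in [("independent", independent), ("Clayton, theta = 1", clayton1)]:
    print(name, round(p_tau(c, c2, sx, sy, fy, 0.5), 3))  # 0.629, then 0.645
```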
5. Numerical Examples
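In this section, we apply the proposed methods to a tongue cancer dataset and a prostate cancer dataset. Before analyzing the real datasets, we introduce the basic notation and ideas for estimating p from censored data.

For group $i \in \{1, 2\}$ and subject $j = 1, \ldots, n_i$, let:
- $T_{ij}$ be the survival time,
- $C_{ij}$ be the censoring time,
- $Z_{ij} = \min(T_{ij}, C_{ij})$ be the observed time,
- $\delta_{ij} = I(T_{ij} \le C_{ij})$ be the event indicator.

What we observe is $Z_{ij}$ and $\delta_{ij}$; that is, $\delta_{ij}$ is 0 or 1 according to whether $Z_{ij}$ is a censoring time or a survival time. As the exponential distribution was found to fit well for $T_{1j}$ and $T_{2j}$, we obtained the MLE of the exponential hazard rate $\lambda_i$ by
$$\hat{\lambda}_i = \frac{\sum_{j=1}^{n_i} \delta_{ij}}{\sum_{j=1}^{n_i} Z_{ij}}, \qquad i = 1, 2.$$
Then, by entering the MLEs into the proposed Shiny web app, we obtained the estimators $\hat{p}$ and $\hat{p}_\tau$, where $\tau$ was chosen appropriately (Section 5.1 and Section 5.2).

On the other hand, under the assumption that $T_{1j}$ and $T_{2j}$ are independent, the naïve estimator of p restricted to $[0, \tau]$ is
$$\hat{p}_\tau^{\mathrm{KM}} = -\int_0^\tau \hat{S}_1(t)\, d\hat{S}_2(t) + \frac{1}{2}\hat{S}_1(\tau)\hat{S}_2(\tau),$$
where $\hat{S}_i$ is the Kaplan-Meier (KM) estimator for group $i$. However, this estimate relies on the independence of the two groups' survival times. The proposed estimator is therefore useful for examining sensitivity to a variety of dependence structures via copulas.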
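The two estimation steps above can be sketched as follows.

- **Exponential MLE.** Under right censoring, the MLE is the number of observed events divided by the total observed time.
- **Naïve KM-based estimator.** The sketch plugs the two KM curves into $P(T_1 > T_2, T_2 \le \tau) + \tfrac12 P(T_1 = T_2 \le \tau) + \tfrac12 P(T_1 > \tau, T_2 > \tau)$, giving tied event times half credit. This tie handling is our assumption and may differ from the implementation behind Tables 2 and 3.

Function names and the toy data are hypothetical.

```python
def exponential_mle(times, events):
    """MLE of an exponential hazard rate from right-censored data:
    number of observed events divided by total observed time."""
    return sum(events) / sum(times)


def kaplan_meier(times, events):
    """Kaplan-Meier estimate: (event times, survival values just after each event time)."""
    data = sorted(zip(times, events))
    at_risk, surv = len(data), 1.0
    event_times, surv_values = [], []
    i = 0
    while i < len(data):
        t, deaths, removed = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:  # group tied observation times
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            event_times.append(t)
            surv_values.append(surv)
        at_risk -= removed  # subjects censored at t still count as at risk at t
    return event_times, surv_values


def km_at(km, t, left=False):
    """Step-function value S(t); with left=True, the left limit S(t-)."""
    s = 1.0
    for tk, sk in zip(*km):
        if tk < t or (tk == t and not left):
            s = sk
        else:
            break
    return s


def naive_p_tau(km1, km2, tau):
    """Plug-in estimate of P(T1 > T2, T2 <= tau) + 0.5 P(T1 = T2 <= tau) + 0.5 P(T1 > tau, T2 > tau),
    treating the two groups as independent."""
    total, prev = 0.0, 1.0
    for t, s2 in zip(*km2):
        if t > tau:
            break
        s1_mid = 0.5 * (km_at(km1, t, left=True) + km_at(km1, t))  # half credit for ties
        total += s1_mid * (prev - s2)
        prev = s2
    return total + 0.5 * km_at(km1, tau) * km_at(km2, tau)


if __name__ == "__main__":
    # Hypothetical toy data (not from the tongue or prostate datasets)
    t1, d1 = [3.0, 5.0, 7.0, 8.0], [1, 1, 0, 1]
    t2, d2 = [1.0, 2.0, 4.0, 6.0], [1, 0, 1, 1]
    print(exponential_mle(t1, d1), exponential_mle(t2, d2))
    print(naive_p_tau(kaplan_meier(t1, d1), kaplan_meier(t2, d2), tau=6.0))
```

Without censoring and with $\tau$ beyond all observations, the plug-in reduces to the usual Mann-Whitney proportion of pairs, which is a quick sanity check.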
5.1. Tongue Cancer Data
The tongue dataset is available in the R package KMsurv. It has 80 observations and contains the variables type (tumor DNA profile: 1 = aneuploid tumor, 2 = diploid tumor), time (time to death or on-study time, in weeks), and death (event indicator: 0 = alive, 1 = dead). It contains
observations in aneuploid cancer group (
), and
observations in diploid cancer group (
). We considered the follow-up time
and obtained
. The tongue cancer data resulted in
and
. In
Figure 4, the KM estimators for each group and the estimated exponential survival curves are plotted. We conducted sensitivity analyses using the copula-based approach. We calculated
by Theorem 1 under weak, strong positive, and negative dependence. We calculated
via the web app (
Section 3).
Figure 5 shows the output under the independent, Clayton, Gumbel, Frank, FGM, and GB copulas with parameter
. The results under all copulas are summarized in
Table 2. We obtained the
ranged from
to
and concluded that a subject in the DNA-aneuploid tumor group survives longer than one in the DNA-diploid tumor group. This conclusion did not change under any of the dependence structures we considered.
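Copula parameters are not comparable across families, so the strength of dependence (weak, strong positive, negative) is easier to read on the Kendall's $\tau$ scale. The sketch below uses the standard relations:

- Clayton: $\tau = \theta/(\theta + 2)$.
- FGM: $\tau = 2\theta/9$.
- Frank: $\tau = 1 - 4\{1 - D_1(\theta)\}/\theta$, with $D_1$ the first Debye function.

For Gumbel, the values in Table 1 ($\theta = 0, 4$ with Kendall's $\tau = 0.00, 0.80$) suggest a shifted parameterization in which $\theta = 0$ is independence, so we assume $\tau = \theta/(\theta + 1)$. The GB copula is left out. For the four families covered, these formulas reproduce the Kendall's $\tau$ column of Table 1. Function names are hypothetical.

```python
import math


def kendall_clayton(theta):
    """Kendall's tau of the Clayton copula, theta > 0."""
    return theta / (theta + 2.0)


def kendall_gumbel_shifted(theta):
    """Assumed shifted Gumbel parameterization: alpha = theta + 1 >= 1, so tau = 1 - 1/alpha."""
    return theta / (theta + 1.0)


def kendall_fgm(theta):
    """Kendall's tau of the FGM copula, -1 <= theta <= 1."""
    return 2.0 * theta / 9.0


def debye1(x, n=2000):
    """First Debye function D_1(x) = (1/x) * int_0^x t / (e^t - 1) dt for x > 0 (Simpson's rule)."""
    f = lambda t: 1.0 if t == 0 else t / math.expm1(t)
    h = x / n
    total = f(0.0) + f(x) + sum((4 if k % 2 else 2) * f(k * h) for k in range(1, n))
    return total * h / 3 / x


def kendall_frank(theta):
    """Kendall's tau of the Frank copula, 1 - 4 (1 - D_1(theta)) / theta; odd in theta."""
    if theta == 0:
        return 0.0
    tau = 1.0 - 4.0 * (1.0 - debye1(abs(theta))) / abs(theta)
    return math.copysign(tau, theta)


for family, fn, thetas in [
    ("Clayton", kendall_clayton, (1, 5, 10)),
    ("Gumbel", kendall_gumbel_shifted, (0, 4)),
    ("Frank", kendall_frank, (-5, 1, 5)),
    ("FGM", kendall_fgm, (-1, 0, 1)),
]:
    print(family, [round(fn(t), 2) for t in thetas])
```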
5.2. Prostate Cancer Data
The prostate cancer data are available in the R package asaur ([41]). The dataset has 14,294 observations and contains grade (moderately differentiated and poorly differentiated), survTime (time from diagnosis to death or last date known alive), and status (event indicator: 0 = censored, 1 = death from prostate cancer). It contains
observations in moderately differentiated group (
), and
observations in poorly differentiated group (
).
The prostate cancer data resulted in
,
, and
. In
Figure 6, we plot the KM estimators for each group and the estimated exponential survival curves. We calculated
by Theorem 1 with several copulas and parameter values. We calculated
under the independent, Clayton, Gumbel, Frank, FGM, and GB copulas with parameter
via the web app (
Figure 7). The results for all scenarios are summarized in
Table 3. We obtained the
ranged from
to
and concluded that a subject in the moderately differentiated group survives longer than one in the poorly differentiated group. The range of
is narrower than that for the tongue cancer dataset. This may be explained by the distinct difference between the survival curves of the two groups, the short follow-up time for both groups, and the large sample size.
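A short follow-up narrows the range of $\hat{p}_\tau$ across dependence structures for a simple reason: as $\tau \to 0$, both subjects are almost surely alive at $\tau$, so $p_\tau \to 1/2$ under any copula. The sketch below illustrates this with the FGM copula and exponential margins.

- In this case $p_\tau$ has a closed form that is linear in $\theta$ (our own derivation), so the range over $\theta \in [-1, 1]$ is the difference of the endpoint values.
- The rates 1 and 2 are the simulation settings of Table 1, not the prostate-data MLEs.
- At $\theta = \pm 1$ it gives 0.617 and 0.642 at $\tau = 0.5$, and 0.633 and 0.700 without restriction. These match the FGM rows of Table 1.
- The width of the range grows with $\tau$.

The function name is hypothetical.

```python
import math


def fgm_exp_p_tau(rate_x, rate_y, theta, tau=math.inf):
    """Closed-form p_tau for exponential margins joined by an FGM survival copula.

    Uses C(u, v) = u v [1 + theta (1 - u)(1 - v)] and dC/dv = u [1 + theta (1 - u)(1 - 2 v)],
    with u = exp(-rate_x t), v = exp(-rate_y t) and |theta| <= 1; tau = inf gives p itself.
    """
    def e(r):  # integral of exp(-r t) over [0, tau]
        return (1.0 - math.exp(-r * tau)) / r

    lx, ly = rate_x, rate_y
    independent_part = ly * e(lx + ly)
    theta_part = ly * (e(lx + ly) - e(2 * lx + ly) - 2 * e(lx + 2 * ly) + 2 * e(2 * lx + 2 * ly))
    u, v = math.exp(-lx * tau), math.exp(-ly * tau)
    tie_part = 0.5 * u * v * (1.0 + theta * (1.0 - u) * (1.0 - v))  # half credit if both alive at tau
    return independent_part + theta * theta_part + tie_part


for tau in (0.5, 2.0, 5.0, math.inf):
    low, high = fgm_exp_p_tau(1.0, 2.0, -1.0, tau), fgm_exp_p_tau(1.0, 2.0, 1.0, tau)
    print(f"tau={tau}: p_tau in [{low:.3f}, {high:.3f}], width {high - low:.3f}")
```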
6. Conclusion
The Mann-Whitney effect has been widely used in survival analysis, as it provides a meaningful measure of treatment effects for survival outcomes. However, the Mann-Whitney effect may not be interpretable as the true treatment effect when the two survival times are dependent. In this article, we proposed a parametric copula-based approach for estimating the Mann-Whitney effect p under dependence structures for two survival times. We derived the formulas of p under a variety of copulas and marginal survival functions. We also introduced a web-based calculator for p. Simulation studies demonstrated the correctness of the proposed calculator for p under a variety of parametric marginal survival distributions and copulas. The results of the data analyses show how p may change under various dependence structures, which enables a sensitivity analysis.
In the real-data examples, we obtained
under the Clayton, Gumbel, Frank, FGM, and GB copulas with varying parameters. The value of
ranged from
to
in the tongue cancer dataset and from
to
in the prostate cancer dataset. We obtained narrow ranges that did not include the null value $p = 1/2$. This result is consistent with previous studies showing that Hand’s paradox does not occur under a strictly monotonic effect ([14, 42]). While more complex dependence structures with other copulas might be considered, the conclusion may not change much.
The main limitation of the present article is that we only discussed the “parametric” approach, whereas in practice researchers may use semiparametric or nonparametric approaches. In future work, we will examine methods for computing p without parametric assumptions. Another extension is to include covariates or secondary outcomes in the model, which can help obtain narrower bounds for treatment effects ([43]). Another limitation is that only one-parameter copulas are implemented; multi-parameter copulas also deserve attention ([22, 28, 44]).
Figure 1.
Scatter plots of 3,000 data points generated from the copula distribution with parameter .
Figure 2.
Survival-curve plots of the parametric distribution functions.
Figure 3.
The web app showing the results for computing p and .
Figure 4.
KM estimators for the DNA-aneuploid and DNA-diploid tumor groups, and exponential survival curves with the MLEs of the exponential hazard rates, .
Figure 5.
Example for the tongue cancer dataset on the web app. Settings: marginal distribution: “Exponential”, , copula: “Clayton”, copula parameter: , and language: “English”.
Figure 6.
KM estimators for the moderate-grade and poor-grade groups, and exponential survival curves with the MLEs of the exponential hazard rates, .
Figure 7.
Example for the prostate cancer dataset on the web app. Settings: marginal distribution: “Exponential”, , copula: “Gumbel”, copula parameter: , and language: “English”.
Table 1.
Comparison of the theoretical value and the simulation value for calculating defined in Theorem 1.
| Distribution | Copula | θ | Kendall's τ | Rate/scale (group 1) | Shape (group 1) | Rate/scale (group 2) | Shape (group 2) | $p_\tau$, τ = 0.5 (theory) | $p_\tau$, τ = 0.5 (simulation) | $p_\tau$, τ = 2 (theory) | $p_\tau$, τ = 2 (simulation) | $p$ (theory) | $p$ (simulation) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Exponential | Clayton | 1 | 0.33 | 1 | - | 2 | - | 0.645 | 0.643 | 0.737 | 0.738 | 0.744 | 0.746 |
| | | 5 | 0.71 | 1 | - | 2 | - | 0.704 | 0.706 | 0.872 | 0.872 | 0.881 | 0.883 |
| | | 10 | 0.83 | 1 | - | 2 | - | 0.746 | 0.745 | 0.920 | 0.921 | 0.930 | 0.930 |
| | Gumbel | 0 | 0.00 | 1 | - | 2 | - | 0.629 | 0.631 | 0.666 | 0.666 | 0.666 | 0.665 |
| | | 4 | 0.80 | 1 | - | 2 | - | 0.798 | 0.799 | 0.961 | 0.961 | 0.970 | 0.969 |
| | Frank | -5 | -0.46 | 1 | - | 2 | - | 0.615 | 0.615 | 0.622 | 0.622 | 0.622 | 0.622 |
| | | 1 | 0.11 | 1 | - | 2 | - | 0.636 | 0.636 | 0.684 | 0.684 | 0.685 | 0.685 |
| | | 5 | 0.46 | 1 | - | 2 | - | 0.674 | 0.674 | 0.771 | 0.768 | 0.773 | 0.772 |
| | FGM | -1 | -0.22 | 1 | - | 2 | - | 0.617 | 0.617 | 0.633 | 0.634 | 0.633 | 0.631 |
| | | 0 | 0.00 | 1 | - | 2 | - | 0.629 | 0.629 | 0.666 | 0.666 | 0.666 | 0.666 |
| | | 1 | 0.22 | 1 | - | 2 | - | 0.642 | 0.641 | 0.699 | 0.697 | 0.700 | 0.702 |
| | GB | 0.5 | -0.21 | 1 | - | 2 | - | 0.623 | 0.624 | 0.642 | 0.643 | 0.642 | 0.641 |
| | | 1 | -0.36 | 1 | - | 2 | - | 0.617 | 0.616 | 0.629 | 0.628 | 0.629 | 0.632 |
| Weibull | Clayton | 1 | 0.33 | 1 | 0.5 | 2 | 1 | 0.497 | 0.496 | 0.594 | 0.589 | 0.603 | 0.602 |
| | | 5 | 0.71 | 1 | 0.5 | 2 | 1 | 0.482 | 0.480 | 0.644 | 0.645 | 0.653 | 0.654 |
| | | 10 | 0.83 | 1 | 0.5 | 2 | 1 | 0.472 | 0.472 | 0.645 | 0.645 | 0.654 | 0.653 |
| | Gumbel | 0 | 0.00 | 1 | 0.5 | 2 | 1 | 0.511 | 0.509 | 0.560 | 0.562 | 0.562 | 0.562 |
| | | 4 | 0.80 | 1 | 0.5 | 2 | 1 | 0.425 | 0.424 | 0.584 | 0.585 | 0.593 | 0.594 |
| | Frank | -5 | -0.46 | 1 | 0.5 | 2 | 1 | 0.528 | 0.530 | 0.542 | 0.541 | 0.542 | 0.543 |
| | | 1 | 0.11 | 1 | 0.5 | 2 | 1 | 0.505 | 0.504 | 0.566 | 0.565 | 0.569 | 0.572 |
| | | 5 | 0.46 | 1 | 0.5 | 2 | 1 | 0.486 | 0.486 | 0.592 | 0.595 | 0.597 | 0.597 |
| | FGM | -1 | -0.22 | 1 | 0.5 | 2 | 1 | 0.521 | 0.523 | 0.548 | 0.549 | 0.549 | 0.551 |
| | | 0 | 0.00 | 1 | 0.5 | 2 | 1 | 0.511 | 0.509 | 0.560 | 0.563 | 0.562 | 0.562 |
| | | 1 | 0.22 | 1 | 0.5 | 2 | 1 | 0.501 | 0.503 | 0.572 | 0.574 | 0.575 | 0.577 |
| | GB | 0.5 | -0.21 | 1 | 0.5 | 2 | 1 | 0.519 | 0.520 | 0.549 | 0.546 | 0.549 | 0.548 |
| | | 1 | -0.36 | 1 | 0.5 | 2 | 1 | 0.526 | 0.526 | 0.545 | 0.545 | 0.545 | 0.545 |
| Gamma | Clayton | 1 | 0.33 | 1 | 1.5 | 2 | 2 | 0.529 | 0.529 | 0.651 | 0.649 | 0.679 | 0.678 |
| | | 5 | 0.71 | 1 | 1.5 | 2 | 2 | 0.530 | 0.528 | 0.763 | 0.763 | 0.809 | 0.810 |
| | | 10 | 0.83 | 1 | 1.5 | 2 | 2 | 0.534 | 0.533 | 0.817 | 0.816 | 0.862 | 0.863 |
| | Gumbel | 0 | 0.00 | 1 | 1.5 | 2 | 2 | 0.530 | 0.530 | 0.611 | 0.612 | 0.615 | 0.614 |
| | | 4 | 0.80 | 1 | 1.5 | 2 | 2 | 0.545 | 0.546 | 0.813 | 0.813 | 0.853 | 0.853 |
| | Frank | -5 | -0.46 | 1 | 1.5 | 2 | 2 | 0.532 | 0.530 | 0.584 | 0.584 | 0.584 | 0.583 |
| | | 1 | 0.11 | 1 | 1.5 | 2 | 2 | 0.530 | 0.529 | 0.622 | 0.622 | 0.628 | 0.628 |
| | | 5 | 0.46 | 1 | 1.5 | 2 | 2 | 0.530 | 0.530 | 0.679 | 0.679 | 0.694 | 0.692 |
| | FGM | -1 | -0.22 | 1 | 1.5 | 2 | 2 | 0.531 | 0.533 | 0.591 | 0.591 | 0.592 | 0.591 |
| | | 0 | 0.00 | 1 | 1.5 | 2 | 2 | 0.530 | 0.529 | 0.611 | 0.610 | 0.615 | 0.614 |
| | | 1 | 0.22 | 1 | 1.5 | 2 | 2 | 0.529 | 0.529 | 0.631 | 0.631 | 0.639 | 0.640 |
| | GB | 0.5 | -0.21 | 1 | 1.5 | 2 | 2 | 0.531 | 0.532 | 0.597 | 0.598 | 0.598 | 0.598 |
| | | 1 | -0.36 | 1 | 1.5 | 2 | 2 | 0.532 | 0.532 | 0.589 | 0.592 | 0.590 | 0.588 |
Table 2.
Estimates for fitting the KM estimator (independent) and with the exponential marginal survival distributions (the independent, Clayton, Gumbel, Frank, FGM, GB copulas) for the tongue cancer dataset.
| Copula | Marginal distribution | $\theta$ | $\hat{p}$ | $\hat{p}_\tau$ |
|---|---|---|---|---|
| Independent | KM estimator | - | - | 0.632 |
| Independent | exponential | - | 0.638 | 0.633 |
| Clayton | exponential | 1 | 0.709 | 0.676 |
| | | 5 | 0.856 | 0.799 |
| Gumbel | exponential | 4 | 0.906 | 0.862 |
| Frank | exponential | -5 | 0.600 | 0.600 |
| | | 5 | 0.733 | 0.714 |
| FGM | exponential | -1 | 0.609 | 0.609 |
| | | 1 | 0.666 | 0.658 |
| GB | exponential | 0.5 | 0.617 | 0.617 |
| | | 1 | 0.606 | 0.606 |
Table 3.
Estimates for fitting the KM estimator (independent) and with the exponential marginal survival distributions (the independent, Clayton, Gumbel, Frank, FGM, GB copulas) for the prostate cancer dataset.
| Copula | Marginal distribution | $\theta$ | $\hat{p}$ | $\hat{p}_\tau$ |
|---|---|---|---|---|
| Independent | KM estimator | - | - | 0.679 |
| Independent | exponential | - | 0.821 | 0.625 |
| Clayton | exponential | 1 | 0.889 | 0.626 |
| | | 5 | 0.958 | 0.635 |
| Gumbel | exponential | 4 | 0.997 | 0.665 |
| Frank | exponential | -5 | 0.753 | 0.624 |
| | | 5 | 0.924 | 0.632 |
| FGM | exponential | -1 | 0.777 | 0.623 |
| | | 1 | 0.865 | 0.626 |
| GB | exponential | 0.5 | 0.786 | 0.624 |
| | | 1 | 0.764 | 0.623 |