1. Introduction
The ranked set sampling (RSS) procedure has been used advantageously in agriculture, forestry, environmental and ecological studies, and more recently in human studies, where the exact measurement of units is either difficult or expensive. For example, in forestry, measuring the stem volume of standing trees is difficult, but ranking the trees by their height and diameter at breast height is rather easy. For such situations, McIntyre (1952) introduced RSS to estimate the population mean. RSS is a cost-efficient alternative to simple random sampling (SRS) if observations can be ranked according to the characteristic under investigation by visual inspection or other methods not requiring actual measurements. McIntyre (1952) indicated that the RSS procedure is superior to the SRS procedure for estimating the population mean, but the mathematical foundation for RSS was provided by Takahasi and Wakimoto (1968) and Dell and Clutter (1972). Dell and Clutter (1972) also showed that the estimator of the population mean based on RSS is at least as efficient as the estimator based on SRS with the same number of measurements, even when there are ranking errors. Bhoj (2001) introduced RSS with unequal samples. Bhoj and Kushary (2016) proposed RSS with unequal samples for positively skew distributions with heavy right tails. RSS is a nonparametric procedure. However, RSS has recently also been used in the parametric setup (see Bhoj and Ahsanullah (1996); Bhoj (1997a, 1997b); Lam et al. (1994); Stokes (1995)).
The selection of a ranked set sample of size $k$ involves drawing $k$ random samples with $k$ units in each sample. The units in each sample are ranked by judgment or other methods not requiring actual measurements. The unit with the lowest rank is measured from the first sample, the unit with the second lowest rank is measured from the second sample, and the procedure is continued until the unit with the highest rank is measured from the last sample. The $k^2$ ordered observations in the $k$ samples can be displayed in matrix form as:

$$
\begin{matrix}
x_{(11)} & x_{(12)} & \cdots & x_{(1k)} \\
x_{(21)} & x_{(22)} & \cdots & x_{(2k)} \\
\vdots & \vdots & \ddots & \vdots \\
x_{(k1)} & x_{(k2)} & \cdots & x_{(kk)}
\end{matrix}
$$

where $x_{(ij)}$ denotes the $j$-th ordered observation in the $i$-th sample. We measure only the $k$ diagonal observations $x_{(ii)}$, $i = 1, \dots, k$, and they constitute the RSS. We note that these $k$ observations are independently but not identically distributed. In RSS, $k$ is usually kept small to reduce ranking errors; therefore, to increase the sample size, the above procedure is repeated $m$ times to obtain a sample of size $n = mk$. In this paper, we assume $m = 1$.
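The selection procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `ranked_set_sample` and the `draw` callback are hypothetical, and ranking is done on the true values, i.e. perfect ranking is assumed, whereas in practice ranking is done by judgment without measurement.

```python
import random

def ranked_set_sample(draw, k, rng):
    """Return one RSS cycle (m = 1) of k measured units.

    draw(rng) yields one unit's value. Ranking uses the true values,
    so perfect ranking is ASSUMED here (an idealization).
    """
    measured = []
    for i in range(k):
        ranked = sorted(draw(rng) for _ in range(k))  # rank the k units of sample i
        measured.append(ranked[i])                    # measure only the diagonal unit
    return measured

rng = random.Random(0)
rss = ranked_set_sample(lambda r: r.lognormvariate(0.0, 1.0), 4, rng)
print(rss, sum(rss) / len(rss))  # the k measurements and the RSS mean
```

Only $k$ of the $k^2$ drawn units are measured, which is where the cost saving relative to measuring every unit comes from.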
In the present paper, our main interest is to estimate the population mean of positively skew distributions with a long right tail. We propose estimators based on weighted ranked set sampling (WRSS) and compare their performance with those based on the usual RSS procedure and Neyman’s optimal allocation model. In section 2, we summarize the estimators of the population mean based on the RSS procedure and Neyman’s optimal allocation. In section 3, we propose our WRSS procedure to estimate the population mean of skew distributions. We first introduce a WRSS procedure in which one low weight is assigned to the highest order statistic, and we calculate the relative precisions of the estimators based on WRSS, RSS and Neyman’s optimal allocation with respect to the estimator based on SRS. These relative precisions are obtained for four positively skew distributions. We also compute one set of weights for all four distributions for each $k$. In section 4, we derive optimal weights for the lowest and highest order statistics for the chosen distributions for each $k$. We then obtain one set of weights for the lowest and highest order statistics for each $k$ that maximizes the sum of the relative precisions over the four distributions. In section 5, we generalize the procedure to optimal weights for all order statistics for $k=4$ and $k=5$ for each distribution. We also obtain one set of weights for each $k$ for the four chosen distributions. In section 6, to see the effect of increasing skewness, the relative precisions of the estimators are compared for the lognormal family of distributions. In section 7, we summarize the results with recommendations.
2. Estimation of Mean
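We first consider the usual RSS procedure for estimating the population mean. Let $x_{(ii)}$ denote the value of the characteristic under study for the $i$-th order statistic, measured in the $i$-th sample. The mean and variance of the $i$-th order statistic for set size $k$ are denoted by $\mu_{(i:k)}$ and $\sigma^2_{(i:k)}$, respectively. We denote the population mean and variance by $\mu$ and $\sigma^2$, respectively. Then the unbiased estimator of $\mu$ based on RSS is given by

$$\hat{\mu}_{RSS} = \frac{1}{k}\sum_{i=1}^{k} x_{(ii)},$$

with the variance

$$\mathrm{Var}(\hat{\mu}_{RSS}) = \frac{1}{k^2}\sum_{i=1}^{k} \sigma^2_{(i:k)}.$$

The relative precision of $\hat{\mu}_{RSS}$ compared to the estimator based on SRS with the same number of observations $k$ (Bhoj and Chandra, 2019) is

$$RP_{RSS} = \frac{\sigma^2/k}{\mathrm{Var}(\hat{\mu}_{RSS})} = \frac{\sigma^2}{\overline{\sigma^2}},$$

where $\overline{\sigma^2} = \frac{1}{k}\sum_{i=1}^{k} \sigma^2_{(i:k)}$ is the average within-rank variance.

For skew distributions, Neyman’s allocation

$$n_i = \frac{n\,\sigma_{(i:k)}}{\sum_{j=1}^{k} \sigma_{(j:k)}}, \quad i = 1, \dots, k,$$

provides the optimal allocation, where $n_i$ is the number of measurements taken on the $i$-th order statistic. The relative precision of the unbiased estimator of $\mu$ based on this model with respect to SRS with the same number of observations $n$ is given by (Bhoj and Chandra, 2019)

$$RP_{Ney} = \frac{\sigma^2}{\bar{\sigma}^{\,2}},$$

where $\bar{\sigma} = \frac{1}{k}\sum_{i=1}^{k} \sigma_{(i:k)}$ is the average within-rank standard deviation.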
There are some unequal allocation models for skew distributions in the literature: the ‘t’ and ‘s, t’ models (Kaur et al., 1997), the systematic model (Tiwari and Chandra, 2011) and the simple model (Chandra et al., 2018; Bhoj and Chandra, 2019). Neyman’s allocation does not in general yield integer sample sizes, which are necessary for any application. The procedure for making them integers is shown in Bhoj and Chandra (2019) and is used in this paper. It is noted that, for skew distributions, the relative precision under Neyman’s allocation is always at least as large as that of the usual RSS procedure.
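The two relative precisions summarized in this section can be computed exactly when the moments of the order statistics are known in closed form. The sketch below is illustrative only. It assumes a Pareto type I distribution with unit scale (survival function $x^{-a}$ for $x \ge 1$), which may not match the parameterization behind P(3.5) and P(4.5) in the tables. The function names are hypothetical, and the Neyman allocation is left fractional rather than rounded to integers.

```python
from math import gamma, sqrt

def pareto_os_moment(i, k, a, r):
    """E[X_(i:k)^r] for Pareto type I with survival x**(-a), x >= 1 (needs a > r)."""
    j = k - i + 1  # X_(i:k) = U_(j:k)**(-1/a) for a uniform U on (0, 1)
    return gamma(k + 1) * gamma(j - r / a) / (gamma(j) * gamma(k + 1 - r / a))

def order_stat_variances(k, a):
    return [pareto_os_moment(i, k, a, 2) - pareto_os_moment(i, k, a, 1) ** 2
            for i in range(1, k + 1)]

def relative_precisions(k, a):
    """Return (RP of RSS, RP of Neyman's allocation), both relative to SRS."""
    sigma2 = a / ((a - 1) ** 2 * (a - 2))           # population variance
    variances = order_stat_variances(k, a)
    mean_var = sum(variances) / k                    # average within-rank variance
    mean_sd = sum(sqrt(v) for v in variances) / k    # average within-rank s.d.
    return sigma2 / mean_var, sigma2 / mean_sd ** 2

def neyman_allocation(n, k, a):
    """Fractional n_i proportional to the within-rank s.d. (not rounded)."""
    sds = [sqrt(v) for v in order_stat_variances(k, a)]
    return [n * s / sum(sds) for s in sds]

for k in (2, 3, 4, 5):
    print(k, relative_precisions(k, 3.5), neyman_allocation(k, k, 3.5))
```

Because the average of the within-rank standard deviations, squared, cannot exceed the average within-rank variance, the second value returned is never smaller than the first.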
3. WRSS with One Optimal Weight
In this section, we propose a weighted ranked set sampling (WRSS) procedure with an optimal weight for the largest order statistic, since, for positively skew distributions, the largest order statistic has the highest variance and induces a higher bias in the estimator of the mean. We define the weights
as,
The exact values of weights are proposed as follows:
Our weighted estimator for the population mean
is
The relative precision of our biased estimator
with respect to the estimator based on SRS is
The weight assigned to the largest order statistic is chosen so that the relative precision of the WRSS estimator is maximized. To find the optimum weight for each $k$, an Excel program for computing this relative precision was developed; the weight was varied iteratively and the relative precision recomputed until it reached its maximum. For weights above or below this optimum, the relative precision decreases.
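The iterative search described above can be mimicked with a simple grid search. The sketch below is a hedged illustration, not the authors' Excel program. The weighting scheme is hypothetical (weight `w` on the largest order statistic, the remaining mass `1 - w` shared equally by the others), the function names are invented, and a Pareto type I distribution with unit scale is assumed so that the order-statistic moments are available in closed form. The relative precision of the biased estimator is taken as the SRS variance divided by the mean squared error.

```python
from math import gamma

def pareto_os_moment(i, k, a, r):
    """E[X_(i:k)^r] for Pareto type I with survival x**(-a), x >= 1 (needs a > r)."""
    j = k - i + 1  # X_(i:k) = U_(j:k)**(-1/a) for a uniform U on (0, 1)
    return gamma(k + 1) * gamma(j - r / a) / (gamma(j) * gamma(k + 1 - r / a))

def rp_one_weight(w, k, a):
    """RP relative to SRS of sum_i w_i X_(i:k), with weight w on the largest order statistic."""
    mu = a / (a - 1)                                 # population mean
    sigma2 = a / ((a - 1) ** 2 * (a - 2))            # population variance
    weights = [(1 - w) / (k - 1)] * (k - 1) + [w]    # HYPOTHETICAL scheme, sums to 1
    means = [pareto_os_moment(i, k, a, 1) for i in range(1, k + 1)]
    variances = [pareto_os_moment(i, k, a, 2) - m * m
                 for i, m in zip(range(1, k + 1), means)]
    bias = sum(wi * m for wi, m in zip(weights, means)) - mu
    mse = sum(wi * wi * v for wi, v in zip(weights, variances)) + bias ** 2
    return (sigma2 / k) / mse

def best_weight(k, a, steps=1000):
    """Grid search over 0 < w <= 1/k; w = 1/k reproduces the usual RSS mean."""
    grid = [(t / steps) / k for t in range(1, steps + 1)]
    return max(((w, rp_one_weight(w, k, a)) for w in grid), key=lambda p: p[1])

for k in (2, 3, 4, 5):
    w, rp = best_weight(k, 3.5)
    print(k, round(w, 4), round(rp, 3), round(rp_one_weight(1 / k, k, 3.5), 3))
```

Each printed row gives the set size, the grid-optimal weight, the resulting relative precision and, for comparison, the relative precision of the unweighted RSS mean.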
We computed the relative precision of the WRSS estimator for all four chosen distributions, lognormal (LN(0, 1)), Pareto (P(3.5) and P(4.5)) and Weibull (W(0.5)), and for $k = 2, 3, 4, 5$. The optimal weights and the relative precisions of the estimators based on RSS, Neyman’s optimal allocation and WRSS for these distributions and values of $k$ are presented in Table 1. The relative precisions of the WRSS estimator are much higher than those of the estimator based on the RSS procedure. Furthermore, they are higher than those based on Neyman’s optimal allocation model for all four distributions when $k \le 4$. All relative precisions increase as $k$ increases for LN(0,1), P(3.5) and P(4.5). However, for W(0.5), the relative precision of the WRSS estimator decreases as $k$ increases. This may be because the distribution W(0.5) has extremely large skewness and kurtosis.
We now attempt to compute one value of the weight for each of the four sample sizes that works well for all four chosen distributions. In these computations, the weight was determined so that the sum of the relative precisions of the WRSS estimator over the four distributions is close to the maximum. This optimum value was found using the same iteration procedure in the developed Excel program.
The optimum weights and the corresponding relative precisions for the four chosen distributions and four sample sizes are presented in Table 2. The relative precisions of the WRSS estimator in Table 2 are slightly smaller than those in Table 1, as expected. However, their pattern remains the same.
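The idea of a single weight shared across distributions can be illustrated by maximizing the sum of relative precisions over a grid. This sketch rests on the same assumptions as any closed-form Pareto calculation: Pareto type I with unit scale, a hypothetical weighting scheme with weight `w` on the largest order statistic and the rest shared equally, and invented function names. Only the two Pareto shapes are used here because their order-statistic moments have a closed form; the lognormal and Weibull cases would require tabulated moments.

```python
from math import gamma

def pareto_os_moment(i, k, a, r):
    """E[X_(i:k)^r] for Pareto type I with survival x**(-a), x >= 1 (needs a > r)."""
    j = k - i + 1
    return gamma(k + 1) * gamma(j - r / a) / (gamma(j) * gamma(k + 1 - r / a))

def rp_one_weight(w, k, a):
    """RP relative to SRS with weight w on the largest order statistic (hypothetical scheme)."""
    mu = a / (a - 1)
    sigma2 = a / ((a - 1) ** 2 * (a - 2))
    weights = [(1 - w) / (k - 1)] * (k - 1) + [w]
    means = [pareto_os_moment(i, k, a, 1) for i in range(1, k + 1)]
    variances = [pareto_os_moment(i, k, a, 2) - m * m
                 for i, m in zip(range(1, k + 1), means)]
    bias = sum(wi * m for wi, m in zip(weights, means)) - mu
    mse = sum(wi * wi * v for wi, v in zip(weights, variances)) + bias ** 2
    return (sigma2 / k) / mse

def common_weight(k, shapes, steps=1000):
    """Weight in (0, 1/k] maximizing the SUM of relative precisions over all shapes."""
    grid = [(t / steps) / k for t in range(1, steps + 1)]
    return max(grid, key=lambda w: sum(rp_one_weight(w, k, a) for a in shapes))

shapes = (3.5, 4.5)
for k in (2, 3, 4, 5):
    w = common_weight(k, shapes)
    print(k, round(w, 4), [round(rp_one_weight(w, k, a), 3) for a in shapes])
```

A common weight cannot beat the distribution-specific optimum for any single distribution, which is consistent with the slight loss in relative precision reported when moving from Table 1 to Table 2.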
4. WRSS with Two Optimal Weights
In this section, we propose a WRSS procedure with two optimal weights for the two extreme order statistics. Here the weights
for
k>2 are defined as
The proposed exact weights are as follows:
where
.
Our estimator of population mean is
The relative precision of
with respect to the estimator based on SRS is
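We calculate the optimal values of the two weights using the iteration method. Based on these values, we computed the relative precision of the two-weight WRSS estimator; these values, along with the relative precisions of the estimators based on RSS and Neyman’s optimal allocation for the four chosen distributions and sample sizes $k = 3$, 4 and 5, are presented in Table 3. The gains in precision of the two-weight estimator over the one-weight estimator are marginal. The relative precisions of the two-weight estimator are substantially higher than those of the estimator based on RSS. The two-weight estimator is superior to the estimator based on Neyman’s optimal allocation model for all $k$ for the LN(0,1) and P(3.5) distributions. Its relative precisions are higher than those of Neyman’s estimator for the other two distributions for $k=3$ and 4. For $k=5$, the gains of Neyman’s estimator over the two-weight estimator for these two distributions are marginal.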
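A two-dimensional version of the iterative search can be sketched as a grid search over the weights of the smallest and largest order statistics. As before, this is only an illustration under stated assumptions: the weighting scheme (weights `w1` and `wk` on the two extremes, the remaining mass shared equally by the $k-2$ middle order statistics) is hypothetical, the function names are invented, and a Pareto type I distribution with unit scale is assumed.

```python
from math import gamma

def pareto_moments(k, a):
    """Means and variances of X_(1:k), ..., X_(k:k) for Pareto type I (x >= 1)."""
    def moment(i, r):
        j = k - i + 1
        return gamma(k + 1) * gamma(j - r / a) / (gamma(j) * gamma(k + 1 - r / a))
    means = [moment(i, 1) for i in range(1, k + 1)]
    variances = [moment(i, 2) - m * m for i, m in zip(range(1, k + 1), means)]
    return means, variances

def rp_weighted(weights, means, variances, mu, sigma2):
    """RP relative to SRS of a weighted sum of independent order statistics."""
    bias = sum(w * m for w, m in zip(weights, means)) - mu
    mse = sum(w * w * v for w, v in zip(weights, variances)) + bias ** 2
    return (sigma2 / len(weights)) / mse

def best_two_weights(k, a, steps=100):
    """Grid search over (w1, wk) with w1 + wk < 1; requires k > 2."""
    mu = a / (a - 1)
    sigma2 = a / ((a - 1) ** 2 * (a - 2))
    means, variances = pareto_moments(k, a)
    best = (0.0, None, None)
    for t1 in range(1, steps):
        for tk in range(1, steps - t1):
            w1, wk = t1 / steps, tk / steps
            middle = (1 - w1 - wk) / (k - 2)         # HYPOTHETICAL equal split
            weights = [w1] + [middle] * (k - 2) + [wk]
            rp = rp_weighted(weights, means, variances, mu, sigma2)
            if rp > best[0]:
                best = (rp, w1, wk)
    return best

for k in (3, 4, 5):
    print(k, best_two_weights(k, 3.5))
```

Each row shows the best relative precision found and the pair of extreme weights that attains it on the grid.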
As in the case of the one-weight estimator, we attempt to compute one set of values of the two weights for each of the three sample sizes that works well for all four chosen distributions. In these computations, the two weights were determined so that the sum of the relative precisions of the two-weight estimator over the four distributions is close to the maximum. These weights and the corresponding relative precisions for the three sample sizes and four chosen distributions are presented in Table 4. The relative precisions of the two-weight estimator in Table 4 are higher than those of the one-weight estimator in Table 2 for each $k$. The pattern of the relative precisions is the same as that seen in Table 3.
5. WRSS with All Optimal Weights
Now, we extend WRSS with optimal weights for all order statistics for k=4 and 5. We take , and determine the optimal values of C and by minimizing MSE of the estimator by using , .
The values of are chosen so that the value of is maximized. Then we repeat the procedure of computing the optimal values of C and with these new . The procedure is repeated until the value of achieves the maximum value. We did this by using the developed computer program in Excel.
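For orientation, the mean squared error of a linear estimator $\sum_i w_i x_{(ii)}$ built from independent order statistics can be minimized in closed form when the weights are left unconstrained. The sketch below shows that benchmark. It is not the iterative procedure with $C$ and the fractions described above, and the function names are hypothetical. It assumes a Pareto type I distribution with unit scale and a known population mean, so the resulting weights are distribution-specific and need not sum to one.

```python
from math import gamma

def pareto_moments(k, a):
    """Means and variances of X_(1:k), ..., X_(k:k) for Pareto type I (x >= 1)."""
    def moment(i, r):
        j = k - i + 1
        return gamma(k + 1) * gamma(j - r / a) / (gamma(j) * gamma(k + 1 - r / a))
    means = [moment(i, 1) for i in range(1, k + 1)]
    variances = [moment(i, 2) - m * m for i, m in zip(range(1, k + 1), means)]
    return means, variances

def mse(weights, means, variances, mu):
    """Variance plus squared bias of sum_i w_i X_(i:k) for independent order statistics."""
    bias = sum(w * m for w, m in zip(weights, means)) - mu
    return sum(w * w * v for w, v in zip(weights, variances)) + bias ** 2

def optimal_linear_weights(means, variances, mu):
    """Unconstrained minimizer of the MSE and the minimum MSE, mu^2 / (1 + q)."""
    q = sum(m * m / v for m, v in zip(means, variances))
    weights = [mu * m / (v * (1.0 + q)) for m, v in zip(means, variances)]
    return weights, mu * mu / (1.0 + q)

a, k = 3.5, 4
mu = a / (a - 1)
sigma2 = a / ((a - 1) ** 2 * (a - 2))
means, variances = pareto_moments(k, a)
weights, min_mse = optimal_linear_weights(means, variances, mu)
print([round(w, 4) for w in weights], round((sigma2 / k) / min_mse, 3))
```

Setting the gradient of the MSE to zero gives $w_i = \mu\,\mu_{(i:k)} / \{\sigma^2_{(i:k)}(1+q)\}$ with $q = \sum_i \mu^2_{(i:k)}/\sigma^2_{(i:k)}$, which is what `optimal_linear_weights` returns. Any constrained or parameterized weighting scheme can do no better than this bound for the same distribution.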
The relative precisions of the WRSS estimator based on all optimal weights are presented in Table 5. We observe that these values are higher than the relative precisions based on one or two optimal weights, which are given in Table 1 and Table 3.
As in sections 3 and 4, we computed one set of values of the weight parameters, including $C$ and the different fractions, for $k=4$ and $k=5$ that works well for all four chosen distributions. In these computations, the values were determined so that the sum of the relative precisions of the WRSS estimator over the four distributions is close to the maximum. These values, along with the corresponding relative precisions for $k=4$ and $k=5$ and the four chosen distributions, are presented in Table 6. As expected, the relative precisions of the WRSS estimator in Table 6 are smaller than those in Table 5. However, the pattern of relative precisions remains the same.
6. WRSS with Increasing Skewness
In this section, we wish to study the performance of the three methods, RSS, WRSS and Neyman’s optimum allocation model with increasing values of skewness of a family of distributions. For this purpose, the lognormal distribution,
has been considered. The
pdf of
is given by
Then skewness (
Sk) and shape parameter (
p) are given by
The performance of these three methods relative to SRS with $k=4$ is presented in Table 7 for the lognormal family of distributions for a range of values of the population standard deviation. The variances of the order statistics of the family were computed from the variances of order statistics for different values of the shape parameter ($p$), which are readily available in Balakrishnan and Chen (1999). From Table 7, we observe that as skewness increases, the performance of (i) the RSS method decreases and (ii) Neyman’s and the WRSS methods increases. The relative precisions of the WRSS estimators based on all and on two optimal weights are higher than that of Neyman’s estimator for all values of the shape parameter. However, the relative precision of the WRSS estimator based on one optimal weight is higher than that of Neyman’s estimator for all $p>1.9$. The rate of increase of the relative precisions of the proposed WRSS estimators is greater than that of the estimator based on Neyman’s method (see Figure 1).
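To give a sense of how quickly skewness grows across the lognormal family, the standard closed-form skewness of a lognormal distribution can be evaluated directly. The sketch below uses the usual log-scale standard deviation `sigma` as its argument; how `sigma` maps to the shape parameter $p$ used in Table 7 is not assumed here, and the function name is hypothetical.

```python
from math import exp, sqrt

def lognormal_skewness(sigma):
    """Skewness of a lognormal distribution with log-scale s.d. sigma.

    The value does not depend on the log-scale mean.
    """
    e = exp(sigma ** 2)
    return (e + 2.0) * sqrt(e - 1.0)

for sigma in (0.25, 0.5, 0.75, 1.0):
    print(sigma, round(lognormal_skewness(sigma), 3))
```

For `sigma = 1`, i.e. LN(0, 1), the skewness is about 6.18, and it increases monotonically with `sigma`.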
7. Conclusions and Discussion
In this paper, we proposed a weighted ranked set sampling procedure to estimate the population mean of distributions that are positively skew with a heavy right tail. We chose four distributions: lognormal (LN(0, 1)), Pareto (P(3.5) and P(4.5)) and Weibull (W(0.5)). The means and variances of order statistics for these distributions are readily available in Harter and Balakrishnan (1996). We proposed three weighted ranked set sampling procedures. The first is based on one optimal weight for the largest order statistic, the second uses two optimal weights for the two extreme order statistics, and the third is based on $k$ optimal weights. We calculated the relative precisions for each of these four distributions using the WRSS procedure for each sample size. These relative precisions are much higher than those of the RSS estimator of the mean. Furthermore, the relative precisions of our estimators are higher than those based on Neyman’s optimal procedure for $k \le 4$, and even for $k=5$ for some distributions. We also computed one set of weight(s) for each $k$ for all the distributions and compared the relative precisions of our estimator with those of the RSS and Neyman’s estimators. Although there is a slight loss in relative precision, the values are still higher than those of Neyman’s model for $k \le 4$ for all four distributions and either higher than or very close to those of Neyman’s model for $k=5$. In general, as expected, the relative precisions of our estimator based on all optimal weights are higher than those based on one or two optimal weights. The gain in relative precision is, however, marginal.
We studied the performance of our proposed estimators for increasing skewness of a family of lognormal distributions. The relative precision of our estimator based on one optimal weight is higher than that of Neyman’s estimator when the shape parameter exceeds 1.9. The relative precisions of our estimators based on two and $k$ optimal weights are uniformly higher than those of Neyman’s estimator for all values of the shape parameter considered in Table 7. From Figure 1, we see that, with increasing skewness, the rate of increase of the relative precisions of our proposed WRSS estimators is greater than that of the estimator based on Neyman’s method.
Based on the numerical computations of relative precisions, we recommend our estimators based on the WRSS procedures for estimating the population mean of skew distributions with a heavy right tail for small set sizes.
Conflict of Interest
There is no conflict of interest.
References
- Balakrishnan, N. and Chen, W.S. Handbook of Tables of order statistics from lognormal distribution with applications. Springer, U.S.A., 1999.
- Bhoj, D.S. New parametric ranked set sampling. Journal of Applied Statistical Sciences 1997a, 6, 275-289.
- Bhoj, D.S. Estimation of parameters of extreme value distributions with ranked set sampling. Communications in Statistics: Theory and Methods 1997b, 26(3), 653-667.
- Bhoj, D.S. Ranked set sampling with unequal samples. Biometrics 2001, 57(3), 957-962.
- Bhoj, D.S. and Ahsanullah, M. Estimation of the parameters of the generalized geometric distribution using ranked set sampling. Biometrics 1996, 52, 685-694.
- Bhoj, D.S. and Chandra, G. Simple unequal allocation procedure for ranked set sampling with skew distributions. Journal of Modern Applied Statistical Methods 2019, 18(2), eP2811.
- Bhoj, D.S. and Kushary, D. Ranked set sampling with unequal samples for skew distributions. Journal of Statistical Computation and Simulation 2016, 86(4), 676-681.
- Chandra, G., Bhoj, D.S. and Pandey, R. Simple unbalanced ranked set sampling for mean estimation of response variable of developmental programs. Journal of Modern Applied Statistical Methods 2018, 17, 28.
- Dell, T.R. and Clutter, J.L. Ranked set sampling theory with order statistics background. Biometrics 1972, 28, 545-555.
- Harter, H.L. and Balakrishnan, N. CRC handbook of tables for the use of order statistics in estimation. CRC Press, Boca Raton, New York, 1996.
- Kaur, A., Patil, G.P. and Taillie, C. Unequal allocation models for ranked set sampling with skew distributions. Biometrics 1997, 53, 123-130.
- Lam, K., Sinha, B.K. and Wu, Z. Estimation of parameters of the two-parameter exponential distribution using ranked set sample. Annals of the Institute of Statistical Mathematics 1994, 46, 723-736.
- McIntyre, G.A. A method for unbiased selective sampling using ranked sets. Australian Journal of Agricultural Research 1952, 3, 385-390.
- Stokes, S.L. Parametric ranked set sampling. Annals of the Institute of Statistical Mathematics 1995, 47, 465-482.
- Takahasi, K. and Wakimoto, K. On unbiased estimates of the population mean based on the sample stratified by means of ordering. Annals of the Institute of Statistical Mathematics 1968, 20, 1-31.
- Tiwari, N. and Chandra, G. A systematic procedure for unequal allocation for skewed distributions in ranked set sampling. Journal of the Society of Agricultural Statistics 2011, 65(3), 331-338.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).