
Estimation in Complex Sampling Designs Based on Resampling Methods

Journal of Agricultural, Biological and Environmental Statistics

Abstract

Generally, to select a representative sample of a population, several probabilistic sampling methods are combined into what is called a complex sampling design. A complex sampling design usually requires very sophisticated mathematical calculations to provide unbiased estimators of the population parameters, so only a limited number of sampling designs are commonly used in practice. In the present study, to overcome this complexity, we propose a general method of estimation based on resampling that is suitable for all standard designs, whether conventional or adaptive. In this method, we compute the Murthy estimator, an unbiased estimator of the population mean, together with an estimator of its variance, without intensive mathematical calculations. Using this method, researchers can implement any probability design with the guarantee that the estimator is unbiased. To demonstrate this ability, and as an application of the method, we introduce Adaptive Random Walk Sampling, a complex and efficient sampling design suited to quadrat-based environmental populations. Despite the complexity of this design, the method proposed in this paper provides an unbiased estimator of the population mean under the design, thereby making it a practical design. Simulations confirm the expected performance of the method.
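To make the approach concrete, the following is a minimal illustrative sketch (not the paper's implementation). The design is rerun K times and the selection probabilities in the Murthy estimator \({\hat{\mu }}=N^{-1}\sum _{i\in s}p_{s,i}/(p_sp_i)\,y_i\) (the form used in the Appendix) are estimated by counting, with the smoothed estimators \((\text {count}+1)/(K+1)\) from Proof 1. The names `run_design` and `murthy_estimate` are invented here; simple random sampling without replacement stands in for an arbitrary standard design, and \(p_i\) is interpreted as the probability that unit \(i\) is the first draw, consistent with the unbiasedness argument in Proof 3.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_design(N, n):
    # Hypothetical stand-in for an arbitrary standard design: it must return
    # the ordered sequence of selected unit labels.  Here, simple random
    # sampling without replacement with the draw order recorded.
    return tuple(int(u) for u in rng.choice(N, size=n, replace=False))

def murthy_estimate(y, s, n, K):
    """Resampling estimate of the Murthy estimator of the population mean.

    s is the observed sample (ordered tuple of unit labels); the design is
    rerun K times and the selection probabilities are estimated by counting,
    using the smoothed form p_hat = (count + 1) / (K + 1) from the Appendix.
    """
    N = len(y)
    unordered = frozenset(s)
    n_s = 0                      # runs reproducing the unordered sample s
    n_si = {i: 0 for i in s}     # runs reproducing s with unit i drawn first
    n_i = {i: 0 for i in s}      # runs with unit i drawn first
    for _ in range(K):
        run = run_design(N, n)
        first = run[0]
        if first in n_i:
            n_i[first] += 1
        if frozenset(run) == unordered:
            n_s += 1
            n_si[first] += 1
    p_s = (n_s + 1) / (K + 1)
    total = 0.0
    for i in s:
        p_si = (n_si[i] + 1) / (K + 1)   # estimate of p_{s,i}
        p_i = (n_i[i] + 1) / (K + 1)     # estimate of p_i (i drawn first)
        total += p_si / (p_s * p_i) * y[i]
    return total / N

y = rng.gamma(shape=2.0, scale=5.0, size=12)  # toy population of N = 12 units
s = run_design(12, 4)                         # one observed sample, n = 4
print(murthy_estimate(y, s, n=4, K=100_000))  # under SRSWOR this approaches
                                              # the sample mean of s
```

For SRSWOR the ratio \(p_{s,i}/(p_sp_i)\) equals \(N/n\), so the estimate recovers the sample mean; the point of the method is that the same counting scheme works for any standard design, however complex.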


References

  • Brown, J.A. and Manly, B.J.F. (1998), Restricted adaptive cluster sampling. Environmental and Ecological Statistics, 5, 49–63.

  • Chao, C.T. and Thompson, S.K. (1999), Incomplete adaptive cluster sampling designs. In: Proceedings of the Section on Survey Research Methods of the American Statistical Association, 345–350.

  • Fattorini, L. (2006), Applying the Horvitz–Thompson criterion in complex designs: a computer-intensive perspective for estimating inclusion probabilities. Biometrika, 93(2), 269–278.

  • Horvitz, D.G. and Thompson, D.J. (1952), A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47, 663–685.

  • Karr, A.F. (1993), Probability. Springer-Verlag, New York.

  • Kruskal, W. and Mosteller, F. (1979a), Representative sampling, I: non-scientific literature. International Statistical Review, 47, 13–24.

  • Kruskal, W. and Mosteller, F. (1979b), Representative sampling, II: scientific literature, excluding statistics. International Statistical Review, 47, 111–127.

  • Kruskal, W. and Mosteller, F. (1979c), Representative sampling, III: the current statistical literature. International Statistical Review, 47, 245–265.

  • Murthy, M.N. (1957), Ordered and unordered estimators in sampling without replacement. Sankhya: The Indian Journal of Statistics, 18, 379–390.

  • Narain, R. (1951), On sampling without replacement with varying probabilities. Journal of the Indian Society of Agricultural Statistics, 3, 169–175.

  • Panahbehagh, B. (2016), Adaptive rectangular sampling: an easy, incomplete, neighborhood-free adaptive cluster sampling design. Survey Methodology, 42(2), 263–281.

  • Panahbehagh, B. and Brown, J. (2016), Gap-based inverse sampling. Communications in Statistics: Theory and Methods, https://doi.org/10.1080/03610926.2016.1217022.

  • Ross, S.M. (2006), A First Course in Probability. Pearson Prentice Hall, Upper Saddle River, NJ.

  • Salehi, M.M. and Seber, G.A.F. (1997), Two-stage adaptive cluster sampling. Biometrics, 53, 959–970.

  • Salehi, M.M. and Seber, G.A.F. (2001), A new proof of Murthy's estimator which applies to sequential sampling. Australian & New Zealand Journal of Statistics, 43(3), 281–286.

  • Salehi, M.M. and Smith, D.R. (2005), Two-stage sequential sampling: a neighborhood-free adaptive sampling procedure. Journal of Agricultural, Biological, and Environmental Statistics, 10, 84–103.

  • Särndal, C.E., Swensson, B. and Wretman, J. (1992), Model Assisted Survey Sampling. Springer Series in Statistics, Springer-Verlag, New York.

  • Smith, D.R., Conroy, M.J. and Brakhage, H. (1995), Efficiency of adaptive cluster sampling for estimating density of wintering waterfowl. Biometrics, 51, 777–788.

  • Su, Z. and Quinn II, T.J. (2003), Estimator bias and efficiency for adaptive cluster sampling with order statistics and a stopping rule. Environmental and Ecological Statistics, 10, 17–41.

  • Szwarcwald, C.L. and Damacena, G.N. (2008), Complex sampling design in population surveys: planning and effects on statistical data analysis. Revista Brasileira de Epidemiologia, 11, 38–45.

  • Thompson, S.K. (1990), Adaptive cluster sampling. Journal of the American Statistical Association, 85, 1050–1059.

  • Thompson, S.K. and Seber, G.A.F. (1996), Adaptive Sampling. Wiley, New York.

  • Yang, H., Kleinn, C., Fehrmann, L., Tang, S. and Magnussen, S. (2011), A new design for sampling with adaptive sample plots. Environmental and Ecological Statistics, 18, 223–237.

Author information

Corresponding author

Correspondence to Bardia Panahbehagh.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Proof 1

For \({\hat{p}}_s\) we have

$$\begin{aligned} {\hat{p}}_s=\frac{n(s)+1}{K+1};\;\;\; n(s)=\sum \limits _{k= 1}^KI_s(k);\;\;\; I_s(1),I_s(2),\ldots ,I_s(K)\sim ^{iid}\hbox {Bernoulli}(p_s) \end{aligned}$$

where \(I_s(k)=1\) if \(s_k=s\), and iid indicates that the indicators are independent and identically distributed.

Then, based on the Strong Law of Large Numbers and \(E(|I_s(k)|)<\infty \) (Karr 1993, p. 188), as \(K\rightarrow \infty \) we have

$$\begin{aligned} {\bar{I}}_s=\frac{n(s)}{K}\xrightarrow {a.s.}p_s \end{aligned}$$

and then

$$\begin{aligned} {\hat{p}}_s=\frac{n(s)+1}{K+1}=\frac{{\bar{I}}_s+1/K}{1+1/K}\xrightarrow {a.s.}p_s. \end{aligned}$$

The proofs for \({\hat{p}}_{s,i},\; {\hat{p}}_i,\; {\hat{p}}_{s,i,j}\) and \({\hat{p}}_{i,j}\) are the same as the proof for \({\hat{p}}_s\).

Also

$$\begin{aligned} {\hat{\mu }}_{\text{ K }}\xrightarrow {a.s.}{\hat{\mu }}; \;\;\; {\hat{V}}_{1\text{ K }}\xrightarrow {a.s.}{\hat{V}}_1;\;\;\; {\hat{V}}_{2\text{ K }}\xrightarrow {a.s.}{\hat{V}}_2 \end{aligned}$$

hold because of the continuity of the corresponding functions and the a.s. convergence of their arguments (for more details see Karr 1993, p. 150). \(\square \)
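As a numerical illustration (not part of the original paper), the following short Python sketch simulates the iid Bernoulli indicators above and shows \({\hat{p}}_s=(n(s)+1)/(K+1)\) approaching \(p_s\) as \(K\) grows; the value \(p_s=0.03\) is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(1)
p_s = 0.03                           # arbitrary true probability of the sample s
for K in (10**2, 10**4, 10**6):
    I = rng.random(K) < p_s          # iid Bernoulli(p_s) indicators I_s(1..K)
    p_hat = (I.sum() + 1) / (K + 1)  # the smoothed estimator of Proof 1
    print(K, p_hat)                  # approaches p_s as K grows
```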

Proof 2

First, let \(E_d\) and \(E_r\) denote expectations with respect to the design and the resampling, respectively. In MBR it is easy to show that

$$\begin{aligned} n(s,i)|n(s)\sim B(n(s),\frac{p_{s,i}}{p_s}). \end{aligned}$$

Also, if \(X\sim B(K,p)\),

$$\begin{aligned} E_r\left( \frac{X}{X+1}\right) =1-\frac{1-(1-p)^{(K+1)}}{(K+1)p}, \end{aligned}$$

then

$$\begin{aligned} E_r\left( \frac{n(s,i)}{n(s)+1}\right) = \frac{p_{s,i}}{p_s}-\left[ \frac{1-(1-p_s)^{K+1}}{(K+1)p_s}\frac{p_{s,i}}{p_s}\right] , \end{aligned}$$

and then

$$\begin{aligned} \frac{|E({\hat{\mu }}_{\text{ K }})-\mu |}{\mu }= & {} \left| \frac{1}{\mu }E_d\left( \frac{1-(1-p_s)^{K+1}}{(K+1)p_s}\frac{1}{N}\sum \limits _{i\in s}\frac{p_{s,i}}{p_sp_i}y_i\right) \right| \\\le & {} \max \limits _{s}{\frac{1-(1-p_s)^{K+1}}{(K+1)p_s}} \le {\frac{1}{(K+1)p_{s^*}}}. \end{aligned}$$

For variance, as

$$\begin{aligned} n(s,i,j)|n(s)\sim B(n(s),\frac{p_{s,i,j}}{p_s}), \end{aligned}$$

and since \((h+1)x(1-x)^{h}\le 1\) for any positive integer h and \(x\in [0,1]\), we have

$$\begin{aligned} E_r\left( \frac{n(s,i,j)}{n(s)+1}\right) -\frac{p_{s,i,j}}{p_s}= \left[ \frac{(1-p_s)^{K+1}-1}{(K+1)p_s}\frac{p_{s,i,j}}{p_s}\right] \le \frac{1}{(K+1)(K+2)p^2_s}\frac{p_{s,i,j}}{p_s}, \end{aligned}$$

and

$$\begin{aligned} \frac{p_{s,i,j}}{p_s}-E_r\left( \frac{n(s,i,j)}{n^*(s)}\right) = \left[ \frac{1-(1-p_s)^{K+1}}{(K+1)p_s}\frac{p_{s,i,j}}{p_s}\right] \le \frac{1}{(K+1)p_s}\frac{p_{s,i,j}}{p_s}, \end{aligned}$$

then

$$\begin{aligned} |E_r\left( \frac{n(s,i,j)}{n^*(s)}\right) -\frac{p_{s,i,j}}{p_s}|\le \frac{1}{(K+1)p^2_s}\frac{p_{s,i,j}}{p_s}. \end{aligned}$$

Now we have

$$\begin{aligned} \frac{|E({\hat{V}}_{1\text{ K }})-V_1|}{V_1}= & {} \frac{|E_d(E_r({\hat{V}}_{1\text{ K }})-{\hat{V}}_1)|}{V_1}\\\le & {} \frac{E_d\left( \frac{1}{N^2}\sum \limits _{i\in s}\sum \limits _{j<i\in s}\frac{1}{{p}_{i,j}}\left| E_r\left( \frac{{\hat{p}}_{s,i,j}}{{\hat{p}}_s}\right) -\frac{p_{s,i,j}}{p_s}\right| \left( \frac{y_i}{{p}_i}-\frac{y_j}{{p}_{j}}\right) ^2{p}_i{p}_{j}\right) }{V_1}\\\le & {} \frac{\frac{1}{(K+1)p^2_{s^*}}E_d\left( \frac{1}{N^2}\sum \limits _{i\in s}\sum \limits _{j<i\in s}\frac{p_{s,i,j}}{p_{i,j}p_s}\left( \frac{y_i}{{p}_i} -\frac{y_j}{{p}_{j}}\right) ^2{p}_i{p}_{j}\right) }{V_1}\\= & {} \frac{1}{(K+1)p^2_{s^*}}\frac{E_d({\hat{V}}_1)}{V_1}=\frac{1}{(K+1)p^2_{s^*}}, \end{aligned}$$

and for \(V_2\) we have

$$\begin{aligned} n(s,i)n(s,j)|n(s)\sim MB\left( n(s),\frac{p_{s,i}}{p_s},\frac{p_{s,j}}{p_s},1 -\frac{p_{s,i}}{p_s}-\frac{p_{s,j}}{p_s}\right) , \end{aligned}$$

where MB denotes the Multinomial distribution. Then

$$\begin{aligned} E_r\left( \frac{n(s,i)n(s,j)}{(n(s)+1)^2}\right)= & {} E_r\left[ E_r\left( \frac{n(s,i)}{n(s)+1}\frac{n(s,j)}{n(s)+1}\Big |n(s)\right) \right] \\= & {} E_r\left[ E_r\left( \frac{n(s,i)}{n(s)+1}\Big |n(s)\right) E_r\left( \frac{n(s,j)}{n(s)+1}\Big |n(s)\right) \right. \\&\left. +\,Cov_r\left( \frac{n(s,i)}{n(s)+1},\frac{n(s,j)}{n(s)+1}\Big |n(s)\right) \right] \\= & {} \frac{p_{s,i}p_{s,j}}{p^2_s}E_r\left[ \frac{n(s)^2}{(n(s)+1)^2} -\frac{n(s)}{(n(s)+1)^2}\right] \\\le & {} \frac{p_{s,i}p_{s,j}}{p_s^2} E_r\left[ \frac{n(s)^2}{(n(s)+1)^2}\right] \le \frac{p_{s,i}p_{s,j}}{p_s^2}E_r\left[ \frac{n(s)}{n(s)+1}\right] \\= & {} \frac{p_{s,i}p_{s,j}}{p_s^2}\left( 1+\frac{(1-p_s)^{K+1}-1}{(K+1)p_s}\right) , \end{aligned}$$

and therefore

$$\begin{aligned} E_r\left( \frac{n(s,i)n(s,j)}{(n(s)+1)^2}\right) -\frac{p_{s,i}p_{s,j}}{p_s^2}\le & {} \frac{p_{s,i}p_{s,j}}{p_s^2}\left( \frac{(1-p_s)^{K+1}-1}{(K+1)p_s}\right) \\\le & {} \frac{p_{s,i}p_{s,j}}{p_s^2}\left( \frac{1}{(K+1)(K+2)p_s^2}-\frac{1}{(K+1)p_s}\right) \\\le & {} \frac{p_{s,i}p_{s,j}}{p_s^2}\left( \frac{1}{(K+1)(K+2)p_s^2}\right) \\\le & {} \frac{p_{s,i}p_{s,j}}{p_s^2}\left( \frac{1}{(K+1)p_s^2}\right) . \end{aligned}$$

Now as

$$\begin{aligned}&E_r\left( \frac{n(s,i)n(s,j)}{(n(s)+1)^2}\right) \\&\quad = \frac{p_{s,i}p_{s,j}}{p_s^2} E_r\left[ \frac{n(s)(n(s)-1)}{(n(s)+1)^2}\right] \ge \frac{p_{s,i}p_{s,j}}{p_s^2} E_r\left[ \frac{(n(s)-1)^2}{(n(s)+1)^2}\right] \\&\quad \ge \frac{p_{s,i}p_{s,j}}{p_s^2} E_r\left[ \frac{(n(s)-1)^2}{(n(s)+1)(n(s)+2)}\right] \\&\quad =\frac{p_{s,i}p_{s,j}}{p_s^2}\\&\qquad \times \frac{-9(1-p_s)^{K+2}-4(K+2)p_s(1-p_s)^{K+1} +(K+2)p_s(1-p_s)+((K+2)p_s-3)^2}{(K+1)(K+2)p_s^2}, \end{aligned}$$

therefore

$$\begin{aligned}&\frac{p_{s,i}p_{s,j}}{p_s^2}-E_r\left( \frac{n(s,i)n(s,j)}{(n(s)+1)^2}\right) \le \frac{p_{s,i}p_{s,j}}{p_s^2}\\&\qquad \times \left( 1-\frac{-9(1-p_s)^{K+2}-4(K+2)p_s(1-p_s)^{K+1}+(K+2)p_s(1-p_s) +((K+2)p_s-3)^2}{(K+1)(K+2)p_s^2}\right) \\&\quad \le \frac{p_{s,i}p_{s,j}}{p_s^2}\left( 1+\frac{9}{(K+1)(K+2)p_s^2} +\frac{4}{(K+2)(K+1)p^2_s}-\frac{K+2}{(K+1)}\right. \\&\qquad \left. -\frac{9}{(K+1)(K+2)p_s^2}+\frac{6}{(K+1)p_s}-\frac{1-p_s}{(K+1)p_s}\right) \\&\quad =\frac{p_{s,i}p_{s,j}}{p_s^2}\left( \frac{4}{(K+2)(K+1)p^2_s} -\frac{1}{(K+1)}+\frac{6}{(K+1)p_s}\right) \le \frac{p_{s,i}p_{s,j}}{p_s^2}\left( \frac{10}{(K+1)p^2_s}\right) , \end{aligned}$$

and then

$$\begin{aligned} \left| E_r\left( \frac{n(s,i)n(s,j)}{(n(s)+1)^2}\right) -\frac{p_{s,i}p_{s,j}}{p_s^2}\right| \le \frac{10}{(K+1)p_s^2}, \end{aligned}$$

and similar to \(V_1\) we have

$$\begin{aligned} \frac{|E({\hat{V}}_{2\text{ K }})-V_2|}{V_2}= & {} \frac{|E_d(E_r({\hat{V}}_{2\text{ K }})-{\hat{V}}_2)|}{V_2}\\\le & {} \frac{E_d\left( \frac{1}{N^2}\sum \limits _{i\in s}\sum \limits _{j<i\in s}\left| E_r\left( \frac{{\hat{p}}_{s,i}{\hat{p}}_{s,j}}{{\hat{p}}^2_s}\right) -\frac{p_{s,i}p_{s,j}}{p^2_s}\right| \left( \frac{y_i}{{p}_i}-\frac{y_j}{{p}_{j}}\right) ^2\right) }{V_2}\\\le & {} \frac{\frac{10}{(K+1)p^2_{s^*}}E_d\left( \frac{1}{N^2}\sum \limits _{i\in s}\sum \limits _{j<i\in s}\frac{p_{s,i}p_{s,j}}{p^2_s}\left( \frac{y_i}{{p}_i}-\frac{y_j}{{p}_{j}}\right) ^2\right) }{V_2}\\= & {} \frac{10}{(K+1)p^2_{s^*}}\frac{E_d({\hat{V}}_2)}{V_2}=\frac{10}{(K+1)p^2_{s^*}}. \end{aligned}$$

\(\square \)
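The binomial identity used at the start of this proof can be checked numerically. The following sketch (an illustration, not part of the paper) evaluates \(E_r\{X/(X+1)\}\) exactly from the Binomial\((K,p)\) pmf and compares it with the closed form; the values \(K=50\), \(p=0.1\) are arbitrary.

```python
from math import comb

K, p = 50, 0.1                      # arbitrary illustrative values
lhs = sum(x / (x + 1) * comb(K, x) * p**x * (1 - p)**(K - x)
          for x in range(K + 1))    # exact E[X/(X+1)], X ~ Binomial(K, p)
rhs = 1 - (1 - (1 - p)**(K + 1)) / ((K + 1) * p)
print(lhs, rhs)                     # the two values agree to machine precision
```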

Proof 3

According to condition (3) of Theorem 3,

$$\begin{aligned} E_r\left( \frac{n(s,i)}{n(s)}\right) =\frac{p_{s,i}}{p_s} \end{aligned}$$

and then

$$\begin{aligned} E({\hat{\mu }}_{\text{ K }})=E_d\left( \frac{1}{N}\sum \limits _{i\in s}\frac{p_{s,i}}{p_sp_i}y_i\right) =\mu . \end{aligned}$$

For variance, as

$$\begin{aligned} E_r\left( \frac{n(s,i,j)}{n(s)}\right) =\frac{p_{s,i,j}}{p_s} \end{aligned}$$

then

$$\begin{aligned} E({\hat{V}}_{1\text{ K }})=V_1 \end{aligned}$$

and for \(V_2\) we have

$$\begin{aligned} E_r\left( \frac{n(s,i)n(s,j)}{n(s)^2}\right) =\frac{p_{s,i}p_{s,j}}{p_s^2} \left( 1-\frac{1}{K}\right) \end{aligned}$$

then we have

$$\begin{aligned} \frac{|E({\hat{V}}_{2\text{ K }})-V_2|}{V_2}= & {} \frac{|E_d\left( E_r({\hat{V}}_{2\text{ K }})-{\hat{V}}_2\right) |}{V_2}\\= & {} \frac{E_d\left( \frac{1}{N^2}\sum \limits _{i\in s} \sum \limits _{j<i\in s}\left| E_r\left( \frac{{\hat{p}}_{s,i}{\hat{p}}_{s,j}}{{\hat{p}}^2_s}\right) -\frac{p_{s,i}p_{s,j}}{p^2_s}\right| \left( \frac{y_i}{{p}_i}-\frac{y_j}{{p}_{j}}\right) ^2\right) }{V_2} \\= & {} \frac{\frac{1}{K}E_d\left( \frac{1}{N^2}\sum \limits _{i\in s} \sum \limits _{j<i\in s}\frac{p_{s,i}p_{s,j}}{p^2_s}\left( \frac{y_i}{{p}_i} -\frac{y_j}{{p}_{j}}\right) ^2\right) }{V_2}=\frac{1}{K}. \end{aligned}$$

\(\square \)
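The second-moment identity above is the multinomial moment \(E(n_in_j)=K(K-1)q_iq_j\) divided by \(K^2\), assuming the resampling fixes \(n(s)=K\) (an assumption consistent with the exact ratios used in this proof). The sketch below, an illustration with arbitrary \(q_i,q_j\), verifies it by Monte Carlo.

```python
import numpy as np

# Assumption: n(s) is fixed at K, so (n(s,i), n(s,j), rest) is Multinomial(K, q).
rng = np.random.default_rng(2)
K, qi, qj = 200, 0.2, 0.3                       # arbitrary illustrative values
draws = rng.multinomial(K, [qi, qj, 1 - qi - qj], size=200_000)
mc = np.mean(draws[:, 0] * draws[:, 1]) / K**2  # Monte Carlo E[n_i n_j / K^2]
print(mc, qi * qj * (1 - 1 / K))                # both close to q_i q_j (1 - 1/K)
```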

Proof 4

Consider \(s\) as the result of a standard sampling design with equally likely sample space \(L=\{s(1),s(2),\ldots ,s(M)\}\). Then

$$\begin{aligned} p_s=\frac{N_L(s)}{M};\;\;\;p_{s,i}=\frac{N_L(s,i)}{M}, \end{aligned}$$

where \(N_L(s)\) and \(N_L(s,i)\) are the numbers of outcomes in \(L\) that lead to \(s\), and to \(s\) with \(i\) as the first unit, respectively. Therefore for \(p_{i|s}\) we have

$$\begin{aligned} p_{i|s}=\frac{p_{s,i}}{p_s}=\frac{N_L(s,i)}{N_L(s)}. \end{aligned}$$

Now executing the design on s will lead to

$$\begin{aligned} L^*=\{s^*(1),s^*(2),\ldots ,s^*(M^*)\}, \end{aligned}$$

and as the design is standard (\(P(s|{\mathbf {y}})\) does not depend on the \(y\) values of \(U-s\)) with an equally likely sample space, we have \(N_L(s)=N_{L^*}(s)\) and \(N_L(s,i)=N_{L^*}(s,i)\). Therefore

$$\begin{aligned} p^*_s=\frac{N_{L^*}(s)}{M^*}=\frac{N_L(s)}{M^*};\;\;\;p^*_{s,i}=\frac{N_{L^*}(s,i)}{M^*}=\frac{N_L(s,i)}{M^*}, \end{aligned}$$

and then

$$\begin{aligned} p^*_{i|s}=\frac{p^*_{s,i}}{p^*_s}=\frac{N_L(s,i)}{N_L(s)}=p_{i|s}. \end{aligned}$$

The same holds for \(p_{i,j|s}\). \(\square \)
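The counting argument of this proof can be made concrete on a toy design. The sketch below (an illustration, not from the paper) enumerates the equally likely ordered outcomes of simple random sampling without replacement with \(N=4\), \(n=2\) and recovers \(p_{i|s}=N_L(s,i)/N_L(s)\); the observed sample \(s=\{0,1\}\) is arbitrary.

```python
from itertools import permutations

N, n = 4, 2
L = list(permutations(range(N), n))       # equally likely ordered outcomes
s = frozenset({0, 1})                     # an arbitrary observed sample

NL_s = sum(1 for out in L if frozenset(out) == s)        # N_L(s)
print("p_s =", NL_s / len(L))                            # p_s = N_L(s) / M
for i in sorted(s):
    NL_si = sum(1 for out in L
                if frozenset(out) == s and out[0] == i)  # N_L(s, i)
    print(f"p_{{{i}|s}} =", NL_si / NL_s)                # each equals 1/2
```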


Cite this article

Panahbehagh, B. Estimation in Complex Sampling Designs Based on Resampling Methods. JABES 25, 206–228 (2020). https://doi.org/10.1007/s13253-020-00390-7
