Quasi-random ranked set sampling

https://doi.org/10.1016/j.spl.2020.109029

Abstract

This article presents a quasi-random ranked set sampling approach based on the Halton sequence for natural resource surveys. A design-based variance estimator for the sample mean is also presented.

Introduction

To estimate a population characteristic of a natural resource, a sample consisting of some part of the population is used. The main goal of a sampling design is to select a representative sample from which the population characteristic can be estimated with low bias and high precision. This article considers drawing sample locations from two-dimensional continuous study regions to estimate a population mean μ. One of the most commonly used designs is simple random sampling (SRS), where the sample locations are independently drawn from a uniform distribution over the resource. Although SRS yields unbiased estimates of μ, there is no guarantee that a specific sample is representative. Many sampling designs have been proposed in the statistical literature that improve on SRS in a variety of ways (Hankin et al., 2019). One effective strategy is to spread the sample locations evenly over the resource, called spatially balanced sampling (Stevens and Olsen, 2004). These designs are known to be efficient when sampling natural resources because nearby locations tend to have more similar response values than distant locations (Stevens and Olsen, 2004, Robertson et al., 2013, Grafström and Schelin, 2014). In this article, we consider incorporating spatial spread into a two-phase sampling design, ranked set sampling.

Ranked set sampling (RSS) was first proposed by McIntyre (1952) for estimating the average yield of an arable crop in an agricultural field trial. Measuring yield for each field plot was time consuming because it involved removing, and then weighing, the crop. With some experience, it was relatively quick to estimate, by eye, the yield to the extent that a group of plots could be ranked in estimated yield order. McIntyre proposed the design, later named RSS (Halls and Dell, 1966), where a subset of plots is first ranked. The ranking variable can be a quick estimate of the response variable, a visual comparison, an expert opinion, or some other variable known to be correlated with the response variable, but it need not involve actual measurements of the response variable. Then, based on these rankings, a sample is drawn from the ranked plots. The goal of RSS is to collect observations from the resource that are likely to span the full range of response values in the population. This approach to data collection has spawned an active field of research and many RSS approaches have been proposed (Wolfe, 2010, Wolfe, 2012).

In practice, randomness is introduced into sampling designs using pseudo-random sequences which are irregular, non-repetitive and designed to mimic true random sequences. This article considers replacing the pseudo-random sequence in RSS with a quasi-random sequence to increase the spatial spread of the ranking variable. These sequences have been used as a substitute for random numbers in many fields, including numerical integration (Niederreiter, 1978, Niederreiter, 2003), optimization (Sobol, 1979, Robertson et al., 2014) and sampling (Robertson et al., 2013, Robertson et al., 2017).

A $d$-dimensional quasi-random sequence $H=\{x_j\}_{j=1}^{n}\subset[0,1)^d$ is a low-discrepancy sequence with the property that, for all values of $n$, the sequence has low discrepancy. The discrepancy of $H$ is (Niederreiter, 1978) $$D_n(H)=\sup_{B\in J}\left|\frac{A_H(B)}{n}-\lambda(B)\right|,$$ where $\lambda$ is the Lebesgue measure, $A_H(B)$ is the number of points from $H$ in $B$ and $J$ is the set of boxes of the form $\{x\in[0,1)^d: a_i\le x_i<b_i\}$ with $0\le a_i<b_i<1$. Loosely speaking, a sequence is considered low-discrepancy if the fraction of points in $B\in J$ is proportional to $\lambda(B)$.
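The discrepancy definition can be explored numerically. The sketch below does not compute the supremum exactly. It only evaluates $|A_H(B)/n-\lambda(B)|$ over randomly drawn boxes and keeps the largest value, which gives a lower bound on $D_n(H)$. The function name `discrepancy_lower_bound`, the number of trial boxes and the two toy point sets are assumptions made for illustration and do not come from the article.

```python
import random

def discrepancy_lower_bound(points, trials=2000, seed=0):
    """Largest |A_H(B)/n - lambda(B)| over randomly drawn boxes B.

    This is only a lower bound on the discrepancy, because the supremum
    over all boxes is replaced by a maximum over `trials` random boxes.
    """
    rng = random.Random(seed)
    n = len(points)
    d = len(points[0])
    worst = 0.0
    for _ in range(trials):
        # One interval [a_i, b_i) per dimension, with a_i <= b_i.
        box = [sorted((rng.random(), rng.random())) for _ in range(d)]
        volume = 1.0
        for a, b in box:
            volume *= b - a
        inside = sum(
            all(a <= x < b for x, (a, b) in zip(p, box)) for p in points
        )
        worst = max(worst, abs(inside / n - volume))
    return worst

# Toy point sets (hypothetical): a centred 4 x 4 grid and 16 coincident points.
grid = [(i / 4 + 0.125, j / 4 + 0.125) for i in range(4) for j in range(4)]
clustered = [(0.5, 0.5)] * 16

grid_estimate = discrepancy_lower_bound(grid)
clustered_estimate = discrepancy_lower_bound(clustered)
print(grid_estimate, clustered_estimate)
```

The evenly spread grid should give a noticeably smaller value than the clustered set, which is the behaviour the definition is meant to capture.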

A number of quasi-random sequences have been proposed (Halton, 1960, Sobol, 1976, Faure, 1982), but this article focuses on the random-start Halton sequence (Wang and Hickernell, 2000) because of its simplicity. The $i$th coordinate of the $j$th point in a random-start Halton sequence $H=\{x_j\}_{j=1}^{\infty}\subset[0,1)^d$ is (Price and Price, 2012, Robertson et al., 2017) $$x_j^{(i)}=\sum_{p=0}^{\infty}\left\{\left\lfloor\frac{u_i+j-1}{b_i^{p}}\right\rfloor \bmod b_i\right\}\frac{1}{b_i^{p+1}},\qquad(1)$$ where $b_1,\ldots,b_d$ are pair-wise co-prime bases, $\lfloor\cdot\rfloor$ is the floor function and $u_1,\ldots,u_d$ are independently generated integers. To find $u_i$, pick $v_i\in[0,1]$ randomly and round $v_i$ to $r$ digits in base $b_i$, giving $v_i=0.d_1\ldots d_r$ (base $b_i$), where $r=\max\{q:\sum_{t=1}^{q}d_t b_i^{t-1}\le 10^{15}\}$. Radix inversion gives $u_i=d_r\ldots d_1$ (base $b_i$). For example, the first two points in $H\subset[0,1)^2$ with $b_1=2$, $b_2=3$, $u_1=2$ and $u_2=5$ are $\{x_1,x_2\}=\{(\tfrac{1}{4},\tfrac{7}{9}),(\tfrac{3}{4},\tfrac{2}{9})\}$ (see supplementary material for calculations and an R function). This corresponds to the Halton sequence skipping two points in the first dimension and five points in the second dimension. Each $x_j\in H$ is a random point with uniform distribution on $[0,1)^d$ (Wang and Hickernell, 2000).
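The worked example above can be checked with a short script. The following is a minimal Python sketch of the coordinate formula, not the authors' R function from the supplementary material. The names `halton_coordinate` and `random_start_halton` are hypothetical, and the integers $u_i$ are passed in directly rather than generated by the rounding and radix-inversion step. Exact fractions are used so the output can be compared with the values in the text.

```python
from fractions import Fraction

def halton_coordinate(index, base):
    """Radical inverse of a non-negative integer `index` in `base`.

    The p-th base-`base` digit of `index` is weighted by 1 / base**(p + 1).
    """
    value = Fraction(0)
    denom = base
    while index > 0:
        index, digit = divmod(index, base)
        value += Fraction(digit, denom)
        denom *= base
    return value

def random_start_halton(n, bases, starts):
    """First n points; coordinate i of point j uses the integer starts[i] + j - 1."""
    return [
        tuple(halton_coordinate(u + j - 1, b) for b, u in zip(bases, starts))
        for j in range(1, n + 1)
    ]

# Example from the text: b1 = 2, b2 = 3, u1 = 2, u2 = 5.
points = random_start_halton(2, bases=(2, 3), starts=(2, 5))
print(points)  # the two points (1/4, 7/9) and (3/4, 2/9), as Fractions
```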

Taking the first $n$ points from $H$ that fall within a study region $\Omega\subseteq[0,1)^2$ is called balanced acceptance sampling (BAS) (Robertson et al., 2013), and its modification (Robertson et al., 2017) requires $x_1\in\Omega$. BAS is a spatially balanced sampling design that spreads sample locations evenly over $\Omega$. BAS is efficient when sampling natural resources because well-spread sample locations are likely to span the full range of response values due to the locally similar property of natural resources (Stevens and Olsen, 2004). Rather than having spatially balanced response values, this article considers a spatially balanced ranking variable in RSS, called quasi-random RSS.
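The acceptance step of BAS can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the published implementation. The function names, the disc-shaped study region and the `max_tries` safeguard are all hypothetical, floating-point arithmetic is used for brevity, and the modification that forces $x_1\in\Omega$ is not implemented (the start integers are simply supplied by the caller).

```python
def radical_inverse(index, base):
    """Radical inverse of a non-negative integer in the given base (float)."""
    value, scale = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        value += digit * scale
        scale /= base
    return value

def bas_sample(n, in_region, starts, bases=(2, 3), max_tries=100000):
    """Return the first n random-start Halton points that fall in the region."""
    sample = []
    for j in range(max_tries):
        point = tuple(radical_inverse(u + j, b) for u, b in zip(starts, bases))
        if in_region(point):          # acceptance step
            sample.append(point)
            if len(sample) == n:
                return sample
    raise RuntimeError("fewer than n points accepted within max_tries")

def in_disc(p):
    """Hypothetical study region: a disc of radius 0.5 centred in the unit square."""
    return (p[0] - 0.5) ** 2 + (p[1] - 0.5) ** 2 < 0.25

sample = bas_sample(10, in_disc, starts=(2, 5))
for point in sample:
    print(point)
```

Because points are accepted in sequence order, extending the sample later only requires continuing the same loop from where it stopped.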

The rest of this article is organized as follows. In Section 2, RSS is explained and a quasi-random approach is presented in Section 3. Both approaches are numerically tested in Section 4 and concluding remarks are given in Section 5.

Section snippets

Drawing a ranked set sample

Consider drawing a balanced RSS of $n=km$ points from $\Omega\subseteq[0,1)^2$ with $\lambda(\Omega)>0$. The method described below is called balanced RSS because one judgment order statistic is collected for each of the $k$ ranks.

  1. Draw an SRS of $k$ points from $\Omega$ and rank order the points using a measured ranking variable. Include $X_{[1]}$, the point with the lowest judgment ranking, in the sample.

  2. Repeat step (1), but now include the point with the second lowest judgment ranking, $X_{[2]}$, in the sample.

  3. Repeat step (1) using the
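The remaining steps are cut off in this excerpt. The sketch below therefore follows the standard balanced RSS scheme, which is assumed here: in each of $m$ cycles, $k$ independent sets of $k$ points are drawn, and the point with judgment rank $r$ is kept from the $r$th set. The study region is taken to be the whole unit square, and the response function `f`, the noise level and the function names are hypothetical choices for illustration only.

```python
import random

def balanced_rss(k, m, response, ranker, rng):
    """Balanced RSS of n = k * m points from the unit square.

    Each cycle draws k independent SRS sets of size k and keeps the point
    with judgment rank r from the r-th set, for r = 1, ..., k.
    """
    sample = []
    for _ in range(m):                       # m cycles
        for rank in range(k):                # one order statistic per rank
            candidates = [(rng.random(), rng.random()) for _ in range(k)]
            candidates.sort(key=ranker)      # rank by the ranking variable only
            chosen = candidates[rank]
            sample.append((chosen, response(chosen)))  # measure the response
    return sample

def f(p):
    """Hypothetical response surface on the unit square."""
    return p[0] + p[1]

rng = random.Random(1)

def noisy_rank(p):
    """Hypothetical ranking variable: the response plus normal error."""
    return f(p) + rng.gauss(0.0, 0.1)

sample = balanced_rss(k=3, m=2, response=f, ranker=noisy_rank, rng=rng)
estimate = sum(y for _, y in sample) / len(sample)   # sample mean of responses
print(len(sample), estimate)
```

Only the $n=km$ selected points have their response measured, while $k^2m$ points are ranked, which is where the cost saving of RSS comes from.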

Quasi-random ranked set sampling

To draw a quasi-random RSS (QRSS) of $n=km$ points from $\Omega\subseteq[0,1)^2$ with $\lambda(\Omega)>0$, a random-start Halton sequence $H$ with $x_1\in\Omega$ is generated. We choose $b_1$ and $b_2$ in (1) as the two smallest prime numbers that are co-prime with $k$ (their greatest common divisor is one) to remove undesirable relationships between $k$ and cyclical properties of the Halton sequence (Robertson et al., 2017, Robertson et al., 2018). Let $H_\Omega=\{x_j\}_{j=1}^{k^2m}$ denote the first $k^2m$ points from $H$ in $\Omega$, a BAS sample of size $k^2m$. Define the
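The base selection rule can be illustrated with a small helper. This is a sketch of the rule as stated above (the two smallest primes that are co-prime with $k$). The function name `halton_bases_for` is hypothetical, and the rest of the QRSS construction is not reproduced because the excerpt is cut off.

```python
from math import gcd

def halton_bases_for(k, count=2):
    """Smallest `count` primes that are co-prime with the set size k."""
    bases = []
    candidate = 2
    while len(bases) < count:
        # Trial division is enough for the small primes needed here.
        is_prime = all(candidate % q for q in range(2, int(candidate ** 0.5) + 1))
        if is_prime and gcd(candidate, k) == 1:
            bases.append(candidate)
        candidate += 1
    return tuple(bases)

k, m = 3, 4
print(halton_bases_for(k))   # (2, 5): the prime 3 is skipped because it divides k
print(k * k * m)             # size of the BAS sample that is ranked, k^2 * m
```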

Numerical results and discussion

In this section, we investigate the precision of QRSS and the effectiveness of $\widehat{\mathrm{var}}(\hat{\mu}_{\mathrm{QRSS}})$ using several functions with different spatial structure from Robertson et al. (2018), where $f:[0,1)^2\to\mathbb{R}$ and $\mu=\int_{[0,1)^2}f(x)\,dx$. For each $x_i\in[0,1)^2$, we defined the measured response as $y_i=f(x_i)$ and the measured ranking variable as $z_i=f(x_i)+\epsilon$, where $\epsilon$ is a normally distributed error term. Other than for set ranking purposes, the magnitude of $z_i$ was not utilized. These functions are illustrated in Fig. 2 and

Conclusion

In this article we introduced quasi-random ranked set sampling (QRSS) for natural resources, where the random-start Halton sequence was used to draw a ranked set sample. The Halton sequence ensured the sample locations of a measured ranking variable were evenly spread over the resource. Numerical results showed that QRSS with k=3 was more precise than ranked set sampling with k=20. This makes QRSS particularly useful in practice because a k=3 design requires far fewer ranking measurements and

CRediT authorship contribution statement

B.L. Robertson: Conceptualization, Methodology, Software, Writing - original draft. M. Reale: Conceptualization, Writing - review & editing. C.J. Price: Methodology, Writing - review & editing. J.A. Brown: Conceptualization, Writing - review & editing.

Acknowledgments

We thank two anonymous referees and the editor for valuable comments that led to an improved article.

References (21)

  • Modarres, R. et al. Resampling methods for ranked set samples. Comput. Statist. Data Anal. (2006)
  • Niederreiter, H. Error bounds for quasi-Monte Carlo integration with uniform point sets. J. Comput. Appl. Math. (2003)
  • Robertson, B.L. et al. A modification of balanced acceptance sampling. Statist. Probab. Lett. (2017)
  • Wang, X. et al. Randomized Halton sequences. Math. Comput. Modelling (2000)
  • Dell, T.R. et al. Ranked set sampling theory with order statistics background. Biometrics (1972)
  • Faure, H. Discrépance de suites associées à un système de numération (en dimension s). Acta Arith. (1982)
  • Grafström, A. et al. How to select representative samples. Scand. J. Stat. (2014)
  • Halls, L.K. et al. Trial of ranked-set sampling for forage yields. Forest Sci. (1966)
  • Halton, J. On the efficiency of certain quasi-random sequences of points in evaluating multi-dimensional integrals. Numer. Math. (1960)
  • Hankin, D.G. et al. Sampling Theory for the Ecological and Natural Resource Sciences (2019)
There are more references available in the full text version of this article.
