Quasi-random ranked set sampling

doi:10.1016/j.spl.2020.109029

Statistics & Probability Letters

Volume 171, April 2021, 109029

https://doi.org/10.1016/j.spl.2020.109029

Abstract

This article presents a quasi-random ranked set sampling approach based on the Halton sequence for natural resource surveys. A design-based variance estimator for the sample mean is also presented.

Introduction

To estimate a population characteristic of a natural resource, a sample consisting of some part of the population is used. The main goal of a sampling design is to select a representative sample from which the population characteristic can be estimated with low bias and high precision. This article considers drawing sample locations from two-dimensional continuous study regions to estimate a population mean $μ$. One of the most commonly used designs is simple random sampling (SRS), where the sample locations are independently drawn from a uniform distribution over the resource. Although SRS yields unbiased estimates of $μ$, there is no guarantee that a specific sample is representative. Many sampling designs have been proposed in the statistical literature that improve on SRS in a variety of ways (Hankin et al., 2019). One effective strategy, called spatially balanced sampling, is to spread the sample locations evenly over the resource (Stevens and Olsen, 2004). These designs are known to be efficient when sampling natural resources because nearby locations tend to have more similar response values than distant locations (Stevens and Olsen, 2004; Robertson et al., 2013; Grafström and Schelin, 2014). In this article, we consider incorporating spatial spread into a two-phase sampling design, ranked set sampling.

Ranked set sampling (RSS) was first proposed by McIntyre (1952) for estimating the average yield of an arable crop in an agricultural field trial. Measuring yield for each field plot was time consuming because it involved removing, and then weighing, the crop. With some experience, it was relatively quick to estimate the yield by eye, well enough that a group of plots could be ranked in order of estimated yield. McIntyre proposed the design, later named RSS (Halls and Dell, 1966), where a subset of plots is first ranked. The ranking variable can be a quick estimate of the response variable, a visual comparison, an expert opinion, or some other variable known to be correlated with the response variable, but it need not involve actual measurements of the response variable. Then, based on these rankings, a sample is drawn from the ranked plots. The goal of RSS is to collect observations from the resource that are likely to span the full range of response values in the population. This approach to data collection has spawned an active field of research and many RSS approaches have been proposed (Wolfe, 2010, 2012).

In practice, randomness is introduced into sampling designs using pseudo-random sequences, which are irregular and non-repetitive and are designed to mimic true random sequences. This article considers replacing the pseudo-random sequence in RSS with a quasi-random sequence to increase the spatial spread of the ranking variable. Quasi-random sequences have been used as a substitute for random numbers in many fields, including numerical integration (Niederreiter, 1978, 2003), optimization (Sobol, 1979; Robertson et al., 2014) and sampling (Robertson et al., 2013, 2017).

A $d$-dimensional quasi-random sequence $H = \{x_{j}\}_{j = 1}^{n} \subset [0, 1)^{d}$ is a sequence of points whose discrepancy is low for all values of $n$. The discrepancy of $H$ is (Niederreiter, 1978) $D_{n}(H) = \sup_{B \in J} \left| \frac{A_{H}(B)}{n} - λ(B) \right|,$ where $λ$ is the Lebesgue measure, $A_{H}(B)$ is the number of points from $H$ in $B$ and $J$ is the set of boxes of the form $\{x \in [0, 1)^{d} : a_{i} \leq x_{i} < b_{i}\}$ with $0 \leq a_{i} < b_{i} < 1$. Loosely speaking, a sequence is considered low-discrepancy if the fraction of points in each $B \in J$ is proportional to $λ(B)$.

A number of quasi-random sequences have been proposed (Halton, 1960; Sobol, 1976; Faure, 1982), but this article focuses on the random-start Halton sequence (Wang and Hickernell, 2000) because of its simplicity. The $i$th coordinate of the $j$th point in a random-start Halton sequence $H = \{x_{j}\}_{j = 1}^{\infty} \subset [0, 1)^{d}$ is (Price and Price, 2012; Robertson et al., 2017) $x_{j}^{(i)} = \sum_{p = 0}^{\infty} \left\{ \left\lfloor \frac{u_{i} + j - 1}{b_{i}^{p}} \right\rfloor \bmod b_{i} \right\} \frac{1}{b_{i}^{p + 1}}, \qquad (1)$ where $b_{1}, \dots, b_{d}$ are pair-wise co-prime bases, $\lfloor \cdot \rfloor$ is the floor function and $u_{1}, \dots, u_{d}$ are independently generated integers. To find $u_{i}$, pick $v_{i} \in [0, 1]$ randomly and round $v_{i}$ to $r$ digits in base $b_{i}$, giving $v_{i} = 0.d_{1} \dots d_{r}$ (base $b_{i}$), where $r = \max \{q : \sum_{t = 1}^{q} d_{t} b_{i}^{t - 1} \leq 10^{15}\}$. Radix inversion gives $u_{i} = d_{r} \dots d_{1}$ (base $b_{i}$). For example, the first two points in $H \subset [0, 1)^{2}$ with $b_{1} = 2$, $b_{2} = 3$, $u_{1} = 2$ and $u_{2} = 5$ are $\{x_{1}, x_{2}\} = \{(1/4, 7/9), (3/4, 2/9)\}$ (see supplementary material for calculations and an R function). This corresponds to the Halton sequence skipping two points in the first dimension and five points in the second dimension. Each $x_{j} \in H$ is a random point with uniform distribution on $[0, 1)^{d}$ (Wang and Hickernell, 2000).
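The coordinate formula above can be checked numerically. The following is a minimal Python sketch, not the authors' R function from the supplementary material; the names `radical_inverse` and `random_start_halton` are illustrative, and the integers $u_{i}$ are assumed to be supplied by the caller rather than generated by the rounding procedure described above. Exact fractions are used so the worked example can be compared directly.

```python
from fractions import Fraction


def radical_inverse(index, base):
    """Reflect the base-`base` digits of a non-negative integer about the radix point."""
    value = Fraction(0)
    scale = Fraction(1, base)  # weight 1 / base^(p + 1) of the p-th digit
    while index > 0:
        index, digit = divmod(index, base)  # digit = floor(index / base^p) mod base
        value += digit * scale
        scale /= base
    return value


def random_start_halton(n, bases, starts):
    """First n points; coordinate i of point j uses the integer starts[i] + j - 1."""
    return [
        tuple(radical_inverse(u + j - 1, b) for b, u in zip(bases, starts))
        for j in range(1, n + 1)
    ]


# Worked example from the text: b = (2, 3), u = (2, 5).
points = random_start_halton(2, bases=(2, 3), starts=(2, 5))
print(points)
```

With these inputs the two points evaluate to $(1/4, 7/9)$ and $(3/4, 2/9)$, matching the example in the text.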

Taking the first $n$ points from $H$ that fall within a study region $Ω \subset {[0, 1)}^{2}$ is called balanced acceptance sampling (BAS) (Robertson et al., 2013), and its modification (Robertson et al., 2017) requires $x_{1} \in Ω$ . BAS is a spatially balanced sampling design that spreads sample locations evenly over $Ω$ . BAS is efficient when sampling natural resources because well-spread sample locations are likely to span the full range of response values due to the locally similar property of natural resources (Stevens and Olsen, 2004). Rather than having spatially balanced response values, this article considers a spatially balanced ranking variable in RSS, called quasi-random RSS.
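The acceptance step of BAS can be sketched as follows. This is an illustration under stated assumptions rather than the published implementation: the function names, the disc-shaped study region and the fixed starting integers are all hypothetical, and the requirement $x_{1} \in Ω$ of the modified design is not enforced here.

```python
from itertools import count


def radical_inverse(index, base):
    """Reflect the base-`base` digits of a non-negative integer about the radix point."""
    value, scale = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        value += digit * scale
        scale /= base
    return value


def bas_sample(n, in_region, bases=(2, 3), starts=(0, 0)):
    """Walk along a Halton sequence and keep the first n points inside the region."""
    sample = []
    if n <= 0:
        return sample
    for j in count(1):
        point = tuple(radical_inverse(u + j - 1, b) for b, u in zip(bases, starts))
        if in_region(point):  # acceptance step: discard points outside the region
            sample.append(point)
            if len(sample) == n:
                return sample


def in_disc(point):
    """Hypothetical study region: a disc of radius 0.5 centred in the unit square."""
    x, y = point
    return (x - 0.5) ** 2 + (y - 0.5) ** 2 <= 0.25


sample = bas_sample(20, in_disc, starts=(2, 5))
```

Because rejected points are simply skipped, the accepted points inherit the even spread of the underlying sequence over $Ω$.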

The rest of this article is organized as follows. In Section 2, RSS is explained and a quasi-random approach is presented in Section 3. Both approaches are numerically tested in Section 4 and concluding remarks are given in Section 5.


Drawing a ranked set sample

Consider drawing a balanced RSS of $n = k m$ points from $Ω \subset {[0, 1)}^{2}$ with $λ (Ω) > 0$ . The method described below is called balanced RSS because one judgment order statistic is collected for each of the $k$ ranks.

1. Draw an SRS of $k$ points from $Ω$ and rank the points using a measured ranking variable. Include the point with the lowest judgment ranking, $X_{[1]}$, in the sample.
2. Repeat step (1), but now include the point with the second lowest judgment ranking, $X_{[2]}$, in the sample.
3. Repeat step (1) using the
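The step list is cut off in this excerpt. The sketch below therefore assumes the standard balanced scheme implied by the text: in each of $m$ cycles, one point is kept for each of the $k$ ranks, each from a fresh SRS of $k$ points, giving $n = k m$ sampled points. The function `balanced_rss`, the unit-square region and the ranking variable used in the usage lines are hypothetical.

```python
import random


def balanced_rss(k, m, draw_point, rank_value, rng=None):
    """Balanced RSS sketch: m cycles, one kept point per rank 1..k in each cycle."""
    rng = rng or random.Random()
    sample = []
    for _ in range(m):
        for rank in range(k):
            # Fresh SRS of k points, ordered by the ranking variable.
            ranked = sorted((draw_point(rng) for _ in range(k)), key=rank_value)
            # Keep only the point with judgment rank `rank + 1`.
            sample.append(ranked[rank])
    return sample


# Usage with a hypothetical ranking variable on the unit square.
sample = balanced_rss(
    k=3,
    m=4,
    draw_point=lambda g: (g.random(), g.random()),
    rank_value=lambda p: p[0] + p[1],
    rng=random.Random(2021),
)
```

Under this scheme $k^{2} m$ points are ranked but only $k m$ of them are kept for measurement of the response.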

Quasi-random ranked set sampling

To draw a quasi-random RSS (QRSS) of $n = k m$ points from $Ω \subset [0, 1)^{2}$ with $λ(Ω) > 0$, a random-start Halton sequence $H$ with $x_{1} \in Ω$ is generated. We choose $b_{1}$ and $b_{2}$ in (1) as the two smallest prime numbers that are co-prime with $k$ (their greatest common divisor with $k$ is one) to remove undesirable relationships between $k$ and cyclical properties of the Halton sequence (Robertson et al., 2017, 2018). Let $H_{Ω} = \{x_{j}\}_{j = 1}^{k^{2} m}$ denote the first $k^{2} m$ points from $H$ in $Ω$, that is, a BAS sample of size $k^{2} m$. Define the
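The base selection rule stated above (the two smallest primes co-prime with $k$) can be written out as a short Python sketch. The helper names `is_prime` and `qrss_bases` are illustrative and do not come from the article; the remainder of the QRSS procedure is truncated in this excerpt and is not reproduced.

```python
from math import gcd


def is_prime(n):
    """Trial-division primality test, adequate for the small bases used here."""
    if n < 2:
        return False
    divisor = 2
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 1
    return True


def qrss_bases(k, how_many=2):
    """Smallest primes that share no common factor with the set size k."""
    bases = []
    candidate = 2
    while len(bases) < how_many:
        if is_prime(candidate) and gcd(candidate, k) == 1:
            bases.append(candidate)
        candidate += 1
    return tuple(bases)


print(qrss_bases(3))  # set size k = 3 excludes the base 3
```

For example, $k = 3$ gives bases 2 and 5, while $k = 2$ gives bases 3 and 5.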

Numerical results and discussion

In this section, we investigate the precision of QRSS and the effectiveness of $\widehat{\mathrm{var}}(\hat{μ}_{\mathrm{QRSS}})$ using several functions with different spatial structure from Robertson et al. (2018), where $f : [0, 1)^{2} \to \mathbb{R}$ and $μ = \int_{[0, 1)^{2}} f(x) \, dx$. For each $x_{i} \in [0, 1)^{2}$, we defined the measured response as $y_{i} = f(x_{i})$ and the measured ranking variable as $z_{i} = f(x_{i}) + ϵ$, where $ϵ$ is a normally distributed error term. Other than for set ranking purposes, the magnitude of $z_{i}$ was not utilized. These functions are illustrated in Fig. 2 and

Conclusion

In this article we introduced quasi-random ranked set sampling (QRSS) for natural resources, where the random-start Halton sequence was used to draw a ranked set sample. The Halton sequence ensured the sample locations of a measured ranking variable were evenly spread over the resource. Numerical results showed that QRSS with $k = 3$ was more precise than ranked set sampling with $k = 20$ . This makes QRSS particularly useful in practice because a $k = 3$ design requires far fewer ranking measurements and

CRediT authorship contribution statement

B.L. Robertson: Conceptualization, Methodology, Software, Writing - original draft. M. Reale: Conceptualization, Writing - review & editing. C.J. Price: Methodology, Writing - review & editing. J.A. Brown: Conceptualization, Writing - review & editing.

Acknowledgments

We thank two anonymous referees and the editor for valuable comments that led to an improved article.

References (21)

Modarres, R. et al. Resampling methods for ranked set samples. Comput. Statist. Data Anal. (2006)
Niederreiter, H. Error bounds for quasi-Monte Carlo integration with uniform point sets. J. Comput. Appl. Math. (2003)
Robertson, B.L. et al. A modification of balanced acceptance sampling. Statist. Probab. Lett. (2017)
Wang, X. et al. Randomized Halton sequences. Math. Comput. Modelling (2000)
Dell, T.R. et al. Ranked set sampling theory with order statistics background. Biometrics (1972)
Faure, H. Discrepance de suites associees a un systeme de numeration (en dimension s). Acta Arith. (1982)
Grafström, A. et al. How to select representative samples. Scand. J. Stat. (2014)
Halls, L.K. et al. Trial of ranked-set sampling for forage yields. Forest Sci. (1966)
Halton, J. On the efficiency of certain quasi-random sequences of points in evaluating multi-dimensional integrals. Numer. Math. (1960)
Hankin, D.G. et al. Sampling Theory for the Ecological and Natural Resource Sciences. (2019)


