Quasi-random ranked set sampling
Introduction
To estimate a population characteristic of a natural resource, a sample consisting of some part of the population is used. The main goal of a sampling design is to select a representative sample from which the population characteristic can be estimated with low bias and high precision. This article considers drawing sample locations from two-dimensional continuous study regions to estimate a population mean. One of the most commonly used designs is simple random sampling (SRS), where the sample locations are drawn independently from a uniform distribution over the resource. Although SRS yields unbiased estimates of the population mean, there is no guarantee that a specific sample is representative. Many sampling designs that improve on SRS in a variety of ways have been proposed in the statistical literature (Hankin et al., 2019). One effective strategy, called spatially balanced sampling, is to spread the sample locations evenly over the resource (Stevens and Olsen, 2004). These designs are known to be efficient when sampling natural resources because nearby locations tend to have more similar response values than distant locations (Stevens and Olsen, 2004; Robertson et al., 2013; Grafström and Schelin, 2014). In this article, we consider incorporating spatial spread into a two-phase sampling design, ranked set sampling.
Ranked set sampling (RSS) was first proposed by McIntyre (1952) for estimating the average yield of an arable crop in an agricultural field trial. Measuring yield for each field plot was time consuming because it involved removing, and then weighing, the crop. With some experience, it was relatively quick to estimate, by eye, the yield to the extent that a group of plots could be ranked in estimated yield order. McIntyre proposed the design, later named RSS (Halls and Dell, 1966), where a subset of plots is first ranked. The ranking variable can be a quick estimate of the response variable, a visual comparison, an expert opinion, or some other variable known to be correlated with the response variable, but it need not involve actual measurements of the response variable. Then, based on these rankings, a sample is drawn from the ranked plots. The goal of RSS is to collect observations from the resource that are likely to span the full range of response values in the population. This approach to data collection has spawned an active field of research and many RSS approaches have been proposed (Wolfe, 2010, Wolfe, 2012).
In practice, randomness is introduced into sampling designs using pseudo-random sequences which are irregular, non-repetitive and designed to mimic true random sequences. This article considers replacing the pseudo-random sequence in RSS with a quasi-random sequence to increase the spatial spread of the ranking variable. These sequences have been used as a substitute for random numbers in many fields, including numerical integration (Niederreiter, 1978, Niederreiter, 2003), optimization (Sobol, 1979, Robertson et al., 2014) and sampling (Robertson et al., 2013, Robertson et al., 2017).
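A $d$-dimensional quasi-random sequence $\{x_k\}_{k=1}^{\infty} \subset [0,1)^d$ is a low-discrepancy sequence with the property that, for all values of $N$, the sequence $P_N = \{x_k\}_{k=1}^{N}$ has low discrepancy. The discrepancy of $P_N$ is (Niederreiter, 1978)

$$D_N(P_N) = \sup_{J \in \mathcal{J}} \left| \frac{A(J; P_N)}{N} - \lambda(J) \right|,$$

where $\lambda$ is the Lebesgue measure, $A(J; P_N)$ is the number of points from $P_N$ in $J$ and $\mathcal{J}$ is the set of boxes of the form $\prod_{i=1}^{d} [0, a_i)$ with $0 < a_i \le 1$. Loosely speaking, a sequence is considered low-discrepancy if the fraction of points in each box $J$ is proportional to $\lambda(J)$.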
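To make the discrepancy definition concrete, the following is a minimal brute-force sketch for two-dimensional point sets. It assumes the boxes are anchored at the origin and searches only box corners built from the point coordinates (plus 1), checking both open and closed counts, which is where the supremum is approached. The function name `star_discrepancy_2d` is hypothetical and is not taken from the article or its supplementary material.

```python
import itertools


def star_discrepancy_2d(points):
    """Brute-force discrepancy of 2D points over boxes [0, u) x [0, v).

    Assumption: boxes are anchored at the origin. Candidate corners are
    taken from the point coordinates and 1.0.
    """
    n = len(points)
    xs = sorted({p[0] for p in points} | {1.0})
    ys = sorted({p[1] for p in points} | {1.0})
    worst = 0.0
    for u, v in itertools.product(xs, ys):
        area = u * v
        # Points strictly inside the box (box shrunk slightly).
        open_count = sum(1 for x, y in points if x < u and y < v)
        # Points inside or on the upper boundary (box grown slightly).
        closed_count = sum(1 for x, y in points if x <= u and y <= v)
        worst = max(worst, closed_count / n - area, area - open_count / n)
    return worst


# A single point at the centre of the unit square.
print(star_discrepancy_2d([(0.5, 0.5)]))
```

The cost grows quickly with the number of points, so this is only suitable for checking small point sets by hand.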
A number of quasi-random sequences have been proposed (Halton, 1960; Sobol, 1976; Faure, 1982), but this article focuses on the random-start Halton sequence (Wang and Hickernell, 2000) because of its simplicity. The $i$th coordinate of the $k$th point in a random-start Halton sequence is (Price and Price, 2012; Robertson et al., 2017)

$$x_k^{(i)} = \sum_{j=0}^{\infty} \left( \left\lfloor \frac{u_i + k}{b_i^{\,j}} \right\rfloor \bmod b_i \right) \frac{1}{b_i^{\,j+1}}, \qquad (1)$$

where $b_i$ are pair-wise co-prime bases, $\lfloor \cdot \rfloor$ is the floor function and $u_i$ are independently generated integers. To find $u_i$, a number is picked at random and rounded to a fixed number of digits in base $b_i$; radix inversion of those digits gives the integer $u_i$. The supplementary material gives a worked two-dimensional example of the first two points, with calculations and an R function. In that example, the random start corresponds to the Halton sequence skipping two points in the first dimension and five points in the second dimension. Each $x_k$ is a random point with uniform distribution on $[0,1)^d$ (Wang and Hickernell, 2000).
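The random-start Halton construction can be sketched in a few lines of Python. This is an illustrative sketch, not the R function from the supplementary material. It assumes the point index starts at 1, and the function names `radical_inverse` and `random_start_halton` are hypothetical. The start values (2, 5) with bases (2, 3) are chosen here only to mirror "skipping two points in the first dimension and five points in the second dimension"; they are not confirmed to be the values used in the article's example.

```python
from fractions import Fraction


def radical_inverse(n, base):
    """Reflect the base-`base` digits of the integer n about the radix point."""
    value, denom = Fraction(0), base
    while n > 0:
        n, digit = divmod(n, base)
        value += Fraction(digit, denom)
        denom *= base
    return value


def random_start_halton(n_points, bases=(2, 3), starts=(0, 0)):
    """First n_points of a Halton sequence shifted by integer starts.

    Assumption: the point index k runs over 1, 2, ..., n_points.
    """
    return [
        tuple(radical_inverse(u + k, b) for u, b in zip(starts, bases))
        for k in range(1, n_points + 1)
    ]


# Skip two points in base 2 and five points in base 3 (illustrative values).
print(random_start_halton(2, bases=(2, 3), starts=(2, 5)))
```

Exact fractions are used so the digit reversal can be checked by hand; floats would serve equally well when drawing samples.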
Taking the first points from that fall within a study region is called balanced acceptance sampling (BAS) (Robertson et al., 2013), and its modification (Robertson et al., 2017) requires . BAS is a spatially balanced sampling design that spreads sample locations evenly over . BAS is efficient when sampling natural resources because well-spread sample locations are likely to span the full range of response values due to the locally similar property of natural resources (Stevens and Olsen, 2004). Rather than having spatially balanced response values, this article considers a spatially balanced ranking variable in RSS, called quasi-random RSS.
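The acceptance step of BAS can be illustrated with a short sketch: generate Halton points in the unit square and keep those that fall inside the study region until the required number is reached. The sketch assumes the study region has been scaled to lie inside the unit square and has positive area. The names `bas_sample` and `in_disc`, and the disc-shaped region, are hypothetical choices for illustration. The additional requirement of the modified design (Robertson et al., 2017) is not implemented here.

```python
def radical_inverse(n, base):
    """Reflect the base-`base` digits of the integer n about the radix point."""
    value, denom = 0.0, base
    while n > 0:
        n, digit = divmod(n, base)
        value += digit / denom
        denom *= base
    return value


def bas_sample(n, in_region, bases=(2, 3), starts=(0, 0)):
    """Keep the first n shifted Halton points accepted by in_region.

    Assumptions: the region sits inside the unit square, has positive
    area, and the point index starts at 1.
    """
    sample, k = [], 1
    while len(sample) < n:
        point = tuple(radical_inverse(u + k, b) for u, b in zip(starts, bases))
        if in_region(point):
            sample.append(point)
        k += 1
    return sample


def in_disc(point):
    """Hypothetical study region: a disc of radius 0.5 centred in the square."""
    x, y = point
    return (x - 0.5) ** 2 + (y - 0.5) ** 2 <= 0.25


locations = bas_sample(5, in_disc, starts=(2, 5))
print(locations)
```

In practice the integer starts would be generated at random for each sample; fixed values are used above so the output is reproducible.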
The rest of this article is organized as follows. Section 2 explains RSS and Section 3 presents a quasi-random approach. Both approaches are numerically tested in Section 4 and concluding remarks are given in Section 5.
Drawing a ranked set sample
Consider drawing a balanced RSS of points from with . The method described below is called balanced RSS because one judgment order statistic is collected for each of the ranks.
1. Draw an SRS of points from the study region and rank the points using a measured ranking variable. Include in the sample the point with the lowest judgment ranking.
2. Repeat step (1), but now include in the sample the point with the second lowest judgment ranking.
3. Repeat step (1) using the
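One cycle of the ranking procedure can be sketched as follows. Because the third step is cut off in the text, the sketch assumes the standard balanced design: the pattern continues through every rank, with a fresh SRS drawn for each rank and the set size equal to the number of ranks. The names `balanced_rss`, `uniform_point` and `toy_ranking` are hypothetical, and the ranking function is an arbitrary stand-in for a measured ranking variable.

```python
import random


def balanced_rss(set_size, draw_point, ranking_value, rng):
    """One cycle of balanced RSS: one judgment order statistic per rank.

    Assumption: a fresh SRS of set_size points is drawn for each rank.
    """
    sample = []
    for rank in range(set_size):
        candidates = [draw_point(rng) for _ in range(set_size)]
        # Rank the set using only the ranking variable.
        candidates.sort(key=ranking_value)
        # Keep the point holding this rank (0 = lowest judgment ranking).
        sample.append(candidates[rank])
    return sample


def uniform_point(rng):
    """SRS draw from a unit-square study region (illustrative)."""
    return (rng.random(), rng.random())


def toy_ranking(point):
    """Hypothetical ranking variable, cheap to evaluate at any location."""
    x, y = point
    return x + y


rss = balanced_rss(4, uniform_point, toy_ranking, random.Random(1))
print(rss)
```

Only the selected points would then be measured for the response variable; the ranking values are used for ordering alone.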
Quasi-random ranked set sampling
To draw a quasi-random RSS (QRSS) of points from with , a random-start Halton sequence with is generated. We choose and in (1) as the two smallest prime numbers that are co-prime with (their greatest common divisor is one) to remove undesirable relationships between and cyclical properties of the Halton sequence (Robertson et al., 2017, Robertson et al., 2018). Let denote the first points from in — a BAS sample of size . Define the
Numerical results and discussion
In this section, we investigate the precision of QRSS and the effectiveness of using several functions with different spatial structure from Robertson et al. (2018), where and . For each , we defined the measured response as and the measured ranking variable as , where is a normally distributed error term. Other than for set ranking purposes, the magnitude of was not utilized. These functions are illustrated in Fig. 2 and
Conclusion
In this article we introduced quasi-random ranked set sampling (QRSS) for natural resources, where the random-start Halton sequence was used to draw a ranked set sample. The Halton sequence ensured the sample locations of a measured ranking variable were evenly spread over the resource. Numerical results showed that QRSS with was more precise than ranked set sampling with . This makes QRSS particularly useful in practice because a design requires far fewer ranking measurements and
CRediT authorship contribution statement
B.L. Robertson: Conceptualization, Methodology, Software, Writing - original draft. M. Reale: Conceptualization, Writing - review & editing. C.J. Price: Methodology, Writing - review & editing. J.A. Brown: Conceptualization, Writing - review & editing.
Acknowledgments
We thank two anonymous referees and the editor for valuable comments that led to an improved article.
References (21)
- et al., Resampling methods for ranked set samples, Comput. Statist. Data Anal. (2006)
- Niederreiter, Error bounds for quasi-Monte Carlo integration with uniform point sets, J. Comput. Appl. Math. (2003)
- Robertson et al., A modification of balanced acceptance sampling, Statist. Probab. Lett. (2017)
- Wang and Hickernell, Randomized Halton sequences, Math. Comput. Modelling (2000)
- et al., Ranked set sampling theory with order statistics background, Biometrics (1972)
- Faure, Discrepance de suites associees a un systeme de numeration (en dimension s), Acta Arith. (1982)
- Grafström and Schelin, How to select representative samples, Scand. J. Stat. (2014)
- Halls and Dell, Trial of ranked-set sampling for forage yields, Forest Sci. (1966)
- Halton, On the efficiency of certain quasi-random sequences of points in evaluating multi-dimensional integrals, Numer. Math. (1960)
- Hankin et al., Sampling Theory for the Ecological and Natural Resource Sciences (2019)