Abstract
We study the classical statistical problem of estimating quantiles by order statistics of a random sample. For a fixed sample size, we determine the single order statistic that is the optimal estimator of a quantile of a given order. Our approach to the problem is new: the optimality criterion is based on nonparametric sharp upper and lower bounds on the bias of the estimator. First, we derive explicit analytic expressions for the bounds, and then we choose the order statistic for which the upper and lower bounds are simultaneously as close to 0 as possible. The paper contains rigorously proved theoretical results which can be easily implemented in practice. This is also illustrated with numerical examples.
References
Bieniek, M. (2007). Variation diminishing property of densities of uniform generalized order statistics. Metrika, 65, 297–309.
Gajek, L., Rychlik, T. (1998). Projection method for moment bounds on order statistics from restricted families. II. Independent case. Journal of Multivariate Analysis, 64, 156–182.
Hyndman, R. J., Fan, Y. (1996). Sample quantiles in statistical packages. The American Statistician, 50, 361–365.
Kaas, R., Buhrman, J. (1980). Mean, median and mode in binomial distributions. Statistica Neerlandica, 34, 13–18.
Keating, J. P., Tripathi, R. (2006). Percentiles, estimation of. In Encyclopedia of statistical sciences (pp. 6054–6060). New York: John Wiley.
Komornik, V. (2006). Another short proof of Descartes’s rule of signs. American Mathematical Monthly, 113, 829–830.
Moriguti, S. (1953). A modification of Schwarz’s inequality with applications to distributions. Annals of Mathematical Statistics, 24, 107–113.
Okolewski, A., Rychlik, T. (2001). Sharp distribution-free bounds on the bias in estimating quantiles via order statistics. Statistics and Probability Letters, 52, 207–213.
Papadatos, N. (1995). Maximum variance of order statistics. Annals of the Institute of Statistical Mathematics, 47, 185–193.
Parrish, R. S. (1990). Comparison of quantile estimators in normal sampling. Biometrics, 46, 247–257.
Perrin, O., Redside, E. (2007). Generalization of Simmons’ theorem. Statistics and Probability Letters, 77, 604–606.
Reiss, R.-D. (1989). Approximate distributions of order statistics. New York: Springer-Verlag.
Rudin, W. (1976). Principles of mathematical analysis (3rd ed.). New York: McGraw-Hill Book Co.
Schoenberg, I. J. (1959). On variation diminishing approximation methods. In R. E. Langer (Ed.) On numerical approximation. Proceedings of a symposium, Madison, April 21–23, 1958 (pp. 249–274). Madison: The University of Wisconsin Press.
Serfling, R. J. (1980). Approximation theorems of mathematical statistics. New York: John Wiley.
Zieliński, R. (2009). Optimal nonparametric quantile estimators. Towards a general theory. A survey. Communications in Statistics Theory and Methods, 38(7), 980–992.
Appendices
Appendix 1: The proofs of Corollary 2 and Lemma 7
Proof of Corollary 2
For \(p\ge \theta _{j+1}\), inequality (29) is obvious, since \(F_{j:n}>F_{j+1:n}\) on (0, 1). For \(p\in (0,\theta _{j+1})\), we use the first inequality in (9) and the fact that the bounds \({\overline{B}}(j,n,p)\) and \({\overline{B}}(j+1,n,p)\) cannot be equal since they are attained for different distributions. This proves (29).
To prove (30), first, we fix \(1\le j\le n-1\) and assume that \(\theta _j<p<q<1\). Then (12) holds, but \(\overline{B}(j,n,q)\) and \({{\overline{B}}}(j,n,p)\) are attained for different distributions, so they cannot be equal. For \(2\le j\le n\) and \(p\in (0,\theta _j)\), it is obvious that the first two terms in (22) decrease when p increases. Moreover,
for \(2\le j\le n-1\), which completes the proof. \(\square\)
Proof of Lemma 7
Recall that \(\xi _1=0\). Consider the function
where the function \({\bar{h}}_j\) is defined by (37). Since \({\bar{h}}_j(\xi _j)=f_{j:n}(\xi _j)\), we get \(k_j(\xi _j)=0\). Moreover,
Substituting (39) for \({\bar{h}}_j'\), we obtain after elementary computations
for all \(0\le t\le 1\), and the equality holds only for \(t=\xi _j\). So \(k_j\) is strictly increasing on [0, 1]. In particular for \(p>\xi _j\), we have \(k_j(p)>k_j(\xi _j)=0\) or
Combining this with (22), we get
Since \({\underline{B}}(j,n,p)<0\), this completes the proof. \(\square\)
Appendix 2: The proofs of the results of Subsection 4.1
In the proofs, we use the following two properties of Bernstein polynomials.
Lemma 11
(VDP) The number of zeros in (0, 1) of any linear combination \(\sum _{i=0}^n \alpha _i B_{i,n}\) of Bernstein polynomials does not exceed the number of sign changes in the sequence \(\alpha _0,\alpha _1,\dotsc ,\alpha _n\) of its coefficients after deletion of zeros. Moreover, the signs of the combination near 0 and near 1 coincide with the signs of the first and the last nonzero elements of the sequence, respectively.
VDP of Bernstein polynomials was proved by Schoenberg (1959) and Gajek and Rychlik (1998). In fact, it is a simple consequence of the well-known Descartes rule of signs (see, e.g., Komornik 2006). See Bieniek (2007) for a far-reaching generalization of VDP to some special cases of Meijer’s \({\mathsf {G}}\)-functions.
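Lemma 11 can be illustrated numerically. The sketch below draws random integer coefficient vectors \(\alpha\) and checks, first, that the number of sign changes of \(\sum_i \alpha_i B_{i,n}\) on an interior grid of (0, 1) never exceeds the number of sign changes of \(\alpha\), and second, that the signs near 0 and near 1 match the first and the last nonzero coefficients. Grid sign changes only bound the number of zeros from below, so this is a sanity check rather than a proof. The helper names are our own, and all values are scaled by \(m^n\) so that every sign is computed in exact integer arithmetic.

```python
import random
from math import comb


def scaled_value(alpha, k, m):
    """Return m**n * sum_i alpha[i] * B_{i,n}(k/m), where n = len(alpha) - 1.

    B_{i,n}(x) = C(n, i) x**i (1 - x)**(n - i); the factor m**n keeps the
    computation in exact integers, so the sign is never affected by rounding.
    """
    n = len(alpha) - 1
    return sum(a * comb(n, i) * k**i * (m - k) ** (n - i) for i, a in enumerate(alpha))


def sign_changes(seq):
    """Number of sign changes in seq after deleting zeros."""
    signs = [1 if v > 0 else -1 for v in seq if v != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)


def grid_sign_changes(alpha, m=500):
    """Sign changes of the combination on the interior grid k/m of (0, 1).

    Each change brackets a zero of odd multiplicity, so this is a lower
    bound on the number of zeros in (0, 1).
    """
    return sign_changes([scaled_value(alpha, k, m) for k in range(1, m)])


def sign(v):
    return (v > 0) - (v < 0)


random.seed(1)
M = 10**6  # evaluation points 1/M and 1 - 1/M probe the signs near 0 and 1
for _ in range(100):
    alpha = [random.randint(-3, 3) for _ in range(random.randint(2, 9))]
    nonzero = [a for a in alpha if a != 0]
    if not nonzero:
        continue
    # First part of VDP: zeros in (0, 1) <= sign changes of the coefficients.
    assert grid_sign_changes(alpha) <= sign_changes(alpha)
    # Second part: signs near 0 and near 1 follow the first/last nonzero coefficient.
    assert sign(scaled_value(alpha, 1, M)) == sign(nonzero[0])
    assert sign(scaled_value(alpha, M - 1, M)) == sign(nonzero[-1])
```

For instance, \(\alpha=(1,-1,1)\) gives \((1-2x)^2\), which has a double zero at \(\frac{1}{2}\) but no sign change, while \(\alpha=(1,-2,1)\) gives \(6x^2-6x+1\), with two simple zeros and two coefficient sign changes.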
Lemma 12
(Simmons’ inequality) For \(1\le k\le \frac{n}{2}\), we have
and the equality holds if and only if \(k=\frac{n}{2}\).
The reader is referred to Perrin and Redside (2007) for the latest proof of Simmons’ inequality and to references therein for its older proofs.
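The display of Simmons’ inequality is not reproduced in this copy. We assume it is the classical form of Simmons’ theorem: for \(Y\sim {\mathcal {B}}\left( n,\frac{k}{n}\right)\) and \(1\le k<\frac{n}{2}\), one has \(P(Y<k)>P(Y>k)\), with equality when \(k=\frac{n}{2}\) by symmetry. The following sketch checks this form in exact rational arithmetic for small n; it is a spot check, not a proof.

```python
from fractions import Fraction
from math import comb


def binom_pmf(i, n, p):
    """P(Y = i) for Y ~ Bin(n, p); exact when p is a Fraction."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)


def simmons_gap(k, n):
    """P(Y < k) - P(Y > k) for Y ~ Bin(n, k/n), computed exactly."""
    p = Fraction(k, n)
    below = sum(binom_pmf(i, n, p) for i in range(k))
    above = sum(binom_pmf(i, n, p) for i in range(k + 1, n + 1))
    return below - above


for n in range(2, 41):
    for k in range(1, n // 2 + 1):
        gap = simmons_gap(k, n)
        if 2 * k < n:
            assert gap > 0   # strict inequality for k < n/2
        else:
            assert gap == 0  # k = n/2: symmetric distribution, equality
```

For \(n=3\) and \(k=1\), the two probabilities are \(\frac{8}{27}\) and \(\frac{7}{27}\).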
Proof of Lemma 8
The inequality \(p_j< q_j < p_{j+1}\) for \(1\le j\le n-1\) follows from the proof of the uniqueness of \(q_j\). To prove that \(p_j< \frac{j}{n} < p_{j+1}\), for \(1\le j\le n-1\) we use the inequalities (16). Therefore, by the definition of \(p_j\) and \(p_{j+1}\), we obtain
Since \(F_{j:n}\) and \(F_{j+1:n}\) are strictly increasing, we get the desired inequality.
Now assume that \(1\le j<\frac{n}{2}\). Since both \(\frac{j}{n}\) and \(q_{j}\) are in the interval \((p_j, p_{j+1})\), we have \(\frac{j}{n}<q_{j}\) if and only if \(Q_{j,n}\left( \frac{j}{n}\right) <0\) or, equivalently,
But this is equivalent to Simmons’ inequality, which proves (35). To prove (36), it is enough to combine (32) and (34) with (35). If \(j=\frac{n}{2}\), then Simmons’ inequality becomes the equality equivalent to
Therefore, \(Q_{n/2,n}\left( \tfrac{1}{2} \right) =0\), so \(q_{n/2}=\frac{1}{2}\). \(\square\)
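The definition of \(Q_{j,n}\) is not reproduced here either. A form consistent with the argument above is \(Q_{j,n}(p)=F_{j:n}(p)+F_{j+1:n}(p)-1\): since \(F_{j:n}(p)=P(Y\ge j)\) for \(Y\sim {\mathcal {B}}(n,p)\), the condition \(Q_{j,n}\left( \frac{j}{n}\right) <0\) becomes \(P(Y>j)<P(Y<j)\), which is Simmons’ inequality. Treating this form as an assumption, the sketch below computes \(q_j\) by bisection and checks the conclusions of Lemma 8 for small n.

```python
from math import comb


def F(j, n, p):
    """F_{j:n}(p) = sum_{i=j}^n B_{i,n}(p) = P(Bin(n, p) >= j)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(j, n + 1))


def Q(j, n, p):
    # ASSUMED form of Q_{j,n}; its defining display is not shown in the text.
    return F(j, n, p) + F(j + 1, n, p) - 1


def q(j, n, tol=1e-13):
    """Unique root of Q_{j,n} in (0, 1), found by bisection.

    Q_{j,n} is strictly increasing, with Q(0) = -1 and Q(1) = 1 for 1 <= j <= n - 1.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if Q(j, n, mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Lemma 8 for small n: j/n < q_j when j < n/2, and q_{n/2} = 1/2.
for n in range(2, 21):
    for j in range(1, n // 2 + 1):
        if 2 * j < n:
            assert j / n < q(j, n)
        else:
            assert abs(q(j, n) - 0.5) < 1e-9
```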
For the next two proofs, we need some further auxiliary functions. For \(2\le j\le n-1\), we define \(h_j\) and \({\bar{h}}_j\) as
with \(h_j(1)=h_j(1^-)=0\), and
with \({\bar{h}}_j(0)={\bar{h}}_j(0^+)=0\). Then \({\bar{h}}_j(x)=h_{n-j+1}(1-x)\) and
Expanding \(F_{j:n}\) as the sum of Bernstein polynomials we have
Applying VDP to the expression inside the brackets, we see that it is first positive, then negative (\(+\,-\), for short), so equation (17) defining \(\theta _j\) has exactly one solution. By VDP, we see that \(h_j'\) is also \(+\,-\). Therefore, \(h_j\) is strictly increasing on \((0,\theta _j)\) from 1 at \(x=0\) to \(h_j(\theta _j)>1\), and then strictly decreasing on \((\theta _j,1)\) to 0 at \(x=1\). Analogously,
and we infer that \({\bar{h}}_j\) is strictly increasing on \((0,\xi _j)\) from 0 at \(x=0\), and strictly decreasing on \((\xi _j,1)\) from \(\bar{h}_j (\xi _j) >1\) to 1 at \(x=1\). By these monotonicity properties of \(h_j\) and \({\bar{h}}_j\), we infer that for \(2\le j\le n-1\)
and
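The displays defining \(h_j\) and \({\bar{h}}_j\) are missing in this copy. One pair of functions with all the properties stated above (the boundary values, the symmetry \({\bar{h}}_j(x)=h_{n-j+1}(1-x)\), and the increasing-then-decreasing shapes with maxima at \(\theta_j\) and \(\xi_j\)) is \(h_j(x)=\frac{1-F_{j:n}(x)}{1-x}\) and \({\bar{h}}_j(x)=\frac{F_{j:n}(x)}{x}\). Under this assumption, which is ours and not a quotation of the definitions, the sketch below locates \(\theta_j\) and \(\xi_j\) by bisection on the sign of the derivatives and checks the stated properties for small n.

```python
from math import comb


def F(j, n, x):
    """F_{j:n}(x) = sum_{i=j}^n B_{i,n}(x), the cdf of the j-th uniform order statistic."""
    return sum(comb(n, i) * x**i * (1 - x) ** (n - i) for i in range(j, n + 1))


def f(j, n, x):
    """Density f_{j:n} of F_{j:n}."""
    return n * comb(n - 1, j - 1) * x ** (j - 1) * (1 - x) ** (n - j)


# ASSUMED forms of h_j and hbar_j (the defining displays are not reproduced).
def h(j, n, x):
    return (1 - F(j, n, x)) / (1 - x)


def hbar(j, n, x):
    return F(j, n, x) / x


def sign_switch(g, tol=1e-12):
    """Bisection for the point in (0, 1) where g switches from positive to negative."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def theta(j, n):
    # h_j'(x) has the sign of (1 - F_{j:n}(x)) - (1 - x) f_{j:n}(x)
    return sign_switch(lambda x: (1 - F(j, n, x)) - (1 - x) * f(j, n, x))


def xi(j, n):
    # hbar_j'(x) has the sign of x f_{j:n}(x) - F_{j:n}(x)
    return sign_switch(lambda x: x * f(j, n, x) - F(j, n, x))


t, s = theta(2, 6), xi(2, 6)
assert h(2, 6, t) > 1 and hbar(2, 6, s) > 1  # both maxima exceed 1
```

For \(j=2\), the assumed \(h_2\) has the closed-form maximizer \(\theta_2(n)=\frac{1}{(n-1)^2}\), which gives an independent check of the bisection.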
Proof of Lemma 9(a)
By (40), we have to check that \(h'_{j+1}\left( \tfrac{j}{n} \right) <0\). By (38), this is equivalent to
However, as explained in Sect. 3, the mode of \(Y\sim {\mathcal {B}} \left( n,\frac{j}{n}\right)\) is equal to j, so \(B_{i,n}\left( \tfrac{j}{n}\right) <B_{j,n}\left( \tfrac{j}{n}\right)\) for \(0 \le i<j\). Now if \(1\le j< \frac{n}{2}\) then \(j+1\le n-j\). Combining this with the last inequality, we easily obtain (42), which completes the proof. \(\square\)
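The mode property of \({\mathcal {B}}\left( n,\frac{j}{n}\right)\) used above can be spot-checked in exact arithmetic for small n. The helper `mode_dominates` is a hypothetical name introduced only for this illustration.

```python
from fractions import Fraction
from math import comb


def bernstein(i, n, x):
    """B_{i,n}(x) = C(n, i) x**i (1 - x)**(n - i)."""
    return comb(n, i) * x**i * (1 - x) ** (n - i)


def mode_dominates(j, n):
    """True if B_{i,n}(j/n) < B_{j,n}(j/n) for every 0 <= i < j (exact arithmetic)."""
    x = Fraction(j, n)
    peak = bernstein(j, n, x)
    return all(bernstein(i, n, x) < peak for i in range(j))
```

The mode is unique here because \((n+1)\frac{j}{n}=j+\frac{j}{n}\) is not an integer for \(1\le j\le n-1\).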
Proof of Lemma 9(b)
By (41), we need to show that \({\bar{h}}'_j(q_j)>0\). By the definition of \(q_j\), we have
Using (39), we obtain for \(2\le j\le n-1\)
For \(2\le j\le n-1\), we define the function
Then, \({\bar{h}}'_j(q_j)=\frac{1}{q_j^2} g_{j,n}(q_j)\). By VDP, \(g_{j,n}\) is first negative and then positive on (0, 1). Therefore, since by Lemma 8 we have \(\frac{j}{n}<q_j\) for \(1\le j<\frac{n}{2}\), it suffices to prove that
for j satisfying the assumptions in part (b).
If \(n=5\) and \(j=2\), then \(g_{2,5}(\frac{2}{5})=\frac{3^3}{5^5}>0\). Since \(q_2(5)>\frac{2}{5}\) we also have \(g_{2,5}(q_2)>0\) and \(\bar{h}'_2(q_2)>0\). For \(3\le j<\frac{n}{2}\), the inequality (44) follows from two properties:
(i) \(g_{3,n}\left( \tfrac{3}{n} \right) >0\) for \(n\ge 4\);
(ii) \(g_{j,n}\left( \tfrac{j}{n} \right) <g_{j+1,n}\left( \tfrac{j+1}{n} \right)\) for \(2\le j\le n-2\).
To prove property (i), it suffices to study the expansion
where \(c(n)=23n^3-60n^2-9n+54\). It is elementary to show that \(c(6)>0\) and c(n) is increasing for \(n\ge 6\), which implies (i).
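The claim about \(c(n)\) reduces to integer arithmetic. The sketch below uses the forward difference \(c(n+1)-c(n)=69n^2-51n-46\), which we expanded by hand; it is positive for every \(n\ge 2\), so c is in fact increasing from \(n=2\) on, and \(c(6)=2808>0\).

```python
def c(n):
    """c(n) = 23 n^3 - 60 n^2 - 9 n + 54, as in the expansion above."""
    return 23 * n**3 - 60 * n**2 - 9 * n + 54


def c_increment(n):
    """c(n + 1) - c(n), expanded by hand: 69 n^2 - 51 n - 46."""
    return 69 * n * n - 51 * n - 46


# For n >= 2: 69 n^2 >= 138 n > 51 n + 46, so the increment is positive and
# c is increasing; with c(6) = 2808 > 0 this gives c(n) > 0 for all n >= 6.
assert c(6) == 2808
assert all(c(n + 1) - c(n) == c_increment(n) for n in range(100))
```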
For the proof of (ii), first, we need to study monotonicity properties of \(g_{j,n}\). By VDP, \(g_{j,n}\) is first negative and then positive on (0, 1). Moreover, we have \(g_{j,n}(0)=-1\), \(g_{j,n}(1)=0\) and using an elementary relation
(with the convention that \(B_{-1,n}\equiv 0\)) we obtain
Therefore, by VDP, the derivative \(g'_{j,n}\) is \(+\,-\). Furthermore, for \(p\in (0,1)\), simple computations show that
So \(g_{j,n}\) is increasing on the interval \(\left( 0,b_{j,n} \right)\) from \(-1\) to \(g_{j,n}(b_{j,n})>0\), and then decreasing to 0 at 1. Another series of elementary computations shows that for \(p\in (0,1)\) and \(2\le j\le n-2\)
Then we have \(\tfrac{j}{n}< \tfrac{j+1}{n+1} < b_{j,n}\), where the first inequality holds because \(j<n\). Therefore, \(g_{j,n}\) is increasing on \(\left( \frac{j}{n}, \frac{j+1}{n+1} \right)\) and \(g_{j+1,n}\) is increasing on \(\left( \frac{j+1}{n+1}, \frac{j+1}{n} \right)\). In consequence,
This completes the proof of (b). \(\square\)
Proof of Lemma 9(c)
For \(n=6\), we have \(Q_{2,6}\left( \frac{41}{120} \right) >0\), so \(q_2(6)<\frac{41}{120}\). Moreover, \(g_{2,6}\left( \frac{41}{120} \right) <0\), so as in the proof of (a) we get \(\xi _2(6)<q_2(6)\). Also, \({\bar{h}}'_2\left( \frac{2}{6} \right) >0\), so \(\xi _2(6)>\frac{2}{6}\). Finally, \(\bar{h}'_3(q_2)=\frac{1}{q_2^2}g(q_2)\), where
By VDP, the function g is \(-\,+\). Moreover, \(q_2>\frac{2}{6}\) and \(g\left( \frac{2}{6} \right) >0\), and thus \(g(q_2)>0\). Therefore, \({\bar{h}}'_3(q_2)>0\) and \(\xi _3>q_2\). \(\square\)
Proof of Lemma 9(d)
To compare \(\frac{2}{n}\) with \(\xi _2(n)\), we write \({\bar{h}}'_2\) in the form
Then,
Moreover, \({\bar{h}}'_2\left( \frac{2}{7} \right) <0\) and the expression in brackets is strictly decreasing with respect to \(n\ge 7\). By (41), we obtain \(\xi _2(n)<\frac{2}{n}\) for \(n\ge 7\), which completes the proof of (d). \(\square\)
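Several numerical claims in the proofs of Lemma 9(c) and (d) can be spot-checked in exact rational arithmetic, provided we adopt two assumed forms whose displays are missing here: \(Q_{j,n}=F_{j:n}+F_{j+1:n}-1\) and \({\bar{h}}_j(x)=F_{j:n}(x)/x\), so that \({\bar{h}}'_j(x)\) has the sign of \(xf_{j:n}(x)-F_{j:n}(x)\). The functions \(g\) and \(g_{2,6}\) are not reproduced, so the sketch checks the consequences drawn from them instead, and the check of (d) covers only finitely many n.

```python
from fractions import Fraction
from math import comb


def F(j, n, x):
    """F_{j:n}(x) = sum_{i=j}^n B_{i,n}(x)."""
    return sum(comb(n, i) * x**i * (1 - x) ** (n - i) for i in range(j, n + 1))


def f(j, n, x):
    """Density f_{j:n} of F_{j:n}."""
    return n * comb(n - 1, j - 1) * x ** (j - 1) * (1 - x) ** (n - j)


def Q(j, n, x):
    # ASSUMED: Q_{j,n} = F_{j:n} + F_{j+1:n} - 1.
    return F(j, n, x) + F(j + 1, n, x) - 1


def hbar_prime_sign(j, n, x):
    # ASSUMED: hbar_j(x) = F_{j:n}(x)/x, so hbar_j' has the sign of x f - F.
    v = x * f(j, n, x) - F(j, n, x)
    return (v > 0) - (v < 0)


lo, hi = Fraction(2, 6), Fraction(41, 120)
# Lemma 9(c): Q_{2,6} changes sign on (2/6, 41/120], so q_2(6) lies there ...
assert Q(2, 6, lo) < 0 < Q(2, 6, hi)
# ... xi_2(6) lies in (2/6, 41/120), and xi_3(6) exceeds 41/120 > q_2(6).
assert hbar_prime_sign(2, 6, lo) == 1 and hbar_prime_sign(2, 6, hi) == -1
assert hbar_prime_sign(3, 6, hi) == 1
# Lemma 9(d): hbar_2'(2/n) < 0, i.e. xi_2(n) < 2/n, for 7 <= n <= 60.
assert all(hbar_prime_sign(2, n, Fraction(2, n)) == -1 for n in range(7, 61))
```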
Proof of Lemma 9(e)
By (41), we need to show that \({\bar{h}}'_2(q_1)>0\). Using (43), we compute that
It is elementary to show that \(2B_{2,n}(p)>B_{0,n}(p)\) for \(p\in \left( \frac{1}{n},1 \right)\). Since \(q_1>\frac{1}{n}\), the claim follows. \(\square\)
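For completeness, the elementary inequality can be derived directly (a short derivation, valid for \(n\ge 2\)):

```latex
\begin{align*}
2B_{2,n}(p) > B_{0,n}(p)
  &\iff n(n-1)\,p^2(1-p)^{n-2} > (1-p)^n \\
  &\iff n(n-1)\,p^2 > (1-p)^2 \qquad (0<p<1) \\
  &\iff \sqrt{n(n-1)}\,p > 1-p \\
  &\iff p > \frac{1}{1+\sqrt{n(n-1)}} .
\end{align*}
Since $\sqrt{n(n-1)} > n-1$, we have $\frac{1}{1+\sqrt{n(n-1)}} < \frac{1}{n}$,
so the inequality holds for every $p \in \left(\frac{1}{n}, 1\right)$.
```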
Appendix 3: The proof of Theorem 4
Proof of Theorem 4(a)
Using Lemma 8, for \(1\le j\le \frac{n}{2}\) and \(p\in \left[ \frac{j}{n},q_j\right]\) we have
Next, for \(n\ge 3\) and \(p\in \left[ \frac{1}{n},q_1\right]\), using Corollary 3(a), inequality (45) and Corollary 4(a), we have
Thus, \(1\le k\left( n,\frac{1}{n}\right) \le k\left( n,q_1\right) \le 2\) by Theorem 1.
First, we prove that \(k(n,q_1)=2\), or equivalently \(s_{n,q_1}(1)>s_{n,q_1}(2)\). For all \(p\in \left[ \frac{1}{n},q_1\right]\), by (46) we have \(s_{n,p}(2)=1-2F_{2:n}(p)\). Moreover, using (46) again and the definition of \(q_1\), we obtain \(s_{n,q_1}(1)>2F_{1:n}(q_1)-1=s_{n,q_1}(2)\).
To prove that \(k\left( n,\frac{1}{n}\right) =1\), we need to show that for \(n\ge 3\)
Elementary but tedious computations, using (46) with \(p=\frac{1}{n}\), show that
is strictly decreasing to its limit \(\frac{4}{\mathrm e}-1>\frac{4}{9}\) and \(a_n<\frac{4}{9}\) for \(n\ge 3\). Therefore, any element of the sequence \(\left\{ b_ {n} \right\}\) is greater than any element of \(\left\{ a_ {n} \right\}\). In particular, the inequality (47) holds for all \(n\ge 3\), which completes the proof of (a). \(\square\)
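The displays defining \(a_n\) and \(b_n\) are missing. By (46), \(s_{n,\frac{1}{n}}(2)=1-2F_{2:n}\left( \frac{1}{n}\right)\), and this quantity tends to \(\frac{4}{\mathrm e}-1\), the limit stated above; we therefore assume \(b_n=1-2F_{2:n}\left( \frac{1}{n}\right)\). The sketch below checks numerically, for \(3\le n\le 500\), that this sequence decreases and stays above its limit, which in turn exceeds \(\frac{4}{9}\). It says nothing about \(a_n\), whose definition we cannot recover.

```python
import math


def b(n):
    """ASSUMED b_n = 1 - 2 F_{2:n}(1/n), where F_{2:n}(p) = 1 - (1-p)^n - n p (1-p)^(n-1)."""
    p = 1.0 / n
    F2 = 1.0 - (1 - p) ** n - n * p * (1 - p) ** (n - 1)
    return 1.0 - 2.0 * F2


LIMIT_B = 4 / math.e - 1   # about 0.4715
values = [b(n) for n in range(3, 501)]

# Strictly decreasing and bounded below by the limit on the range checked.
assert all(x > y for x, y in zip(values, values[1:]))
assert all(v > LIMIT_B for v in values) and LIMIT_B > 4 / 9
```

For example, \(b_3=\frac{13}{27}\approx 0.481\).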
Proof of Theorem 4(b)
Assume that \(n\ge 6\). By Corollary 4(b) and (c) we have \(\left[ \frac{2}{n},q_2 \right] \subset \left( \theta _3,\xi _3 \right)\), so for p in the former interval by (45) we have
On the other hand, since \(r_{n,p}\) is strictly decreasing with respect to p, we have
Here, the second inequality follows from \(\frac{2}{n}>\theta _2\) (in fact, it becomes an equality for \(n=6\)), and the third one follows from (45). Summarizing, for \(p=\frac{2}{n}\) and \(p=q_2\) we have \(r_{ {n,p}}(2)<0<r_{ {n,p}}(3)\) and therefore
First, we prove that \(k\left( n,q_2 \right) =3\) or in other words \(s_{n,q_2}(2)>s_{n,q_2}(3)\). Indeed, by (48) we have \(s_{n,q_2}(3)=1-2F_{3:n}(q_2)\). Moreover, by Corollary 4(b) and (c) we have \(q_2>\xi _2>\theta _2\), so \(r_{n,q_2}(2)<t_{n,q_2}(2)<0\). Therefore \(s_{n,q_2}(2)>2F_{2:n}(q_2)-1=s_{n,q_2}(3)\) by the definition of \(q_2\).
Now we prove that \(k\left( n,\frac{2}{n} \right) =2\), or equivalently \(s_{n,\frac{2}{n}}(2)<s_{n,\frac{2}{n}}(3)\). Since \(\frac{2}{n}\in \left( \theta _3,\xi _3\right)\), by (48) we have
For \(n=6\), we also have \(\frac{2}{6}\in \left( \theta _3,\xi _2 \right)\), so by (49) we obtain \(s_{6,\frac{2}{6}}(2)=2F_{2:6}\left( \frac{2}{6} \right) -1<s_{6,\frac{2}{6}}(3)\) by (50) and Simmons’ inequality. For \(n\ge 7\), we define the sequences
Tedious computations show that \(d_{n}\) is strictly decreasing to its limit \(\frac{10}{\mathrm e^2}-1\) and \(c_n<\frac{10}{\mathrm e^2}-1\) for \(n\ge 7\). So \(c_n<d_n\) for all \(n\ge 7\), which completes the proof. \(\square\)
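Similarly, (48) gives \(s_{n,\frac{2}{n}}(3)=1-2F_{3:n}\left( \frac{2}{n}\right)\), whose limit is \(\frac{10}{\mathrm e^2}-1\); we assume this quantity is \(d_n\). The following sketch checks its monotonicity numerically for \(7\le n\le 300\). It is an illustration only, and \(c_n\) is not reproduced here.

```python
import math


def d(n):
    """ASSUMED d_n = 1 - 2 F_{3:n}(2/n) = 2 P(Y <= 2) - 1 for Y ~ Bin(n, 2/n)."""
    p = 2.0 / n
    at_most_two = sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(3))
    return 2.0 * at_most_two - 1.0


LIMIT_D = 10 / math.e**2 - 1   # about 0.3534
values = [d(n) for n in range(7, 301)]

# Strictly decreasing and bounded below by the limit on the range checked.
assert all(x > y for x, y in zip(values, values[1:]))
assert all(v > LIMIT_D for v in values)
```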
About this article
Cite this article
Bieniek, M., Pańczyk, L. On the choice of the optimal single order statistic in quantile estimation. Ann Inst Stat Math 75, 303–333 (2023). https://doi.org/10.1007/s10463-022-00845-3
DOI: https://doi.org/10.1007/s10463-022-00845-3