Abstract
Scoring functions are commonly used to evaluate a point forecast of a particular statistical functional. This scoring function should be consistent, meaning the correct value of the functional is the Bayes act, in which case we say the scoring function elicits the functional. Recent results show that the mode functional is not elicitable. In this work, we ask whether it is at least possible to indirectly elicit the mode, wherein one elicits a low-dimensional functional from which the mode can be computed. We show that this cannot be done: Neither the mode nor a modal interval is indirectly elicitable with respect to the class of identifiable functionals.
References
Chernoff, H. (1964). Estimation of the mode. Annals of the Institute of Statistical Mathematics, 16(1), 31–41.
Fissler, T., Ziegel, J. F. (2017). Order-sensitivity and equivariance of scoring functions. Electronic Journal of Statistics, 13(1), 1166–1211.
Fissler, T., Ziegel, J. F. (2016). Higher order elicitability and Osband’s principle. The Annals of Statistics, 44(4), 1680–1707.
Frongillo, R., Kash, I. (2015). On elicitation complexity. In: Cortes, C., Lawrence, N. D., Lee, D. D., Sugiyama, M., Garnett, R. (Eds.), Advances in neural information processing systems 28 (pp. 3258–3266). Curran Associates, Inc.
Gneiting, T. (2011). Making and evaluating point forecasts. Journal of the American Statistical Association, 106(494), 746–762.
Gneiting, T., Raftery, A. E. (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477), 359–378.
Grenander, U. (1965). Some direct estimates of the mode. The Annals of Mathematical Statistics, 36(1), 131–138.
Heinrich, C. (2014). The mode functional is not elicitable. Biometrika, 101(1), 245–251.
Lambert, N. S. (2018). Elicitation and evaluation of statistical forecasts. Preprint.
Lambert, N. S., Pennock, D. M., Shoham, Y. (2008). Eliciting properties of probability distributions. In Proceedings of the 9th ACM conference on electronic commerce, ACM, pp. 129–138.
Lee, M. J. (1989). Mode regression. Journal of Econometrics, 42(3), 337–349.
Osband, K. (1985). Providing incentives for better cost forecasting. PhD thesis, University of California, Berkeley.
Parzen, E. (1962). On estimation of a probability density function and mode. The Annals of Mathematical Statistics, 33(3), 1065–1076.
Robertson, T., Cryer, J. D. (1974). An iterative procedure for estimating the mode. Journal of the American Statistical Association, 69(348), 1012–1016.
Steinwart, I., Pasin, C., Williamson, R., Zhang, S. (2014). Elicitation and identification of properties. In: Balcan, M. F., Feldman, V., Szepesvári, C. (Eds.), Proceedings of the 27th conference on learning theory, PMLR, Barcelona, Spain, Proceedings of machine learning research (pp. 482–526), vol 35.
Venter, J. (1967). On estimation of the mode. The Annals of Mathematical Statistics, 38(5), 1446–1455.
This work was supported by National Science Foundation Grant CCF-1657598.
Appendices
Omitted proofs
Proof of Theorem 1
Let \(\varepsilon >0\) be given. Since the identification complexity lower bounds the identifiable elicitation complexity of the mode, it suffices to show that the mode is not k-identifiable for arbitrary \(k\in \mathbb {N}\). Suppose, by way of contradiction, that the mode is k-identifiable. Hence, there exists a property \(\hat{\varGamma }:\mathcal {P}\rightarrow \hat{\mathcal {R}}\subseteq \mathbb {R}^k\) identified by \(V:\hat{\mathcal {R}}\times \mathbb {R}\rightarrow \mathbb {R}^k\) and function \(f:\hat{\mathcal {R}}\rightarrow \mathcal {R}\) such that \(\varGamma _{\text {mode}}=f\circ \hat{\varGamma }\). Our goal will be to specify two densities \(p,p'\in \mathcal {P}_{\psi ,\varepsilon }\subseteq \mathcal {P}\) with \(\hat{\varGamma }(p)=\hat{\varGamma }(p')\) and \(\varGamma _{\text {mode}}(p)\ne \varGamma _{\text {mode}}(p')\), contradicting the existence of f.
Let \(t>k\) and consider the following density \(p=\sum _{i=0}^t h_i\psi _{4i\varepsilon ,\varepsilon }\) in \(\mathcal {P}_{\psi ,\varepsilon }\) with strictly decreasing heights \(h_0>h_1>\dots>h_t>h_0/2>0\) and \(\sum _{i=0}^t h_i=1\). Observe that \(\varGamma _{\text {mode}}(p)=0\) and denote \(\hat{\varGamma }(p)=r\). Consider the \(k\times t\) matrix \(M\) with entries \(M_{ji}=\int V_j(r,y)\,\psi _{4i\varepsilon ,\varepsilon }(y)\,dy\) for \(j=1,\dots ,k\) and \(i=1,\dots ,t\), where \(V_j\) denotes the jth component of V.
Let \({h}'=(h_1',\dots , h_t')\) denote a nontrivial vector in the kernel of M; one exists because \(t>k\). To complete the proof, we will demonstrate that for any such \({h}'\) there exist real numbers \(\alpha ,\beta \in \mathbb {R}\) so that \(p'=\beta \left( p+\alpha \left( \sum _{i=1}^t h_i' \psi _{4i\varepsilon ,\varepsilon }\right) \right) \) is a density satisfying \(\hat{\varGamma }(p')=r\) and \(\varGamma _{\text {mode}}(p')\ne 0\). We proceed by considering all cases of \({h}'\) and showing the existence of \(\alpha \) in each case.
First, considering \(h_1',\dots ,h_t'\ge 0\), let \(h_{i(\text {max})}'\) denote the entry of \({h}'\) with greatest magnitude (if not unique, choose the entry associated with the maximal initial height \(h_{i(\text {max})}\)), and take \(\alpha >(h_0-h_{i(\text {max})})/h_{i(\text {max})}'\). Second, if \(h_1', \dots , h_t'\le 0\), then take \(-{h}'\) and treat as above. In the final case, at least one pair of entries of \({h}'\) have opposite sign. Let \(h_{i(\text {max})}'\) denote an entry of \({h}'\) with the greatest magnitude (if not unique, choose the entry associated with the maximal initial height \(h_{i(\text {max})}\)) and assume \(h_{i(\text {max})}'>0\); otherwise, take \(-{h}'\). Choose \(\alpha \) such that \( (h_0-h_{i(\text {max})})/h_{i(\text {max})}'<\alpha \le \min _{\{i: h_i'<0\}} h_i/|h_i'|\) satisfying \(\alpha \ne |h_{i(\text {max})}-h_i|/(h_{i(\text {max})}'-h_i')\) for any i with \(h_{i(\text {max})}'>h_i'>0\). Note this interval is nonempty because \(h_0-h_{i(\text {max})}<\frac{h_0}{2}<h_i\) and \(|h_i'|<h_{i(\text {max})}'\) for all i such that \(h_i'<0\). In each of the above cases, there are finitely many \(\alpha \) which do not yield a unimodal \(p'\). If the \(\alpha \) chosen yields a \(p'\) which is not unimodal, then discard this particular \(\alpha \) from the interval and choose again.
With the appropriate normalization constant \(\beta \), we now have a density given by \(p'=\beta \left( p+\alpha \left( \sum _{i=1}^t h_i' \psi _{4i\varepsilon ,\varepsilon }\right) \right) \). As \(h'\) is contained in the kernel of M, linearity of expectation and the definition of V now guarantee that \(\hat{\varGamma }(p')=\hat{\varGamma }(p)=r\), and the method with which we showed \(\alpha \) exists ensures that \(p'\) is unimodal with \(\varGamma _{\text {mode}}(p')\ne \varGamma _{\text {mode}}(p)=0\). These two statements together contradict the existence of the link function f satisfying \(\varGamma _{\text {mode}}=f\circ \hat{\varGamma }\). \(\square \)
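The kernel argument can be illustrated numerically. The following is a minimal sketch under assumptions that are not taken from the proof: unit-height triangular bumps of half-width `EPS` stand in for \(\psi \), the identification function is \(V(r,y)=(y-r_1,\,y^2-r_2)\) (so \(k=2\), identifying the first two moments), and \(t=3\), so that a kernel vector of the \(2\times 3\) matrix is given by a cross product. All function names are hypothetical.

```python
# Assumed stand-ins (not from the source): unit-height triangular bumps of
# half-width EPS play the role of psi, and V(r, y) = (y - r1, y^2 - r2), k = 2.
EPS = 0.25
CENTERS = [4 * i * EPS for i in range(4)]  # bump i sits at 4*i*EPS, so t = 3 > k


def bump_moments(c):
    """Integrals of 1, y and y^2 against the triangular bump centred at c."""
    return EPS, EPS * c, EPS * (c * c + EPS * EPS / 6.0)


def identified_values(heights):
    """r = (E[Y], E[Y^2]) under the normalised mixture sum_i heights[i] * bump_i."""
    m0 = m1 = m2 = 0.0
    for h, c in zip(heights, CENTERS):
        a0, a1, a2 = bump_moments(c)
        m0, m1, m2 = m0 + h * a0, m1 + h * a1, m2 + h * a2
    return m1 / m0, m2 / m0


def kernel_vector(r):
    """Kernel vector of the 2 x 3 matrix M[j][i] = integral of V_j(r, y) * bump_i(y)."""
    u, v = [], []
    for c in CENTERS[1:]:
        a0, a1, a2 = bump_moments(c)
        u.append(a1 - r[0] * a0)
        v.append(a2 - r[1] * a0)
    # The cross product of the two rows is orthogonal to both of them.
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def perturbed_heights(h):
    """Heights of p + alpha * sum_i h'_i * bump_i, with alpha chosen as in the proof."""
    hp = kernel_vector(identified_values(h))
    i_max = max(range(3), key=lambda i: abs(hp[i]))
    if hp[i_max] < 0:                        # otherwise take -h'
        hp = [-x for x in hp]
    lower = (h[0] - h[i_max + 1]) / hp[i_max]
    uppers = [h[i + 1] / -hp[i] for i in range(3) if hp[i] < 0]
    # Midpoint of the admissible interval; any value above `lower` if unbounded.
    alpha = 0.5 * (lower + min(uppers)) if uppers else lower + 1.0
    return [h[0]] + [h[i + 1] + alpha * hp[i] for i in range(3)]


h = [0.31, 0.27, 0.23, 0.19]                 # h0 > h1 > h2 > h3 > h0 / 2
g = perturbed_heights(h)
print(identified_values(h), identified_values(g))
print("tallest bump before:", h.index(max(h)), "after:", g.index(max(g)))
```

Because the bumps have disjoint supports, the mode of each mixture is the centre of its tallest bump, so the two densities share the identified values while their modes differ.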
Proof of Theorem 2
As in the proof of Theorem 1, we assume the mode is k-identifiable and arrive at a contradiction. Hence, we assume there exists a property \(\hat{\varGamma }:\mathcal {Q}\rightarrow \hat{\mathcal {R}}\subseteq \mathbb {R}^k\) identified by \(V:\hat{\mathcal {R}}\times \mathbb {R}\rightarrow \mathbb {R}^k\) and function \(f:\hat{\mathcal {R}}\rightarrow \mathcal {R}\) such that \(\varGamma _{\text {mode}}=f\circ \hat{\varGamma }\). We will again specify two densities from \(\mathcal {Q}\) that lie in the same level set of \(\hat{\varGamma }\) but have different modes, contradicting the existence of f.
Let \(t>k\), and let \(q_0,q_1,\dots , q_t\) be Gaussian densities with unit height (\(\sigma ^2=\frac{1}{2\pi }\)) centered at \(x_i=Ci\) for some C to be determined. For any mixture parameters \(h=(h_0,h_1,\dots , h_t)\in \mathbb {R}^{t+1}_+\), we will denote the Gaussian mixture density by \(q[h]=\sum _{i=0}^t h_i q_i\in \mathcal {Q}'\),
where we define \(\mathcal {Q}'\) to be all positive scalings of densities in \(\mathcal {Q}\). As we are interested in the mode, we can always renormalize to obtain a distribution in \(\mathcal {Q}\) with the same mode. In the following, we extend \(\varGamma _{\text {mode}}(p)\) for unnormalized densities in the natural way.
Observe that for any mixture h, we have \(\varGamma _{\text {mode}}(q[h]) \in \cup _{i=0}^t B_{\sigma }(x_i)\), for any \(C > 0\). This follows from second-order optimality conditions: As the inflection points of a Gaussian density \(N(\mu ,\sigma )\) are at \(\mu \pm \sigma \), we have \(\frac{d^2}{dx^2} q_i(x)< 0 \iff |x-x_i| < \sigma \), and thus, \(\frac{d^2}{dx^2} q[h](x)< 0 \implies |x-x_i| < \sigma \) for some i. Let \(\gamma := q_1(\sigma ) = e^{-\pi (\sigma -C)^2}\). We will want \(\gamma <\frac{1}{4(t+1)}\), and thus, we choose any \(C>\sigma +\sqrt{\frac{\log (4(t+1))}{\pi }}\).
We will additionally use the following claims in our proof.
Claim 1
For all h, i, \(h_i \le q[h](x_i) \le \max _{x\in B_\sigma (x_i)} q[h](x) \le h_i + \gamma \sum _{j\ne i} h_j\).
Claim 2
If \(h_i > \max _{j\ne i} h_j + \gamma \sum _{k} h_k\), then \(\varGamma _{\text {mode}}(q[h])\in B_\sigma (x_i)\).
Claim 3
If \(h_i < h_j - \gamma \sum _{k\ne i} h_k\), then \(\varGamma _{\text {mode}}(q[h])\notin B_\sigma (x_i)\).
In Claim 1, the first two inequalities are trivial, and the third follows from the observation that the contribution of \(q_j\) to q[h](x) is upper bounded by \(h_j\gamma \) for all \(x\in B_\sigma (x_i)\). Claim 2 then follows from Claim 1: For all j, we have \(q[h](x_i) \ge h_i > h_j + \gamma \sum _k h_k \ge h_j + \gamma \sum _{k\ne j} h_k \ge \max _{x\in B_\sigma (x_j)} q[h](x)\). Similarly, for Claim 3, \(\max _{x\in B_\sigma (x_i)} q[h](x) \le h_i + \gamma \sum _{k\ne i} h_k < h_j \le q[h](x_j)\).
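The claims can be sanity-checked numerically. The sketch below evaluates \(q[h]\) on a grid for one example mixture; the margin `slack`, the grid resolutions, and the example heights are arbitrary assumptions, and a grid check illustrates the bounds rather than proving them.

```python
import math

SIGMA = math.sqrt(1.0 / (2.0 * math.pi))  # unit-height Gaussian: sigma^2 = 1/(2*pi)


def spacing(t, slack=0.1):
    """Some C > sigma + sqrt(log(4(t+1)) / pi); `slack` is an arbitrary margin."""
    return SIGMA + math.sqrt(math.log(4 * (t + 1)) / math.pi) + slack


def gamma(C):
    """gamma = q_1(sigma) = exp(-pi * (sigma - C)^2)."""
    return math.exp(-math.pi * (SIGMA - C) ** 2)


def q_mix(h, C, x):
    """Unnormalised mixture q[h](x) of unit-height Gaussians centred at C*i."""
    return sum(hi * math.exp(-math.pi * (x - C * i) ** 2) for i, hi in enumerate(h))


def check_claim1(h, C, grid=201, tol=1e-12):
    """Grid check of h_i <= q[h](x_i) <= max over B_sigma(x_i) <= h_i + gamma * sum_{j != i} h_j."""
    g, total = gamma(C), sum(h)
    for i, hi in enumerate(h):
        xi = C * i
        centre = q_mix(h, C, xi)
        ball_max = max(q_mix(h, C, xi - SIGMA + 2 * SIGMA * k / (grid - 1))
                       for k in range(grid))
        if not (hi <= centre + tol and centre <= ball_max + tol
                and ball_max <= hi + g * (total - hi) + tol):
            return False
    return True


def grid_mode(h, C, step=1e-3):
    """Approximate argmax of q[h] on a grid covering all component centres."""
    lo, hi = -1.0, C * (len(h) - 1) + 1.0
    n = int((hi - lo) / step)
    return max((lo + k * step for k in range(n + 1)), key=lambda x: q_mix(h, C, x))


h = [0.30, 0.20, 0.15, 0.15, 0.10, 0.10]   # example heights, t = 5
C = spacing(len(h) - 1)
print(gamma(C) < 1 / (4 * len(h)), check_claim1(h, C), grid_mode(h, C))
```

For these heights the hypothesis of Claim 2 holds with \(i=0\), since \(h_0-h_1=0.1\) exceeds \(\gamma \sum _k h_k\), so the grid maximizer is expected to fall within \(\sigma \) of \(x_0=0\).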
Finally, we construct our initial mixture h so that \(\sum _ih_i = 1\) and the following condition holds,
By Claim 2, we therefore would have \(\varGamma _{\text {mode}}(q[h]) \in B_\sigma (x_0)\). Condition (4) can be satisfied for \(t>5\) (and smaller if C is larger); we give one explicit construction here. Letting \(c=1/(t+1)\) for ease of notation, we may take \(h_0 = (5/4)c\) and \(h_1=c\). Enforcing \(\sum _i h_i = 1\), the average of the remaining \(t-1\) elements is then \(c -(1/4)c/(t-1) = \left( 1-\frac{1}{4(t-1)}\right) c\), which is strictly less than \(h_1\) but, for \(t>5\), strictly greater than \(c(1-1/16) = (3/4)h_0\), as desired. We may therefore choose the remaining elements to be any decreasing sequence in the interval \((3h_0/4,h_1)\) whose average is \(c\left( 1-\frac{1}{4(t-1)}\right) \in (3h_0/4,h_1)\).
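The explicit construction can be sketched in a few lines. The linear spacing of the remaining heights is an arbitrary choice (any decreasing sequence with the right average would do), and the function name is hypothetical.

```python
def initial_heights(t):
    """Heights h_0 > h_1 > ... > h_t > (3/4) h_0 summing to 1, for t > 5."""
    if t <= 5:
        raise ValueError("this construction needs t > 5")
    c = 1.0 / (t + 1)
    h0, h1 = 1.25 * c, c
    m = t - 1                                  # number of remaining heights
    avg = c * (1.0 - 1.0 / (4.0 * m))          # forced by sum(h) == 1
    lower = 0.75 * h0
    # Spread the remaining heights symmetrically around avg, staying strictly
    # inside (3*h0/4, h1); the factor 0.5 is an arbitrary safety margin.
    half = 0.5 * min(h1 - avg, avg - lower)
    rest = [avg + half * (1.0 - 2.0 * k / (m - 1)) for k in range(m)]
    return [h0, h1] + rest


h = initial_heights(6)
print(sum(h), h)
```

Since \(h_0-h_1=c/4=\frac{1}{4(t+1)}>\gamma \) and \(\sum _k h_k=1\), the hypothesis of Claim 2 holds for \(i=0\) with these heights.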
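Now, let \(\hat{\varGamma }(q[h])=r\). Consider the \(k\times t\) matrix \(M\) with entries \(M_{ji}=\int V_j(r,y)\,q_i(y)\,dy\) for \(j=1,\dots ,k\) and \(i=1,\dots ,t\), where \(V_j\) denotes the jth component of V.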
Let \({h}'=(h_1',\dots , h_t')\) denote a nontrivial vector in the kernel of M. To complete the proof, we will demonstrate that for any such \(h'\) there exists a real number \(\alpha \in \mathbb {R}\) so that \(q[h+\alpha h']=q[h]+\alpha \sum _{i=1}^t h_i'q_i\) (after normalization to obtain the corresponding element in \(\mathcal {Q}\)) is the desired density. We proceed by cases on the entries of \(h'\).
First, if \(h_1',\dots ,h_t'\ge 0\), then let \(h_{i(\text {max})}'\) denote the entry of \({h}'\) with greatest magnitude. If \(h_{i(\text {max})}'\) is not unique, then choose the entry associated with the maximal initial height \(h_{i(\text {max})}\). Choose \(\alpha \) such that
This ensures that \(h_0<(h_{i(\text {max})}+\alpha h_{i(\text {max})}')-\gamma \left( 1+\alpha \sum _{k\ne i(\text {max})}h_k'\right) \) so that \(\varGamma _{\text {mode}}(q[h+\alpha h'])\not \in B_{\sigma }(x_0)\) by Claim 3. Second, if \(h_1', \dots , h_t'\le 0\), then take \(-{h}'\) and treat as above.
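For the first case, rearranging the inequality just stated suggests a sufficient lower bound on \(\alpha \). This is a reconstruction from the surrounding text rather than the authors' own display; it writes \(S=\sum _{k\ne i(\text {max})}h_k'\) (a shorthand introduced here) and assumes \(h_{i(\text {max})}'-\gamma S>0\), which holds when \(\gamma t<1\).

```latex
\[
h_0 < \bigl(h_{i(\text{max})} + \alpha h_{i(\text{max})}'\bigr) - \gamma\,(1 + \alpha S)
\;\iff\;
\alpha\,\bigl(h_{i(\text{max})}' - \gamma S\bigr) > h_0 - h_{i(\text{max})} + \gamma
\;\iff\;
\alpha > \frac{h_0 - h_{i(\text{max})} + \gamma}{h_{i(\text{max})}' - \gamma S}.
\]
```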
In the final case, at least one pair of entries of \({h}'\) have opposite sign. Let \(h_{i(\text {max})}'\) denote the entry of \({h}'\) with the greatest magnitude and assume \(h_{i(\text {max})}'>0\); otherwise, take \(-{h}'\). If \(h_{i(\text {max})}'\) is not unique, then choose the entry associated with the maximal initial height \(h_{i(\text {max})}\). Choose \(\alpha \) such that
Once again, the lower bound ensures that \(h_0<(h_{i(\text {max})}+\alpha h_{i(\text {max})}')-\gamma \Big (1+\alpha \sum _{k\ne i(\text {max})}h_k'\Big )\) so that \(\varGamma _{\text {mode}}(q[h+\alpha h'])\not \in B_{\sigma }(x_0)\) by Claim 3. We bound \(\alpha \) from above in this case to ensure that \(q[h+\alpha h']\ge 0\), meaning we have a valid density.
It thus remains to verify that this interval is nonempty. Take an index i such that \(h_i'<0\). Note that \(h_{i(\text {max})}'\ge \frac{\sum _{k\ne i(\text {max})}h_k'}{t}>\gamma \sum _{k\ne i(\text {max})}h_k'\), so that \(h_{i(\text {max})}'-\gamma \left( \sum _{k\ne i(\text {max})}h_k'\right)>h_{i(\text {max})}'(1-\gamma t)>\frac{3h_{i(\text {max})}'}{4}\ge \frac{3|h_i'|}{4}\). Also note that \(\frac{h_0}{4}+\gamma<\frac{h_0}{4}+\frac{1}{4(t+1)}<\frac{h_0}{2}<\frac{3h_0}{4}\cdot \frac{3}{4}\). Chaining these inequalities together,
As this inequality holds for all such i, it holds for the minimum over i.
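One way to chain the bounds above is sketched here. It is a reconstruction and therefore an assumption about the intended display: it takes the interval for \(\alpha \) to have lower endpoint \((h_0-h_{i(\text {max})}+\gamma )/(h_{i(\text {max})}'-\gamma S)\) with \(S=\sum _{k\ne i(\text {max})}h_k'\), and upper endpoint \(\min _{\{i:h_i'<0\}}h_i/|h_i'|\), by analogy with the proof of Theorem 1. It also uses \(h_{i(\text {max})}>3h_0/4\) and \(h_i>3h_0/4\), which hold for the explicit construction of h.

```latex
\[
\frac{h_0 - h_{i(\text{max})} + \gamma}{h_{i(\text{max})}' - \gamma S}
< \frac{h_0/4 + \gamma}{3|h_i'|/4}
< \frac{(3h_0/4)\cdot(3/4)}{3|h_i'|/4}
= \frac{3h_0/4}{|h_i'|}
< \frac{h_i}{|h_i'|}.
\]
```

The first step bounds the numerator above by \(h_0/4+\gamma \) and the denominator below by \(3|h_i'|/4\); the second uses \(\frac{h_0}{4}+\gamma <\frac{3h_0}{4}\cdot \frac{3}{4}\).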
In each of the above cases, there are finitely many \(\alpha \) which fail to yield a unimodal density, \(q[h+\alpha h']\). If the \(\alpha \) chosen yields such a \(q[h+\alpha h']\), discard this particular \(\alpha \) and choose again.
Similar to the conclusion of Theorem 1, the density \(q[h+\alpha h']\) (after normalization to obtain the corresponding element in \(\mathcal {Q}\)) gives the desired contradiction. \(\square \)
Experimental details
So as to allow comparison with Heinrich (2014), we consider a density \(p_{\text {mix}}\) which is a mixture of two Gaussians; letting \(p_1 = N(2,1.5)\) and \(p_2 = N(-2,0.5)\), where \(N(\mu ,\sigma )\) denotes a Gaussian density with mean \(\mu \) and standard deviation \(\sigma \), we set \(p_{\text {mix}}= 0.75 p_1 + 0.25 p_2\). The true mode of \(p_{\text {mix}}\) is \(m_0 = \varGamma _{\text {mode}}(p_{\text {mix}}) \approx -1.987047\), with the other local maximum occurring at \(m_1 \approx 2.000000\). The experiment performed is analogous to Heinrich (2014): For each value of \(\varepsilon \) as shown in Table 1, and in each of 1000 trials, we collect \(n=10,000\) independent samples from \(p_{\text {mix}}\), and measure the performance of the empirical modal midpoint \({\hat{x}}_\varepsilon \) relative to the true mode \(m_0\) and true modal midpoint \(x_\varepsilon = \varGamma _\varepsilon (p_{\text {mix}})\). In the case of a tie for \({\hat{x}}_\varepsilon \), we take the lowest value (which the reader will note should favor the correct value). In sum, our results are qualitatively similar to Heinrich (2014), in that the modal midpoint \({\hat{x}}_\varepsilon \) fails to estimate the mode, but we can also confirm that it fails to estimate the modal midpoint \(x_\varepsilon \) as well. Note in particular that the two “Versus local max” columns are identical.
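A single trial of such an experiment might look as follows. This is a minimal sketch, not the code used for Table 1: the sliding-window estimator, the seed, and the value of \(\varepsilon \) are assumptions, and ties are broken toward the lowest midpoint by anchoring candidate windows at their right endpoint.

```python
import random


def sample_mixture(n, rng):
    """Draw n samples from 0.75 * N(2, 1.5) + 0.25 * N(-2, 0.5) (standard deviations)."""
    return [rng.gauss(2.0, 1.5) if rng.random() < 0.75 else rng.gauss(-2.0, 0.5)
            for _ in range(n)]


def empirical_modal_midpoint(samples, eps):
    """Midpoint of a closed interval of length 2*eps containing the most samples.

    Candidate windows are [x - 2*eps, x] for each sample x; some optimal
    interval can always be slid left until its right endpoint is a sample,
    so this search also realises the lowest optimal midpoint.
    """
    xs = sorted(samples)
    best_count, best_right = 0, None
    lo = 0
    for hi, right in enumerate(xs):
        while xs[lo] < right - 2 * eps:        # drop samples left of the window
            lo += 1
        count = hi - lo + 1
        if count > best_count:                 # strict: keep the lowest on ties
            best_count, best_right = count, right
    return best_right - eps, best_count


rng = random.Random(0)                         # arbitrary seed for reproducibility
samples = sample_mixture(10_000, rng)
midpoint, count = empirical_modal_midpoint(samples, eps=1.0)
print(midpoint, count)
```

With a wide window such as \(\varepsilon =1\), the broad component centered at 2 captures more mass than the narrow component near the true mode, which is the qualitative failure described above.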
Cite this article
Dearborn, K., Frongillo, R. On the indirect elicitability of the mode and modal interval. Ann Inst Stat Math 72, 1095–1108 (2020). https://doi.org/10.1007/s10463-019-00719-1