
On the indirect elicitability of the mode and modal interval

Annals of the Institute of Statistical Mathematics

Abstract

Scoring functions are commonly used to evaluate a point forecast of a particular statistical functional. This scoring function should be consistent, meaning the correct value of the functional is the Bayes act, in which case we say the scoring function elicits the functional. Recent results show that the mode functional is not elicitable. In this work, we ask whether it is at least possible to indirectly elicit the mode, wherein one elicits a low-dimensional functional from which the mode can be computed. We show that this cannot be done: Neither the mode nor a modal interval is indirectly elicitable with respect to the class of identifiable functionals.

References

  • Chernoff, H. (1964). Estimation of the mode. Annals of the Institute of Statistical Mathematics, 16(1), 31–41.

  • Fissler, T., Ziegel, J. F. (2017). Order-sensitivity and equivariance of scoring functions. Electronic Journal of Statistics, 13(1), 1166–1211.

  • Fissler, T., Ziegel, J. F. (2016). Higher order elicitability and Osband’s principle. The Annals of Statistics, 44(4), 1680–1707.

  • Frongillo, R., Kash, I. (2015). On elicitation complexity. In: Cortes, C., Lawrence, N. D., Lee, D. D., Sugiyama, M., Garnett, R. (Eds.), Advances in neural information processing systems 28 (pp. 3258–3266). Curran Associates, Inc.

  • Gneiting, T. (2011). Making and evaluating point forecasts. Journal of the American Statistical Association, 106(494), 746–762.

  • Gneiting, T., Raftery, A. E. (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477), 359–378.

  • Grenander, U. (1965). Some direct estimates of the mode. The Annals of Mathematical Statistics, 36(1), 131–138.

  • Heinrich, C. (2014). The mode functional is not elicitable. Biometrika, 101(1), 245–251.

  • Lambert, N. S. (2018). Elicitation and evaluation of statistical forecasts. Preprint.

  • Lambert, N. S., Pennock, D. M., Shoham, Y. (2008). Eliciting properties of probability distributions. In Proceedings of the 9th ACM conference on electronic commerce, ACM, pp. 129–138.

  • Lee, M. J. (1989). Mode regression. Journal of Econometrics, 42(3), 337–349.

  • Osband, K. (1985). Providing incentives for better cost forecasting. PhD thesis, University of California, Berkeley.

  • Parzen, E. (1962). On estimation of a probability density function and mode. The Annals of Mathematical Statistics, 33(3), 1065–1076.

  • Robertson, T., Cryer, J. D. (1974). An iterative procedure for estimating the mode. Journal of the American Statistical Association, 69(348), 1012–1016.

  • Steinwart, I., Pasin, C., Williamson, R., Zhang, S. (2014). Elicitation and identification of properties. In: Balcan, M. F., Feldman, V., Szepesvári, C. (Eds.), Proceedings of the 27th conference on learning theory, PMLR, Barcelona, Spain, Proceedings of Machine Learning Research, vol. 35 (pp. 482–526).

  • Venter, J. (1967). On estimation of the mode. The Annals of Mathematical Statistics, 38(5), 1446–1455.

Acknowledgements

We thank Tobias Fissler and Jessie Finocchiaro for helpful suggestions, Jonas Brehmer for simplifying the proof of Lemma 1, and Nicole Woytarowicz for her initial work on this project, including a proof of Lemma 1 in her B.S. thesis.

Author information

Corresponding author

Correspondence to Rafael Frongillo.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported by National Science Foundation Grant CCF-1657598.

Appendices

Omitted proofs

Proof of Theorem 1

Let \(\varepsilon >0\) be given. Since the identification complexity lower bounds the identifiable elicitation complexity of the mode, it suffices to show that the mode is not k-identifiable for arbitrary \(k\in \mathbb {N}\). Suppose, by way of contradiction, that the mode is k-identifiable. Hence, there exists a property \(\hat{\varGamma }:\mathcal {P}\rightarrow \hat{\mathcal {R}}\subseteq \mathbb {R}^k\) identified by \(V:\hat{\mathcal {R}}\times \mathbb {R}\rightarrow \mathbb {R}^k\) and function \(f:\hat{\mathcal {R}}\rightarrow \mathcal {R}\) such that \(\varGamma _{\text {mode}}=f\circ \hat{\varGamma }\). Our goal will be to specify two densities \(p,p'\in \mathcal {P}_{\psi ,\varepsilon }\subseteq \mathcal {P}\) with \(\hat{\varGamma }(p)=\hat{\varGamma }(p')\) and \(\varGamma _{\text {mode}}(p)\ne \varGamma _{\text {mode}}(p')\), contradicting the existence of f.

Let \(t>k\) and consider the density \(p=\sum _{i=0}^t h_i\psi _{4i\varepsilon ,\varepsilon }\) in \(\mathcal {P}_{\psi ,\varepsilon }\) with strictly decreasing heights \(h_0>h_1>\dots >h_t>h_0/2>0\) and \(\sum _{i=0}^t h_i=1\). Observe that \(\varGamma _{\text {mode}}(p)=0\) and denote \(\hat{\varGamma }(p)=r\). Consider the \(k\times t\) matrix

$$\begin{aligned} M= \begin{bmatrix} E_{\psi _{4\varepsilon ,\varepsilon }}(V(r, Y)), \dots , E_{\psi _{4t\varepsilon ,\varepsilon }}(V(r, Y))\end{bmatrix}. \end{aligned}$$
(3)

Let \({h}'=(h_1',\dots , h_t')\) denote a nontrivial vector in the kernel of M, which exists because \(t>k\). To complete the proof, we will demonstrate that for any such \({h}'\) there exist real numbers \(\alpha ,\beta \in \mathbb {R}\) so that \(p'=\beta \left( p+\alpha \left( \sum _{i=1}^t h_i' \psi _{4i\varepsilon ,\varepsilon }\right) \right) \) is a density satisfying \(\hat{\varGamma }(p')=r\) and \(\varGamma _{\text {mode}}(p')\ne 0\). We proceed by considering all cases of \({h}'\) and showing the existence of \(\alpha \) in each case.

First, considering \(h_1',\dots ,h_t'\ge 0\), let \(h_{i(\text {max})}'\) denote the entry of \({h}'\) with greatest magnitude (if not unique, choose the entry associated with the maximal initial height \(h_{i(\text {max})}\)), and take \(\alpha >(h_0-h_{i(\text {max})})/h_{i(\text {max})}'\). Second, if \(h_1', \dots , h_t'\le 0\), then take \(-{h}'\) and treat as above. In the final case, at least one pair of entries of \({h}'\) have opposite sign. Let \(h_{i(\text {max})}'\) denote an entry of \({h}'\) with the greatest magnitude (if not unique, choose the entry associated with the maximal initial height \(h_{i(\text {max})}\)) and assume \(h_{i(\text {max})}'>0\); otherwise, take \(-{h}'\). Choose \(\alpha \) such that \( (h_0-h_{i(\text {max})})/h_{i(\text {max})}'<\alpha \le \min _{\{i: h_i'<0\}} h_i/|h_i'|\) satisfying \(\alpha \ne |h_{i(\text {max})}-h_i|/(h_{i(\text {max})}'-h_i')\) for any i with \(h_{i(\text {max})}'>h_i'>0\). Note this interval is nonempty because \(h_0-h_{i(\text {max})}<\frac{h_0}{2}<h_i\) and \(|h_i'|<h_{i(\text {max})}'\) for all i such that \(h_i'<0\). In each of the above cases, there are finitely many \(\alpha \) which do not yield a unimodal \(p'\). If the \(\alpha \) chosen yields a \(p'\) which is not unimodal, then discard this particular \(\alpha \) from the interval and choose again.

With the appropriate normalization constant \(\beta \), we now have a density given by \(p'=\beta \left( p+\alpha \left( \sum _{i=1}^t h_i' \psi _{4i\varepsilon ,\varepsilon }\right) \right) \). As \(h'\) is contained in the kernel of M, linearity of expectation and the definition of V now guarantee that \(\hat{\varGamma }(p')=\hat{\varGamma }(p)=r\), and the method with which we showed \(\alpha \) exists ensures that \(p'\) is unimodal with \(\varGamma _{\text {mode}}(p')\ne \varGamma _{\text {mode}}(p)=0\). These two statements together contradict the existence of the link function f satisfying \(\varGamma _{\text {mode}}=f\circ \hat{\varGamma }\). \(\square \)
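Spelled out, the linearity step above reads as follows. This is a sketch that assumes the usual definition of identification, namely that \(E_{p}(V(r,Y))=0\) holds exactly when \(\hat{\varGamma }(p)=r\):

$$\begin{aligned} E_{p'}(V(r,Y))&=\beta \left( E_{p}(V(r,Y))+\alpha \sum _{i=1}^t h_i' E_{\psi _{4i\varepsilon ,\varepsilon }}(V(r,Y))\right) \\&=\beta \left( 0+\alpha M{h}'\right) =0. \end{aligned}$$

The first term vanishes because \(r=\hat{\varGamma }(p)\), and the second because \({h}'\) lies in the kernel of M; under the stated assumption, the vanishing expectation gives \(\hat{\varGamma }(p')=r\).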

Proof of Theorem 2

As in the proof of Theorem 1, we assume the mode is k-identifiable and arrive at a contradiction. Hence, we assume there exists a property \(\hat{\varGamma }:\mathcal {Q}\rightarrow \hat{\mathcal {R}}\subseteq \mathbb {R}^k\) identified by \(V:\hat{\mathcal {R}}\times \mathbb {R}\rightarrow \mathbb {R}^k\) and function \(f:\hat{\mathcal {R}}\rightarrow \mathcal {R}\) such that \(\varGamma _{\text {mode}}=f\circ \hat{\varGamma }\). We will again specify two densities from \(\mathcal {Q}\) that lie in the same level set of \(\hat{\varGamma }\) but have different modes, contradicting the existence of f.

Let \(t>k\), and let \(q_0,q_1,\dots , q_t\) be Gaussian densities with unit height (\(\sigma ^2=\frac{1}{2\pi }\)) centered at \(x_i=Ci\) for some C to be determined. For any mixture parameters \(h=(h_0,h_1,\dots , h_t)\in \mathbb {R}^{t+1}_+\), we will denote the Gaussian mixture density as follows,

$$\begin{aligned} q[h](x)=\sum _{i=0}^t h_i q_i(x)\in \mathcal {Q}', \end{aligned}$$

where we define \(\mathcal {Q}'\) to be all positive scalings of densities in \(\mathcal {Q}\). As we are interested in the mode, we can always renormalize to obtain a distribution in \(\mathcal {Q}\) with the same mode. In the following, we extend \(\varGamma _{\text {mode}}(p)\) for unnormalized densities in the natural way.

Observe that for any mixture h, we have \(\varGamma _{\text {mode}}(q[h]) \in \cup _{i=0}^t B_{\sigma }(x_i)\), for any \(C > 0\). This follows from second-order optimality conditions: As the inflection points of a Gaussian density \(N(\mu ,\sigma )\) are at \(\mu \pm \sigma \), we have \(\frac{d^2}{dx^2} q_i(x)< 0 \iff |x-x_i| < \sigma \), and thus, \(\frac{d^2}{dx^2} q[h](x)< 0 \implies |x-x_i| < \sigma \) for some \(i\). Let \(\gamma := q_1(\sigma ) = e^{-\pi (\sigma -C)^2}\). We will want \(\gamma <\frac{1}{4(t+1)}\), and thus, we choose any \(C>\sigma +\sqrt{\frac{\log (4(t+1))}{\pi }}\).
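As a quick numerical check of this choice of \(C\), the following sketch computes \(\sigma \), a spacing \(C\) just above the stated bound, and the resulting \(\gamma \). The function name `separation_constants` and the 1% slack factor are illustrative assumptions, not part of the proof.

```python
import math

def separation_constants(t, slack=1.01):
    """Return (sigma, C, gamma) for t+1 unit-height Gaussian components.

    slack > 1 is an illustrative choice: any C strictly above the bound works.
    """
    sigma = math.sqrt(1.0 / (2.0 * math.pi))       # sigma^2 = 1/(2*pi) gives unit height
    bound = sigma + math.sqrt(math.log(4 * (t + 1)) / math.pi)
    C = slack * bound                               # spacing between consecutive centers
    gamma = math.exp(-math.pi * (sigma - C) ** 2)   # gamma = q_1(sigma)
    return sigma, C, gamma

sigma, C, gamma = separation_constants(t=6)
print(f"sigma={sigma:.4f}, C={C:.4f}, gamma={gamma:.5f}, target={1 / (4 * 7):.5f}")
```

Any larger `slack` only shrinks \(\gamma \) further, so the requirement \(\gamma <\frac{1}{4(t+1)}\) continues to hold.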

We will additionally use the following claims in our proof.

Claim 1

For all \(i\), \(h_i \le q[h](x_i) \le \max _{x\in B_\sigma (x_i)} q[h](x) \le h_i + \gamma \sum _{j\ne i} h_j\).

Claim 2

If \(h_i > \max _{j\ne i} h_j + \gamma \sum _{k} h_k\), then \(\varGamma _{\text {mode}}(q[h])\in B_\sigma (x_i)\).

Claim 3

If \(h_i < h_j - \gamma \sum _{k\ne i} h_k\) for some \(j\ne i\), then \(\varGamma _{\text {mode}}(q[h])\notin B_\sigma (x_i)\).

In Claim 1, the first two inequalities are trivial, and the third follows from the observation that, for \(j\ne i\), every \(x\in B_\sigma (x_i)\) is at distance at least \(C-\sigma \) from \(x_j\), so the contribution of \(q_j\) to \(q[h](x)\) is upper bounded by \(h_j\gamma \). Claim 2 then follows from Claim 1: For all \(j\ne i\), we have \(q[h](x_i) \ge h_i > h_j + \gamma \sum _k h_k \ge h_j + \gamma \sum _{k\ne j} h_k \ge \max _{x\in B_\sigma (x_j)} q[h](x)\). Similarly, for Claim 3, \(\max _{x\in B_\sigma (x_i)} q[h](x) \le h_i + \gamma \sum _{k\ne i} h_k < h_j \le q[h](x_j)\).
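The bounds in Claim 1 can be sanity-checked numerically. The sketch below evaluates \(q[h]\) on a grid over each ball \(B_\sigma (x_i)\); the helper names, the grid resolution, the floating-point tolerance, and the example weights are all assumptions made for illustration.

```python
import math

SIGMA = math.sqrt(1.0 / (2.0 * math.pi))   # unit-height Gaussians: sigma^2 = 1/(2*pi)

def q_mix(x, h, C):
    """Unnormalized mixture q[h](x) = sum_i h_i * exp(-pi * (x - C*i)^2)."""
    return sum(hi * math.exp(-math.pi * (x - C * i) ** 2) for i, hi in enumerate(h))

def claim1_holds(h, C, grid_points=401, tol=1e-12):
    """Check h_i <= q[h](x_i) <= max over B_sigma(x_i) <= h_i + gamma * sum_{j != i} h_j."""
    gamma = math.exp(-math.pi * (SIGMA - C) ** 2)
    total = sum(h)
    for i, hi in enumerate(h):
        xi = C * i
        at_center = q_mix(xi, h, C)
        grid = [xi - SIGMA + 2 * SIGMA * j / (grid_points - 1) for j in range(grid_points)]
        # The grid maximum approximates the maximum over the ball; x_i is included explicitly.
        ball_max = max([at_center] + [q_mix(x, h, C) for x in grid])
        upper = hi + gamma * (total - hi)
        if not (hi <= at_center + tol and ball_max <= upper + tol):
            return False
    return True

# Illustrative weights and spacing (not taken from the paper).
h = [0.19, 0.15, 0.145, 0.14, 0.135, 0.125, 0.115]
print(claim1_holds(h, C=2.0))
```

A grid check of this kind can only refute the bounds, not prove them; the argument in the text is what establishes Claim 1.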

Finally, we construct our initial mixture h so that \(\sum _ih_i = 1\) and the following condition holds,

$$\begin{aligned} h_0-\gamma>h_1>h_2>\cdots>h_t>\frac{3}{4}h_0. \end{aligned}$$
(4)

By Claim 2, we therefore have \(\varGamma _{\text {mode}}(q[h]) \in B_\sigma (x_0)\). Condition (4) can be satisfied for \(t>5\) (and for smaller \(t\) if \(C\) is larger); we give one explicit construction here. Letting \(c=1/(t+1)\) for ease of notation, we may take \(h_0 = (5/4)c\) and \(h_1=c\). Enforcing \(\sum _i h_i = 1\), the average of the remaining \(t-1\) elements is then \(c -\frac{c}{4(t-1)} = \left( 1-\frac{1}{4(t-1)}\right) c\), which is strictly less than \(h_1\) and, since \(t>5\), strictly greater than \(c(1-1/16) = (3/4)h_0\), as desired. We may therefore choose the remaining elements to be any decreasing sequence in the interval \((3h_0/4,h_1)\) whose average is \(c\left( 1-\frac{1}{4(t-1)}\right) \in (3h_0/4,h_1)\).
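One concrete way to realize this construction is sketched below. The evenly spaced sequence around the required average is just one admissible choice among many, and the names `initial_heights` and `satisfies_condition_4`, the half-width `d`, and the 1% slack on \(C\) are assumptions for illustration.

```python
import math

def initial_heights(t):
    """Heights h_0, ..., h_t with h_0 = (5/4)c, h_1 = c, c = 1/(t+1), summing to 1."""
    if t <= 5:
        raise ValueError("this explicit construction needs t > 5")
    c = 1.0 / (t + 1)
    h0, h1 = 1.25 * c, c
    avg = c * (1.0 - 1.0 / (4.0 * (t - 1)))   # required average of h_2, ..., h_t
    lower = 0.75 * h0                          # every remaining height must exceed this
    d = 0.5 * min(h1 - avg, avg - lower)       # half-width keeping the sequence inside (lower, h1)
    m = t - 1                                  # number of remaining heights
    rest = [avg + d * (1.0 - 2.0 * j / (m - 1)) for j in range(m)]  # symmetric, decreasing
    return [h0, h1] + rest

def satisfies_condition_4(h, gamma):
    """Check h_0 - gamma > h_1 > h_2 > ... > h_t > (3/4) h_0."""
    decreasing = all(h[i] > h[i + 1] for i in range(1, len(h) - 1))
    return h[0] - gamma > h[1] and decreasing and h[-1] > 0.75 * h[0]

t = 6
sigma = math.sqrt(1.0 / (2.0 * math.pi))
C = 1.01 * (sigma + math.sqrt(math.log(4 * (t + 1)) / math.pi))  # 1% slack is arbitrary
gamma = math.exp(-math.pi * (sigma - C) ** 2)
h = initial_heights(t)
print(h, satisfies_condition_4(h, gamma))
```

Because the remaining heights are placed symmetrically around the required average, the normalization \(\sum _i h_i=1\) holds by construction.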

Now, let \(\hat{\varGamma }(q[h])=r\). Consider the \(k\times t\) matrix

$$\begin{aligned} M= \begin{bmatrix} E_{q_1}(V(r, Y)) , \dots , E_{q_t}(V(r, Y))\end{bmatrix}. \end{aligned}$$
(5)

Let \({h}'=(h_1',\dots , h_t')\) denote a nontrivial vector in the kernel of M. To complete the proof, we will demonstrate that for any such \(h'\) there exists a real number \(\alpha \in \mathbb {R}\) so that \(q[h+\alpha h']=q[h]+\alpha \sum _{i=1}^t h_i'q_i\) (after normalization to obtain the corresponding element in \(\mathcal {Q}\)) is the desired density. We proceed by cases on the entries of \(h'\).

First, if \(h_1',\dots ,h_t'\ge 0\), then let \(h_{i(\text {max})}'\) denote the entry of \({h}'\) with greatest magnitude. If \(h_{i(\text {max})}'\) is not unique, then choose the entry associated with the maximal initial height \(h_{i(\text {max})}\). Choose \(\alpha \) such that

$$\begin{aligned} \frac{h_0-h_{i(\text {max})}+\gamma }{h_{i(\text {max})}'-\gamma \left( \sum _{k\ne i(\text {max})}h_k'\right) }<\alpha . \end{aligned}$$

This ensures that \(h_0<(h_{i(\text {max})}+\alpha h_{i(\text {max})}')-\gamma \left( 1+\alpha \sum _{k\ne i(\text {max})}h_k'\right) \) so that \(\varGamma _{\text {mode}}(q[h+\alpha h'])\not \in B_{\sigma }(x_0)\) by Claim 3. Second, if \(h_1', \dots , h_t'\le 0\), then take \(-{h}'\) and treat as above.

In the final case, at least one pair of entries of \({h}'\) have opposite sign. Let \(h_{i(\text {max})}'\) denote the entry of \({h}'\) with the greatest magnitude and assume \(h_{i(\text {max})}'>0\); otherwise, take \(-{h}'\). If \(h_{i(\text {max})}'\) is not unique, then choose the entry associated with the maximal initial height \(h_{i(\text {max})}\). Choose \(\alpha \) such that

$$\begin{aligned} \frac{h_0-h_{i(\text {max})}+\gamma }{h_{i(\text {max})}'-\gamma \left( \sum _{k\ne i(\text {max})}h_k'\right) }<\alpha \le \min _{i:h_i'<0}\frac{h_i}{|h_i'|}. \end{aligned}$$

Once again, the lower bound ensures that \(h_0<(h_{i(\text {max})}+\alpha h_{i(\text {max})}')-\gamma \left( 1+\alpha \sum _{k\ne i(\text {max})}h_k'\right) \) so that \(\varGamma _{\text {mode}}(q[h+\alpha h'])\not \in B_{\sigma }(x_0)\) by Claim 3. We bound \(\alpha \) from above in this case to ensure that \(q[h+\alpha h']\ge 0\), meaning we have a valid density.

It thus remains to verify that this interval is nonempty. Take an index i such that \(h_i'<0\). Note that \(h_{i(\text {max})}'\ge \frac{\sum _{k\ne i(\text {max})}h_k'}{t}>\gamma \sum _{k\ne i(\text {max})}h_k'\), so that \(h_{i(\text {max})}'-\gamma \left( \sum _{k\ne i(\text {max})}h_k'\right)>h_{i(\text {max})}'(1-\gamma t)>\frac{3h_{i(\text {max})}'}{4}\ge \frac{3|h_i'|}{4}\). Also note that \(\frac{h_0}{4}+\gamma<\frac{h_0}{4}+\frac{1}{4(t+1)}<\frac{h_0}{2}<\frac{3h_0}{4}\cdot \frac{3}{4}\). Chaining these inequalities together,

$$\begin{aligned} \frac{h_0-h_{i(\text {max})}+\gamma }{h_{i(\text {max})}'-\gamma \left( \sum _{k\ne i(\text {max})}h_k'\right) }&< \frac{h_0-h_{i(\text {max})}+\gamma }{h_{i(\text {max})}'(1-\gamma t)}\\&<\frac{\frac{h_0}{4}+\gamma }{h_{i(\text {max})}'(1-\gamma t)}\\&<\frac{\frac{3h_0}{4}\cdot \frac{3}{4}}{h_{i(\text {max})}'(1-\gamma t)}\\&\le \frac{\frac{3h_0}{4}\cdot \frac{3}{4}}{\frac{3|h_i'|}{4}} =\frac{\frac{3h_0}{4}}{|h_i'|} <\frac{h_i}{|h_i'|}~. \end{aligned}$$

As this inequality holds for all such i, it holds for the minimum over i.

In each of the above cases, there are finitely many \(\alpha \) which fail to yield a unimodal density, \(q[h+\alpha h']\). If the \(\alpha \) chosen yields such a \(q[h+\alpha h']\), discard this particular \(\alpha \) and choose again.

Table 1 The ineffectiveness of the modal midpoint as an estimate of the mode, or even of the modal midpoint itself

Similar to the conclusion of Theorem 1, the density \(q[h+\alpha h']\) (after normalization to obtain the corresponding element in \(\mathcal {Q}\)) gives the desired contradiction. \(\square \)
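The kernel step shared by both proofs can be illustrated with a small numerical sketch. It assumes a hypothetical identification function \(V(r,y)=(y-r_1,\,y^2-r_2)\) with \(k=2\), which identifies the first two moments; this choice, the spacing, the weights, and the step \(\alpha \) are illustrative and are not the general \(V\) of the theorem. The kernel vector is obtained from a cross product, a shortcut that only works for \(k=2\).

```python
import math

SIGMA2 = 1.0 / (2.0 * math.pi)  # variance of each unit-height Gaussian component

def expected_V(center, r):
    """E[V(r, Y)] for Y ~ N(center, SIGMA2), with the illustrative choice
    V(r, y) = (y - r[0], y**2 - r[1]) identifying the first two moments (k = 2)."""
    return (center - r[0], center ** 2 + SIGMA2 - r[1])

def mixture_expectation(weights, centers, r):
    """Linearity of expectation: sum_i w_i * E_{q_i}[V(r, Y)]."""
    cols = [expected_V(x, r) for x in centers]
    return tuple(sum(w * col[d] for w, col in zip(weights, cols)) for d in range(2))

def kernel_vector(M):
    """Nontrivial kernel vector of a 2 x t matrix M (t >= 3): the cross product of
    the two rows restricted to the first three columns, zero-padded, and scaled so
    that its largest-magnitude entry equals +1."""
    a, b = M[0][:3], M[1][:3]
    v = [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]
    v += [0.0] * (len(M[0]) - 3)
    pivot = max(v, key=abs)
    return [x / pivot for x in v]

# Hypothetical example values (not from the paper): t = 6 > k = 2, spacing C = 2.
C, t = 2.0, 6
centers = [C * i for i in range(t + 1)]
h = [0.19, 0.15, 0.145, 0.14, 0.135, 0.125, 0.115]   # sums to 1, largest weight at x_0

# r is the value of the identified property at q[h]: its mean and second moment.
r = (sum(w * x for w, x in zip(h, centers)),
     sum(w * (x ** 2 + SIGMA2) for w, x in zip(h, centers)))

# Columns of M are E_{q_i}[V(r, Y)] for i = 1..t, stored here as two rows.
cols = [expected_V(x, r) for x in centers[1:]]
M = [[col[0] for col in cols], [col[1] for col in cols]]
h_prime = kernel_vector(M)

alpha = 0.1                                           # illustrative step size
w = [h[0]] + [hi + alpha * d for hi, d in zip(h[1:], h_prime)]
beta = 1.0 / sum(w)                                   # normalization constant
w = [beta * x for x in w]

residual = mixture_expectation(w, centers, r)         # stays numerically zero
print(residual, h.index(max(h)), w.index(max(w)))
```

In this example the perturbed mixture keeps the same value \(r\) of the identified property while its largest weight moves away from \(x_0\), which is the mechanism the proof exploits; the case analysis on \(\alpha \) in the text is what makes this work for an arbitrary \(V\).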

Experimental details

So as to allow comparison with Heinrich (2014), we consider a density \(p_{\text {mix}}\) which is a mixture of two Gaussians; letting \(p_1 = N(2,1.5)\) and \(p_2 = N(-2,0.5)\), where \(N(\mu ,\sigma )\) denotes a Gaussian density with mean \(\mu \) and standard deviation \(\sigma \), we set \(p_{\text {mix}}= 0.75 p_1 + 0.25 p_2\). The true mode of \(p_{\text {mix}}\) is \(m_0 = \varGamma _{\text {mode}}(p_{\text {mix}}) \approx -1.987047\), with the other local maximum occurring at \(m_1 \approx 2.000000\). The experiment performed is analogous to Heinrich (2014): For each value of \(\varepsilon \) as shown in Table 1, and in each of 1000 trials, we collect \(n=10,000\) independent samples from \(p_{\text {mix}}\), and measure the performance of the empirical modal midpoint \({\hat{x}}_\varepsilon \) relative to the true mode \(m_0\) and true modal midpoint \(x_\varepsilon = \varGamma _\varepsilon (p_{\text {mix}})\). In the case of a tie for \({\hat{x}}_\varepsilon \), we take the lowest value (which the reader will note should favor the correct value). In sum, our results are qualitatively similar to Heinrich (2014), in that the modal midpoint \({\hat{x}}_\varepsilon \) fails to estimate the mode, but we can also confirm that it fails to estimate the modal midpoint \(x_\varepsilon \) as well. Note in particular that the two “Versus local max” columns are identical.
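A single trial of this experiment can be sketched as follows. The sketch assumes that the empirical modal midpoint is the center of a window of width \(2\varepsilon \) containing the most samples, restricted to windows whose left endpoint is a sample, with ties broken toward the lowest center; the function names, the seed, and the listed values of \(\varepsilon \) are illustrative and do not reproduce the grid of Table 1.

```python
import math
import random

def pmix_density(x):
    """Density of p_mix = 0.75*N(2, 1.5) + 0.25*N(-2, 0.5) (second argument: std. dev.)."""
    def normal(y, mu, sd):
        return math.exp(-0.5 * ((y - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
    return 0.75 * normal(x, 2.0, 1.5) + 0.25 * normal(x, -2.0, 0.5)

def local_max(f, lo, hi, iters=200):
    """Ternary search for the maximizer of f on [lo, hi], assuming f is unimodal there."""
    for _ in range(iters):
        a, b = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if f(a) < f(b):
            lo = a
        else:
            hi = b
    return 0.5 * (lo + hi)

def sample_pmix(n, rng):
    """Draw n independent samples from p_mix."""
    return [rng.gauss(2.0, 1.5) if rng.random() < 0.75 else rng.gauss(-2.0, 0.5)
            for _ in range(n)]

def empirical_modal_midpoint(samples, eps):
    """Center of the window [y, y + 2*eps] (y a sample) holding the most samples.

    Ties are broken toward the lowest center, via the strict comparison below.
    """
    ys = sorted(samples)
    best_count, best_x, j = 0, None, 0
    for i, left in enumerate(ys):
        while j < len(ys) and ys[j] <= left + 2 * eps:
            j += 1                      # j only moves forward: two-pointer sweep
        if j - i > best_count:
            best_count, best_x = j - i, left + eps
    return best_x

m0 = local_max(pmix_density, -2.5, -1.5)   # global mode, near -2
m1 = local_max(pmix_density, 1.5, 2.5)     # other local maximum, near 2
rng = random.Random(0)                     # fixed seed so the run is repeatable
samples = sample_pmix(10_000, rng)
for eps in (0.05, 0.2, 0.5):               # illustrative values of epsilon
    print(eps, empirical_modal_midpoint(samples, eps), m0, m1)
```

Repeating the sampling step over many seeds and recording how often the estimate lands near \(m_0\) rather than \(m_1\) would give a comparison in the spirit of Table 1.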

About this article

Cite this article

Dearborn, K., Frongillo, R. On the indirect elicitability of the mode and modal interval. Ann Inst Stat Math 72, 1095–1108 (2020). https://doi.org/10.1007/s10463-019-00719-1
