Correction to: Machine Learning (2019) 108:1261–1286, https://doi.org/10.1007/s10994-019-05795-1

There was a mistake in the proof of the optimal shrinkage intensity for our estimator presented in Section 3.1. The main theorem still holds, and the shrinkage intensity given in this correction is optimal in the sense of minimizing the mean squared error (MSE). In addition to correcting the proof of the optimal shrinkage intensity, this document provides an empirical verification of its correctness via simulations. The third term of Theorem 1 must be corrected as follows:

$$ \begin{aligned} \widehat{\mathbb {E}}\left[ (\hat{p}^{\mathrm{Ind}}(xy))^2\right]&= \frac{1}{N^3}\bigg ( (N-1)(N-2)(N-3)\big ( \hat{p}^\mathrm{ML}(x) \hat{p}^\mathrm{ML}(y) \big )^2 \\&\qquad \qquad + (N-1) (N-2)\hat{p}^\mathrm{ML}(x) \hat{p}^\mathrm{ML}(y) \big (\hat{p}^{\mathrm{ML}}(x)+\hat{p}^{\mathrm{ML}}(y)+4\hat{p}^{\mathrm{ML}}(xy)\big ) \\&\qquad \qquad +(N-1)\big (2\hat{p}^{\mathrm{ML}}(xy)(\hat{p}^{\mathrm{ML}}(x)+\hat{p}^{\mathrm{ML}}(y))+2(\hat{p}^{\mathrm{ML}}(xy))^2 \\&\qquad \qquad +\hat{p}^\mathrm{ML}(x) \hat{p}^\mathrm{ML}(y)\big ) + \hat{p}^{\mathrm{ML}}(xy) \bigg ). \end{aligned} \tag{1} $$
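For concreteness, Eq. (1) can be evaluated directly from the counts \(N_x\), \(N_y\), \(N_{xy}\) and \(N\). The Python sketch below assumes that the maximum-likelihood estimates are the relative frequencies (e.g. \(\hat{p}^{\mathrm{ML}}(x) = N_x/N\)). The function name `expected_sq_independence` is hypothetical and is not taken from the paper's code.

```python
def expected_sq_independence(n_x: int, n_y: int, n_xy: int, n: int) -> float:
    """Plug-in estimate of E[(p_Ind(xy))^2] following Eq. (1).

    n_x, n_y: marginal counts of X = x and Y = y; n_xy: joint count of (x, y);
    n: sample size. ML estimates are assumed to be relative frequencies.
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    px, py, pxy = n_x / n, n_y / n, n_xy / n  # maximum-likelihood estimates
    term4 = (n - 1) * (n - 2) * (n - 3) * (px * py) ** 2
    term3 = (n - 1) * (n - 2) * px * py * (px + py + 4 * pxy)
    term2 = (n - 1) * (2 * pxy * (px + py) + 2 * pxy ** 2 + px * py)
    term1 = pxy
    return (term4 + term3 + term2 + term1) / n ** 3


# All 10 observations fall into cell (x, y): the estimate is exactly 1.0.
print(expected_sq_independence(10, 10, 10, 10))
```

When every observation falls into the cell \((x, y)\), the four terms sum to exactly \(N^3\), so the estimate equals 1, as expected for a distribution concentrated on a single cell.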

Parts of pages 4–6 of the supplementary material, where the above term is derived, require the following corrections. On page 4, the term \(A(xy)\) must be corrected as follows:

$$ \begin{aligned} {A(xy)} ={\sum _{\begin{array}{c} x',x'' \in \mathcal {X}\\ x'\ne x'' \ne x \end{array}}\sum _{\begin{array}{c} y', y'' \in \mathcal {Y}\\ y'\ne y'' \end{array}}{\mathbb {E}} \left[ {N_{xy'}N_{xy''} N_{x'y}N_{x''y}}\right] +2\sum _{\begin{array}{c} x' \in \mathcal {X}\\ x'\ne x \end{array}}\sum _{\begin{array}{c} y', y'' \in \mathcal {Y}\\ y'\ne y'' \ne y \end{array}}{\mathbb {E}} \left[ {N_{xy'}N_{xy''} N_{x'y}N_{xy}}\right] }. \end{aligned} $$

As a consequence, the same term on page 5 must also be corrected:

$$ \begin{aligned} {A(xy)}=&{N^{(4)} \Bigg [\bigg (p(x)^2-\sum _{y' \in \mathcal {Y}}p(xy')^2\bigg )\bigg (p(y)^2-\sum _{x' \in \mathcal {X}}p(x'y)^2\bigg )}\\&{-4 \big (p(x)-p(xy)\big )p(xy)^2\big (p(y)-p(xy)\big )\Bigg ]}. \end{aligned} $$
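The index sets on page 4 and the closed form on page 5 can be cross-checked numerically. The Python sketch below reads the condition \(x' \ne x'' \ne x\) as "\(x'\) and \(x''\) are distinct and both differ from \(x\)", and reads \(y' \ne y'' \ne y\) in the second sum analogously. This is an assumed reading of the notation. Under it, every admissible tuple selects four distinct cells, so each expectation equals \(N^{(4)}\) times a product of four cell probabilities, and both sides can be compared after dividing by \(N^{(4)}\). The joint distribution and the function names are hypothetical.

```python
import itertools
import math
import random


def a_brute_force(p, x, y):
    """A(xy) / N^(4) summed over the page-4 index sets.

    Assumed reading: x', x'' pairwise distinct and both != x in the first sum;
    y', y'' pairwise distinct and both != y in the second sum.
    """
    rows, cols = range(len(p)), range(len(p[0]))
    first = sum(
        p[x][y1] * p[x][y2] * p[x1][y] * p[x2][y]
        for x1, x2 in itertools.permutations(rows, 2) if x not in (x1, x2)
        for y1, y2 in itertools.permutations(cols, 2)
    )
    second = sum(
        p[x][y1] * p[x][y2] * p[x1][y] * p[x][y]
        for x1 in rows if x1 != x
        for y1, y2 in itertools.permutations(cols, 2) if y not in (y1, y2)
    )
    return first + 2 * second


def a_closed_form(p, x, y):
    """A(xy) / N^(4) from the corrected page-5 expression."""
    px, py, q = sum(p[x]), sum(row[y] for row in p), p[x][y]
    sx = sum(v * v for v in p[x])       # sum over y' of p(xy')^2
    sy = sum(row[y] ** 2 for row in p)  # sum over x' of p(x'y)^2
    return (px**2 - sx) * (py**2 - sy) - 4 * (px - q) * q**2 * (py - q)


# Hypothetical 3 x 4 joint distribution drawn at random.
rng = random.Random(0)
w = [[rng.random() for _ in range(4)] for _ in range(3)]
total = sum(map(sum, w))
p = [[v / total for v in row] for row in w]

for x in range(3):
    for y in range(4):
        assert math.isclose(a_brute_force(p, x, y), a_closed_form(p, x, y),
                            rel_tol=1e-9, abs_tol=1e-15)
print("page-4 index sets agree with the page-5 closed form")
```

For a 2 × 2 table, both index sets are empty, and the closed form correspondingly reduces to zero.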

Finally, the first equation on page 6 must be corrected as follows:

$$ \begin{aligned}&{\sum _{x',x'' \in \mathcal {X}}\sum _{y', y'' \in \mathcal {Y}}{\mathbb {E}} \left[ {N_{xy'}N_{xy''} N_{x'y}N_{x''y}}\right] =}{N^{(4)}p(x)^2p(y)^2}\\&\quad {+N^{(3)}p(x)p(y)(p(x)+p(y)+4p(xy))}\\&\quad {+N^{(2)}\big [2p(xy)(p(x)+p(y))+2p(xy)^2+p(x)p(y)\big ]}\\&\quad {+Np(xy),} \end{aligned} $$
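The left-hand side above equals \(\mathbb{E}[N_x^2 N_y^2]\), with \(N_x=\sum_{y'}N_{xy'}\) and \(N_y=\sum_{x'}N_{x'y}\). For small \(N\), the corrected expansion can therefore be checked exactly by enumerating every sequence of \(N\) i.i.d. draws from a joint table. The Python sketch below does this for a hypothetical 2 × 2 distribution. It takes \(N^{(k)}\) to be the falling factorial \(N(N-1)\cdots(N-k+1)\), which is the reading consistent with Eq. (1).

```python
import itertools
import math


def falling(n: int, k: int) -> int:
    """Falling factorial N^(k) = N (N - 1) ... (N - k + 1); zero when k > N >= 0."""
    return math.prod(range(n - k + 1, n + 1))


def moment_formula(n, px, py, pxy):
    """Right-hand side of the corrected page-6 equation."""
    return (falling(n, 4) * px**2 * py**2
            + falling(n, 3) * px * py * (px + py + 4 * pxy)
            + falling(n, 2) * (2 * pxy * (px + py) + 2 * pxy**2 + px * py)
            + n * pxy)


def moment_exact(n, p, x, y):
    """E[N_x^2 N_y^2] by brute force over all n-sample sequences of cells."""
    cells = [(i, j) for i in range(len(p)) for j in range(len(p[0]))]
    total = 0.0
    for seq in itertools.product(cells, repeat=n):
        prob = math.prod(p[i][j] for i, j in seq)  # probability of this sequence
        n_x = sum(1 for i, _ in seq if i == x)
        n_y = sum(1 for _, j in seq if j == y)
        total += prob * n_x**2 * n_y**2
    return total


P = [[0.1, 0.2], [0.3, 0.4]]  # hypothetical joint distribution p(xy)
for n in range(1, 6):
    exact = moment_exact(n, P, 0, 0)
    closed = moment_formula(n, px=0.3, py=0.4, pxy=0.1)
    print(n, round(exact, 10), round(closed, 10))
```

Brute-force enumeration grows as \(|\mathcal{X}||\mathcal{Y}|^N\), so it is practical only for very small \(N\). It is used here purely as an independent check of the four falling-factorial terms.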

which yields the estimate of \( \widehat{\mathbb {E}}\left[ (\hat{p}^{\mathrm{Ind}}(xy))^2\right] \) given in Eq. (1).
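The coefficients of Eq. (1) indicate that \(\hat{p}^{\mathrm{Ind}}(xy) = N_x N_y / N^2\), so that its square is \(N_x^2 N_y^2 / N^4\), and that \(N^{(k)}\) denotes the falling factorial. Assuming both, the step from the expansion above to Eq. (1) is a term-by-term division by \(N^4\):

$$ \begin{aligned} \frac{N^{(4)}}{N^4} = \frac{(N-1)(N-2)(N-3)}{N^3}, \qquad \frac{N^{(3)}}{N^4} = \frac{(N-1)(N-2)}{N^3}, \qquad \frac{N^{(2)}}{N^4} = \frac{N-1}{N^3}, \qquad \frac{N}{N^4} = \frac{1}{N^3}. \end{aligned} $$

The population probabilities \(p(x)\), \(p(y)\) and \(p(xy)\) are then replaced by \(\hat{p}^{\mathrm{ML}}(x)\), \(\hat{p}^{\mathrm{ML}}(y)\) and \(\hat{p}^{\mathrm{ML}}(xy)\).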

Apart from correcting the proof, we provide simulation results that validate the corrected optimal shrinkage intensity. To this end, we followed the procedure described in Section 3.2 of the main paper to generate probabilities that lead to different effect sizes, i.e. different population values of the mutual information I(X;Y). The squared error of our shrinkage estimator for the probabilities is defined as \( \sum _{x \in \mathcal {X}}\sum _{y \in \mathcal {Y}} \left( p(xy) - \hat{p}^{\mathrm{Ind-JS}}(xy) \right) ^{2} \), and we estimated the MSE by averaging this quantity over 1000 simulation runs. Fig. 1 presents the results for three effect sizes: I(X;Y) = 0.01, 0.05 and 0.15. Each panel plots the MSE over the whole range of the shrinkage intensity, [0, 1], and marks both the corrected optimal intensity \(\lambda ^{*}\) and the value \( \lambda _{e}^{*} \) erroneously used in the previous version of the paper. As Fig. 1 shows, the corrected value \(\lambda ^{*}\) attains the minimum MSE.
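The simulation procedure can be sketched in Python as follows. This is a sketch under stated assumptions, not the authors' code. It assumes the shrinkage estimator is the convex combination \( \hat{p}^{\mathrm{Ind-JS}}(xy) = \lambda\,\hat{p}^{\mathrm{ML}}(x)\hat{p}^{\mathrm{ML}}(y) + (1-\lambda)\,\hat{p}^{\mathrm{ML}}(xy) \). It also uses a hypothetical 2 × 2 joint distribution in place of the generating procedure of Section 3.2. The function name `simulate_mse` is hypothetical.

```python
import random


def simulate_mse(p, n, lambdas, runs=1000, seed=0):
    """Monte Carlo MSE of the (assumed) shrinkage estimator for each intensity.

    Assumed estimator: lam * p_ml(x) * p_ml(y) + (1 - lam) * p_ml(xy), i.e. the
    ML joint estimate shrunk towards the product of its ML marginals.
    """
    rng = random.Random(seed)
    rows, cols = len(p), len(p[0])
    cells = [(i, j) for i in range(rows) for j in range(cols)]
    weights = [p[i][j] for i, j in cells]
    sq_err = [0.0] * len(lambdas)
    for _ in range(runs):
        # Draw one multinomial sample of size n and form the ML estimates.
        counts = [[0] * cols for _ in range(rows)]
        for i, j in rng.choices(cells, weights=weights, k=n):
            counts[i][j] += 1
        p_ml = [[c / n for c in row] for row in counts]
        px = [sum(row) for row in p_ml]
        py = [sum(p_ml[i][j] for i in range(rows)) for j in range(cols)]
        # The same sample is reused for every intensity on the grid.
        for k, lam in enumerate(lambdas):
            sq_err[k] += sum(
                (p[i][j] - (lam * px[i] * py[j] + (1 - lam) * p_ml[i][j])) ** 2
                for i, j in cells
            )
    return [s / runs for s in sq_err]


p_true = [[0.30, 0.20], [0.15, 0.35]]  # hypothetical joint distribution
grid = [k / 100 for k in range(101)]
mse = simulate_mse(p_true, n=30, lambdas=grid)
best = min(range(len(grid)), key=mse.__getitem__)
print(f"empirical argmin: lambda = {grid[best]:.2f}, MSE = {mse[best]:.5f}")
```

Because every run reuses the same sample for all intensities, the estimated MSE curve is a convex quadratic in \(\lambda\). Its minimizer on the grid can therefore be compared directly with a closed-form intensity such as \(\lambda^{*}\).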

Fig. 1 Comparison of the MSE of our shrinkage estimator across values of the shrinkage intensity for three effect sizes: (a) I(X;Y) = 0.01, (b) I(X;Y) = 0.05 and (c) I(X;Y) = 0.15. The vertical lines mark the optimal shrinkage intensity presented in this correction (solid black line) and the intensity erroneously presented in the initial version of the paper (dashed red line).