Abstract
This article is devoted to the problem of calculating an exact upper bound for the functionals of the generalization ability of a family of one-dimensional threshold decision rules. An algorithm is investigated that solves the stated problem and is polynomial in the total number of samples used for training and validation and in the number of training samples. A theorem is proved for calculating an estimate for the functional of expected overfitting and an estimate for the error rate of the method for minimizing empirical risk on a validation set. The exact bounds calculated using the theorem are compared with the previously known quick-to-compute upper bounds so as to estimate the orders of overestimation of the bounds and to identify the bounds that could be used in real problems.
REFERENCES
Kohavi, R., A study of cross-validation and bootstrap for accuracy estimation and model selection, Proc. Int. Joint Conf. Artif. Intell., 1995, pp. 1137–1143.
Vapnik, V.N. and Chervonenkis, A.Ya., On uniform convergence of the rates of occurrence of events to their probabilities, Teor. Veroyatn. Ee Primen., 1971, vol. 16, no. 2, pp. 264–280.
Vapnik, V.N. and Chervonenkis, A.Ya., Teoriya raspoznavaniya obrazov (Pattern Recognition Theory), Moscow: Nauka, 1974.
Vapnik, V.N., Vosstanovlenie zavisimostei po empiricheskim dannym (Recovering Dependences from Empirical Data), Moscow: Nauka, 1979.
Vorontsov, K.V., Combinatorial probability and the tightness of generalization bounds, Pattern Recognit. Image Anal., 2008, vol. 18, no. 2, pp. 243–259.
Langford, J., Quantitatively tight sample complexity bounds, Ph.D. Thesis, Pittsburgh: Carnegie Mellon Univ., 2002.
Freund, Y. and Schapire, R.E., A decision-theoretic generalization of on-line learning and an application to boosting, J. Comput. Syst. Sci., 1997, vol. 55, no. 1, pp. 119–139.
Kearns, M.J. et al., An experimental and theoretical comparison of model selection methods, in Computational Learning Theory, 1995, pp. 21–30.
Boucheron, S., Bousquet, O., and Lugosi, G., Theory of classification: A survey of some recent advances, ESAIM: Probab. Stat., 2005, vol. 9, pp. 323–375.
Shalev-Shwartz, S. and Ben-David, S., Understanding Machine Learning: from Theory to Algorithms, Cambridge: Cambridge Univ. Press, 2014.
Koltchinskii, V., Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery Problems: École d’Été de Probabilités de Saint-Flour XXXVIII-2008. Lecture Notes in Mathematics, Berlin–Heidelberg: Springer-Verlag, 2011.
Vorontsov, K.V., Tight bounds for the probability of overfitting, Dokl. Math., 2009, vol. 80, p. 793.
Vorontsov, K.V. and Ivahnenko, A.A., Tight combinatorial generalization bounds for threshold conjunction rules, in 4th Int. Conf. on Pattern Recognition and Machine Intelligence, 2011. Lecture Notes in Computer Science, Berlin–Heidelberg: Springer-Verlag, 2011, pp. 66–73.
Vorontsov, K.V., Splitting and similarity phenomena in the sets of classifiers and their effect on the probability of overfitting, Pattern Recognit. Image Anal., 2009, vol. 19, no. 3, pp. 412–420.
Zhivotovskii, N.K. and Vorontsov, K.V., Criteria of tightness of combinatorial estimates of generalization ability, in Intellektualizatsiya obrabotki informatsii (IOI-2012) (Intellectualization of Information Processing (IIP-2012)), Moscow: Torus Press, 2012, pp. 25–28.
Vorontsov, K.V., Frei, A.I., and Sokolov, E.A., Countable combinatorial estimates of the probability of overfitting, Mash. Obuchenie Anal. Dannykh, 2013, vol. 1, no. 6, pp. 734–743.
Frei, A.I. and Tolstikhin, I.O., Combinatorial estimates of the probability of overfitting based on clusterization and covers of the set of algorithms, Mash. Obuchenie Anal. Dannykh, 2013, vol. 1, no. 6, pp. 761–778.
Haussler, D., Littlestone, N., and Warmuth, M.K., Predicting \(0, 1 \)-functions on randomly drawn points, Inf. Comput., 1994, vol. 115, no. 2, pp. 248–292.
Vorontsov, K.V., Exact combinatorial bounds on the probability of overfitting for empirical risk minimization, Pattern Recognit. Image Anal., 2010, vol. 20, no. 3, pp. 269–285.
Botov, P.V., Exact estimates of the probability of overfitting for multidimensional modeling families of algorithms, Pattern Recognit. Image Anal., 2010, vol. 20, no. 4, pp. 52–65.
Tolstikhin, I.O., Probability of overfitting of some sparse families of algorithms, Mezhdunar. konf. IOI-8 (Int. Conf. IIP-8) (2008), Moscow: MAKS Press, 2010, pp. 83–86.
Frei, A.I., Accurate estimates of the generalization ability for symmetric set of predictors and randomized learning algorithms, Pattern Recognit. Image Anal., 2010, vol. 20, no. 3, pp. 241–250.
Botov, P.V., Exact bounds on probability of overfitting for monotone and unimodal families of algorithms, Matematicheskie metody raspoznavaniya obrazov-14 (Mathematical Methods of Pattern Recognition-14), Moscow: MAKS Press, 2009, pp. 7–10.
Ishkina, Sh.Kh., Combinatorial bounds of overfitting for threshold classifiers, Ufa Math. J., 2018, vol. 10, no. 1, pp. 49–63.
Zhuravlev, Yu.I., Ryazanov, V.V., and Sen’ko, O.V., Raspoznavanie. Matematicheskie metody. Programnaya sistema. Prakticheskie primeneniya (Recognition. Mathematical Methods. Program System. Practical Applications), Moscow: FAZIS, 2005.
Zhuravlev, Yu.I., On an algebraic approach to solving recognition or classification problems, Probl. Kibern., 1978, vol. 33, pp. 5–68.
Guz, I.S., Constructive estimates of complete sliding control for threshold classification, Mat. Biol. Bioinf., 2011, vol. 6, no. 2, pp. 173–189.
GitHub project. https://github.com/shaurushka/theshold-clfs-gen-bound.
Vorontsov, K.V., Combinatorial theory of overfitting: how connectivity and splitting reduce the local complexity, in 9th IFIP WG 12.5 Int. Conf. AIAI, Berlin–Heidelberg: Springer, 2013.
Hoeffding, W., Probability inequalities for sums of bounded random variables, J. Am. Stat. Assoc., 1963, vol. 58, pp. 13–30.
ACKNOWLEDGMENTS
The authors are deeply grateful to the referees for their careful reading and valuable comments, which were taken into account in revising the paper and improved the presentation.
Funding
This work was supported by a grant from the Government of the Russian Federation aimed at the support of scientific research guided by leading scientists, project no. 075-15-2019-1926.
Additional information
Translated by V. Potapchouck
APPENDIX
Proof of Theorem 5. Let us write out the formula for the expected overfitting and interchange the order of summation in it,
Consider the set of partitions \((X, \bar {X})\) with fixed values of \(t \) and \(e \),
The set of admissible values of \((t, e)\) is \(\Psi _p \) according to (8).
Set \(s = n(a_p, X \cap \mathbb {N}) \). The constraints \(s + t\leqslant l \) and \(s \leqslant m \) imply the upper bound for the parameter \(s \) in (7).
Since the number of classifier errors \(a_p \) on the validation sample is
It was proved in [24] that the validity of the condition \(\mu X = a_p\) does not depend on the choice of the partition of the set \(\mathbb {N} \), and it was shown that the number of partitions of the set \(\mathbb {N}\) for given \(t \) and \(s \) equals \(\binom ms \binom {L - P - m}{\ell - t - s}\); this implies the assertion of the theorem. The proof of Theorem 5 is complete. \(\quad \blacksquare \)
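As an illustration of the counting argument above, the partition count \(\binom ms \binom {L - P - m}{\ell - t - s}\) can be evaluated directly. This is a minimal sketch; the parameter values below are hypothetical and serve only to show the computation, not a configuration from the paper.

```python
from math import comb

def partition_count(L, P, m, ell, t, s):
    """Number of partitions of the set N for given t and s:
    C(m, s) * C(L - P - m, ell - t - s)."""
    return comb(m, s) * comb(L - P - m, ell - t - s)

# Hypothetical parameter values, chosen only to illustrate the formula:
# C(6, 3) * C(20 - 3 - 6, 10 - 2 - 3) = 20 * 462 = 9240.
print(partition_count(L=20, P=3, m=6, ell=10, t=2, s=3))
```

Summing such counts over the admissible pairs \((t, e)\) and the range of \(s\) is what makes the overall algorithm polynomial in the sample sizes.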
Proof of Theorem 6. Using the Hoeffding inequality [30], with probability \(\varepsilon \) one can estimate the deviation \(\eta = \eta (\varepsilon ) \) of overfitting from its expectation,
Then the error rate of the classifier selected by the PERM algorithm on the validation sample can be estimated directly via the error rate on the learning sample and the overfitting expectation,
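For reference, the Hoeffding bound invoked above is the standard two-sided inequality for a sum of bounded random variables (a general statement, not the specialized form from the proof):

```latex
\mathsf{P}\!\left( \Bigl| \sum_{i=1}^{n} X_i - \mathsf{E}\sum_{i=1}^{n} X_i \Bigr| \geqslant \eta \right)
\leqslant 2\exp\!\left( -\frac{2\eta^2}{\sum_{i=1}^{n}(b_i - a_i)^2} \right),
\qquad a_i \leqslant X_i \leqslant b_i.
```

Solving the right-hand side for \(\eta\) at a prescribed confidence level yields the deviation \(\eta(\varepsilon)\) used in the proof.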
Ishkina, S.K., Vorontsov, K.V. Sharpness Estimation of Combinatorial Generalization Ability Bounds for Threshold Decision Rules. Autom Remote Control 82, 863–876 (2021). https://doi.org/10.1134/S0005117921050106