
Sharpness Estimation of Combinatorial Generalization Ability Bounds for Threshold Decision Rules

  • INTELLECTUAL CONTROL SYSTEMS, DATA ANALYSIS

Automation and Remote Control

Abstract

This article is devoted to the problem of computing an exact upper bound for the functionals of the generalization ability of a family of one-dimensional threshold decision rules. We investigate an algorithm that solves this problem in time polynomial in the total number of samples used for training and validation and in the number of training samples. A theorem is proved for computing an estimate of the expected overfitting functional and an estimate of the error rate, on a validation set, of the method that minimizes empirical risk. The exact bounds computed using the theorem are compared with previously known quick-to-compute upper bounds in order to estimate by how much the latter are overestimated and to identify the bounds that could be used in real problems.


REFERENCES

  1. Kohavi, R., A study of cross-validation and bootstrap for accuracy estimation and model selection, Proc. Int. Joint Conf. Artif. Intell. (1995), pp. 1137–1143.

  2. Vapnik, V.N. and Chervonenkis, A.Ya., On uniform convergence of the rates of occurrence of events to their probabilities, Teor. Veroyatn. Ee Primen., 1971, vol. 16, no. 2, pp. 264–280.

  3. Vapnik, V.N. and Chervonenkis, A.Ya., Teoriya raspoznavaniya obrazov (Pattern Recognition Theory), Moscow: Nauka, 1974.

  4. Vapnik, V.N., Vosstanovlenie zavisimostei po empiricheskim dannym (Recovering Dependences from Empirical Data), Moscow: Nauka, 1979.

  5. Vorontsov, K.V., Combinatorial probability and the tightness of generalization bounds, Pattern Recognit. Image Anal., 2008, vol. 18, no. 2, pp. 243–259.

  6. Langford, J., Quantitatively tight sample complexity bounds, Ph.D. Thesis, Pittsburgh: Carnegie Mellon Univ., 2002.

  7. Freund, Y. and Schapire, R.E., A decision-theoretic generalization of on-line learning and an application to boosting, J. Comput. Syst. Sci., 1997, vol. 55, no. 1, pp. 119–139.

  8. Kearns, M.J. et al., An experimental and theoretical comparison of model selection methods, in Computational Learning Theory, 1995, pp. 21–30.

  9. Boucheron, S., Bousquet, O., and Lugosi, G., Theory of classification: A survey of some recent advances, ESAIM: Probab. Stat., 2005, vol. 9, pp. 323–375.

  10. Shalev-Shwartz, S. and Ben-David, S., Understanding Machine Learning: from Theory to Algorithms, Cambridge: Cambridge Univ. Press, 2014.

  11. Koltchinskii, V., Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery Problems: École d’Été de Probabilités de Saint-Flour XXXVIII-2008. Lecture Notes in Mathematics, Berlin–Heidelberg: Springer-Verlag, 2011.

  12. Vorontsov, K.V., Tight bounds for the probability of overfitting, Dokl. Math., 2009, vol. 80, p. 793.

  13. Vorontsov, K.V. and Ivahnenko, A.A., Tight combinatorial generalization bounds for threshold conjunction rules, in 4th Int. Conf. on Pattern Recognition and Machine Intelligence, 2011. Lecture Notes in Computer Science, Berlin–Heidelberg: Springer-Verlag, 2011, pp. 66–73.

  14. Vorontsov, K.V., Splitting and similarity phenomena in the sets of classifiers and their effect on the probability of overfitting, Pattern Recognit. Image Anal., 2009, vol. 19, no. 3, pp. 412–420.

  15. Zhivotovskii, N.K. and Vorontsov, K.V., Criteria of tightness of combinatorial estimates of generalization ability, in Intellektualizatsiya obrabotki informatsii (IOI-2012) (Intellectualization of Information Processing (IIP-2012)), Moscow: Torus Press, 2012, pp. 25–28.

  16. Vorontsov, K.V., Frei, A.I., and Sokolov, E.A., Countable combinatorial estimates of the probability of overfitting, Mash. Obuchenie Anal. Dannykh, 2013, vol. 1, no. 6, pp. 734–743.

  17. Frei, A.I. and Tolstikhin, I.O., Combinatorial estimates of the probability of overfitting based on clusterization and covers of the set of algorithms, Mash. Obuchenie Anal. Dannykh, 2013, vol. 1, no. 6, pp. 761–778.

  18. Haussler, D., Littlestone, N., and Warmuth, M.K., Predicting \(0, 1 \)-functions on randomly drawn points, Inf. Comput., 1994, vol. 115, no. 2, pp. 248–292.

  19. Vorontsov, K.V., Exact combinatorial bounds on the probability of overfitting for empirical risk minimization, Pattern Recognit. Image Anal., 2010, vol. 20, no. 3, pp. 269–285.

  20. Botov, P.V., Exact estimates of the probability of overfitting for multidimensional modeling families of algorithms, Pattern Recognit. Image Anal., 2010, vol. 20, no. 4, pp. 52–65.

  21. Tolstikhin, I.O., Probability of overfitting of some sparse families of algorithms, Mezhdunar. konf. IOI-8 (Int. Conf. IIP-8) (2008), Moscow: MAKS Press, 2010, pp. 83–86.

  22. Frei, A.I., Accurate estimates of the generalization ability for symmetric set of predictors and randomized learning algorithms, Pattern Recognit. Image Anal., 2010, vol. 20, no. 3, pp. 241–250.

  23. Botov, P.V., Exact bounds on probability of overfitting for monotone and unimodal families of algorithms, Matematicheskie metody raspoznavaniya obrazov-14 (Mathematical Methods of Pattern Recognition-14), Moscow: MAKS Press, 2009, pp. 7–10.

  24. Ishkina, Sh.Kh., Combinatorial bounds of overfitting for threshold classifiers, Ufa Math. J., 2018, vol. 10, no. 1, pp. 49–63.

  25. Zhuravlev, Yu.I., Ryazanov, V.V., and Sen’ko, O.V., Raspoznavanie. Matematicheskie metody. Programnaya sistema. Prakticheskie primeneniya (Recognition. Mathematical Methods. Program System. Practical Applications), Moscow: FAZIS, 2005.

  26. Zhuravlev, Yu.I., On an algebraic approach to solving recognition or classification problems, Probl. Kibern., 1978, vol. 33, pp. 5–68.

  27. Guz, I.S., Constructive estimates of complete sliding control for threshold classification, Mat. Biol. Bioinf., 2011, vol. 6, no. 2, pp. 173–189.

  28. GitHub project, https://github.com/shaurushka/theshold-clfs-gen-bound.

  29. Vorontsov, K.V., Combinatorial theory of overfitting: how connectivity and splitting reduces the local complexity, 9th IFIP WG 12.5 Int. Conf. AIAI, Berlin–Heidelberg: Springer, 2013.

  30. Hoeffding, W., Probability inequalities for sums of bounded random variables, J. Am. Stat. Assoc., 1963, no. 58, pp. 13–30.

ACKNOWLEDGMENTS

The authors are deeply grateful to the referees for careful consideration and valuable comments, which were taken into account during editing and contributed to the improvement of the presentation.

Funding

This work was supported by a grant from the Government of the Russian Federation aimed at the support of scientific research guided by leading scientists, project no. 075-15-2019-1926.

Author information

Correspondence to Sh. Kh. Ishkina or K. V. Vorontsov.

Additional information

Translated by V. Potapchouck

APPENDIX

Proof of Theorem 5. Let us write out the expected overfitting formula and interchange the order of summation in it,

$$ EOF = \binom {L}{\ell }^{-1} \sum _{X \in \left [\mathbb {X}\right ]^{\ell }} \sum _{p=0}^{P} \left [\mu X = a_p\right ] \: \delta \left (a_p, X\right ) = \binom {L}{\ell }^{-1} \sum _{p = 0}^{P} \sum _{X \in \left [\mathbb {X}\right ]^{\ell }} \left [\mu X = a_p\right ] \: \delta (a_p, X). $$

Consider the set of partitions \((X, \bar {X})\) with fixed values of \(t \) and \(e \),

$$ t = |X \cap \mathbb {D}|, \quad e = n(a_p, X \cap \mathbb {D}).$$

The set of admissible values of \((t, e)\) is \(\Psi _p \) according to (8).

Set \(s = n(a_p, X \cap \mathbb {N}) \). The constraints \(s + t\leqslant \ell \) and \(s \leqslant m \) imply the upper bound for the parameter \(s \) in (7).

Since the number of errors of the classifier \(a_p \) on the validation sample is

$$ n\left (a_p, \bar {X}\right ) = n\left (a_p, \mathbb {X}\right ) - n\left (a_p, X\right )=n\left (a_p, \mathbb {X}\right ) - n\left (a_p, X \cap \mathbb {D}\right ) - n\left (a_p, X \cap \mathbb {N}\right ),$$
the overfitting of the classifier \(a_p \) for given \(s \) and \(e \) can be represented in the form
$$ \delta \left (a_p, X\right ) = \frac {1}{L - \ell }\thinspace n\left (a_p, \bar {X}\right ) - \frac {1}{\ell }\thinspace n\left (a_p, X\right ) = \frac {1}{L - \ell }\thinspace \left (n\left (a_p, \mathbb {X}\right ) - \left (s + e\right )\right ) - \frac {1}{\ell } \thinspace \left (s + e\right ).$$

It was proved in [24] that whether the condition \(\mu X =a_p\) holds does not depend on the choice of the partition of the set \(\mathbb {N} \), and it was shown that the number of partitions of the set \(\mathbb {N}\) for given \(t \) and \(s \) equals \(\binom {m}{s} \binom {L - P - m}{\ell - t - s}\); this implies the statement of the theorem. The proof of Theorem 5 is complete. \(\quad \blacksquare \)
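To make the quantity just computed concrete, below is a minimal brute-force sketch of the expected overfitting for one-dimensional threshold rules. It is not the polynomial algorithm of the paper: it enumerates all \(\binom {L}{\ell }\) train/validation partitions directly, which is feasible only on toy samples but serves as a reference implementation. The data, the tie-breaking convention inside empirical risk minimization, and all identifiers are illustrative assumptions.

```python
from itertools import combinations
from math import comb

# Toy one-dimensional sample (hypothetical data): features and 0/1 labels.
xs = [0.5, 1.2, 2.1, 2.8, 3.6, 4.4, 5.0, 5.9]
ys = [0,   0,   1,   0,   1,   1,   0,   1]
L = len(xs)
ell = 4  # size of the training part X; the rest is the validation part

# Candidate thresholds: one below all points, one between each pair of
# consecutive points, and one above all points.
sx = sorted(xs)
thresholds = [sx[0] - 1.0] + [(a + b) / 2 for a, b in zip(sx, sx[1:])] + [sx[-1] + 1.0]

def n_errors(theta, idx):
    """Number of errors of the rule a(x) = [x > theta] on the sub-sample idx."""
    return sum((xs[i] > theta) != ys[i] for i in idx)

def erm(train):
    """Empirical risk minimization: a threshold with the fewest training errors
    (ties broken by the order of the thresholds list -- an assumed convention)."""
    return min(thresholds, key=lambda t: n_errors(t, train))

# Brute-force EOF: average overfitting delta(mu X, X) over all C(L, ell)
# partitions of the full sample into training and validation parts.
eof = 0.0
for train in combinations(range(L), ell):
    valid = [i for i in range(L) if i not in train]
    theta = erm(train)
    delta = n_errors(theta, valid) / (L - ell) - n_errors(theta, train) / ell
    eof += delta
eof /= comb(L, ell)
print(f"EOF = {eof:.4f}")
```

A polynomial algorithm, such as the one analyzed in Theorem 5, should reproduce this value exactly on small samples, which makes the sketch useful as a unit test.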

Proof of Theorem 6. By the Hoeffding inequality [30], with probability at least \(1 - \varepsilon \), the overfitting exceeds its expectation by at most the deviation \(\eta = \eta (\varepsilon ) \),

$$ \delta \left (\mu X, \bar {X}\right ) \leqslant EOF\left (\mu \right ) + \eta \left (\varepsilon \right ), $$
where the deviation is \(\eta = \sqrt {2 \ln \tfrac {2}{\varepsilon }}\).

Then the error rate on the validation sample of the classifier selected by the PERM algorithm can be estimated directly via its error rate on the training sample and the expected overfitting,

$$ \nu \left (\mu X, \bar {X}\right ) = \nu \left (\mu X, X\right ) + \delta \left (\mu X, \bar {X}\right ) \leqslant \nu \left (\mu X, X\right ) + EOF\left (\mu \right ) + \eta \left (\varepsilon \right );$$
this justifies inequality (9). The latter and Lemmas 1 and 2 imply inequality (10). The proof of Theorem 6 is complete. \(\quad \blacksquare \)
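As a companion to the inequality just derived, the following sketch assembles the right-hand side of (9) from a training error rate, an expected overfitting value (e.g., the brute-force EOF above), and a confidence parameter \(\varepsilon \). The function name and the example numbers are assumptions for illustration, and \(\eta (\varepsilon )\) is taken exactly in the form printed in the proof.

```python
from math import log, sqrt

def validation_error_bound(train_error: float, eof: float, eps: float) -> float:
    """Right-hand side of inequality (9): an upper bound on the validation
    error nu(mu X, X-bar) that holds with probability at least 1 - eps."""
    eta = sqrt(2.0 * log(2.0 / eps))  # deviation term from the Hoeffding inequality
    return train_error + eof + eta

# Illustrative call with made-up numbers.
print(validation_error_bound(train_error=0.10, eof=0.05, eps=0.05))
```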


Cite this article

Ishkina, S.K., Vorontsov, K.V. Sharpness Estimation of Combinatorial Generalization Ability Bounds for Threshold Decision Rules. Autom Remote Control 82, 863–876 (2021). https://doi.org/10.1134/S0005117921050106
