Abstract
This article is devoted to the problem of calculating an exact upper bound for the functionals of the generalization ability of a family of one-dimensional threshold decision rules. An algorithm is investigated that solves the stated problem and is polynomial in the total number of samples used for training and validation and in the number of training samples. A theorem is proved for calculating an estimate for the functional of expected overfitting and an estimate for the error rate of the method for minimizing empirical risk on a validation set. The exact bounds calculated using the theorem are compared with the previously known quick-to-compute upper bounds so as to estimate the orders of overestimation of the bounds and to identify the bounds that could be used in real problems.
REFERENCES
Kohavi, R., A study of cross-validation and bootstrap for accuracy estimation and model selection, Proc. Int. Joint Conf. Artif. Intell., 1995, pp. 1137–1143.
Vapnik, V.N. and Chervonenkis, A.Ya., On uniform convergence of the rates of occurrence of events to their probabilities, Teor. Veroyatn. Ee Primen., 1971, vol. 16, no. 2, pp. 264–280.
Vapnik, V.N. and Chervonenkis, A.Ya., Teoriya raspoznavaniya obrazov (Pattern Recognition Theory), Moscow: Nauka, 1974.
Vapnik, V.N., Vosstanovlenie zavisimostei po empiricheskim dannym (Recovering Dependences from Empirical Data), Moscow: Nauka, 1979.
Vorontsov, K.V., Combinatorial probability and the tightness of generalization bounds, Pattern Recognit. Image Anal., 2008, vol. 18, no. 2, pp. 243–259.
Langford, J., Quantitatively tight sample complexity bounds, Ph.D. Thesis, Pittsburgh: Carnegie Mellon Univ., 2002.
Freund, Y. and Schapire, R.E., A decision-theoretic generalization of on-line learning and an application to boosting, J. Comput. Syst. Sci., 1997, vol. 55, no. 1, pp. 119–139.
Kearns, M.J. et al., An experimental and theoretical comparison of model selection methods, in Computational Learning Theory, 1995, pp. 21–30.
Boucheron, S., Bousquet, O., and Lugosi, G., Theory of classification: A survey of some recent advances, ESAIM: Probab. Stat., 2005, vol. 9, pp. 323–375.
Shalev-Shwartz, S. and Ben-David, S., Understanding Machine Learning: from Theory to Algorithms, Cambridge: Cambridge Univ. Press, 2014.
Koltchinskii, V., Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery Problems: École d’Été de Probabilités de Saint-Flour XXXVIII-2008. Lecture Notes in Mathematics, Berlin–Heidelberg: Springer-Verlag, 2011.
Vorontsov, K.V., Tight bounds for the probability of overfitting, Dokl. Math., 2009, vol. 80, p. 793.
Vorontsov, K.V. and Ivahnenko, A.A., Tight combinatorial generalization bounds for threshold conjunction rules, in 4th Int. Conf. on Pattern Recognition and Machine Intelligence, 2011. Lecture Notes in Computer Science, Berlin–Heidelberg: Springer-Verlag, 2011, pp. 66–73.
Vorontsov, K.V., Splitting and similarity phenomena in the sets of classifiers and their effect on the probability of overfitting, Pattern Recognit. Image Anal., 2009, vol. 19, no. 3, pp. 412–420.
Zhivotovskii, N.K. and Vorontsov, K.V., Criteria of tightness of combinatorial estimates of generalization ability, in Intellektualizatsiya obrabotki informatsii (IOI-2012) (Intellectualization of Information Processing (IIP-2012)), Moscow: Torus Press, 2012, pp. 25–28.
Vorontsov, K.V., Frei, A.I., and Sokolov, E.A., Countable combinatorial estimates of the probability of overfitting, Mash. Obuchenie Anal. Dannykh, 2013, vol. 1, no. 6, pp. 734–743.
Frei, A.I. and Tolstikhin, I.O., Combinatorial estimates of the probability of overfitting based on clusterization and covers of the set of algorithms, Mash. Obuchenie Anal. Dannykh, 2013, vol. 1, no. 6, pp. 761–778.
Haussler, D., Littlestone, N., and Warmuth, M.K., Predicting \(0, 1 \)-functions on randomly drawn points, Inf. Comput., 1994, vol. 115, no. 2, pp. 248–292.
Vorontsov, K.V., Exact combinatorial bounds on the probability of overfitting for empirical risk minimization, Pattern Recognit. Image Anal., 2010, vol. 20, no. 3, pp. 269–285.
Botov, P.V., Exact estimates of the probability of overfitting for multidimensional modeling families of algorithms, Pattern Recognit. Image Anal., 2010, vol. 20, no. 4, pp. 52–65.
Tolstikhin, I.O., Probability of overfitting of some sparse families of algorithms, Mezhdunar. konf. IOI-8 (Int. Conf. IIP-8) (2008), Moscow: MAKS Press, 2010, pp. 83–86.
Frei, A.I., Accurate estimates of the generalization ability for symmetric set of predictors and randomized learning algorithms, Pattern Recognit. Image Anal., 2010, vol. 20, no. 3, pp. 241–250.
Botov, P.V., Exact bounds on probability of overfitting for monotone and unimodal families of algorithms, Matematicheskie metody raspoznavaniya obrazov-14 (Mathematical Methods of Pattern Recognition-14), Moscow: MAKS Press, 2009, pp. 7–10.
Ishkina, Sh.Kh., Combinatorial bounds of overfitting for threshold classifiers, Ufa Math. J., 2018, vol. 10, no. 1, pp. 49–63.
Zhuravlev, Yu.I., Ryazanov, V.V., and Sen’ko, O.V., Raspoznavanie. Matematicheskie metody. Programnaya sistema. Prakticheskie primeneniya (Recognition. Mathematical Methods. Program System. Practical Applications), Moscow: FAZIS, 2005.
Zhuravlev, Yu.I., On an algebraic approach to solving recognition or classification problems, Probl. Kibern., 1978, vol. 33, pp. 5–68.
Guz, I.S., Constructive estimates of complete sliding control for threshold classification, Mat. Biol. Bioinf., 2011, vol. 6, no. 2, pp. 173–189.
GitHub project. https://github.com/shaurushka/theshold-clfs-gen-bound.
Vorontsov, K.V., Combinatorial theory of overfitting: how connectivity and splitting reduce the local complexity, in 9th IFIP WG 12.5 Int. Conf. AIAI, Berlin–Heidelberg: Springer, 2013.
Hoeffding, W., Probability inequalities for sums of bounded random variables, J. Am. Stat. Assoc., 1963, vol. 58, pp. 13–30.
ACKNOWLEDGMENTS
The authors are deeply grateful to the referees for their careful reading and valuable comments, which were taken into account in revising the paper and improved the presentation.
Funding
This work was supported by a grant from the Government of the Russian Federation aimed at the support of scientific research guided by leading scientists, project no. 075-15-2019-1926.
Additional information
Translated by V. Potapchouck
APPENDIX
Proof of Theorem 5. Let us write out the formula for the expected overfitting and interchange the order of summation in it,
Consider the set of partitions \((X, \bar {X})\) with fixed values of \(t \) and \(e \),
The set of admissible values of \((t, e)\) is \(\Psi _p \) according to (8).
Set \(s = n(a_p, X \cap \mathbb {N}) \). The constraints \(s + t\leqslant l \) and \(s \leqslant m \) imply the upper bound for the parameter \(s \) in (7).
Since the number of classifier errors \(a_p \) on the validation sample is
It was proved in [24] that the validity of the condition \(\mu X = a_p\) does not depend on the choice of the partition of the set \(\mathbb {N} \), and it was shown that the number of partitions of the set \(\mathbb {N}\) for given \(t \) and \(s \) equals \(\binom ms \binom {L - P - m}{\ell - t - s}\); this implies the assertion of the theorem. The proof of Theorem 5 is complete. \(\quad \blacksquare \)
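As an illustration of the counting argument above, the partition count \(\binom ms \binom {L - P - m}{\ell - t - s}\) can be evaluated directly. This is a minimal sketch; the parameter values below are hypothetical and serve only to show the computation, not a configuration from the paper.

```python
from math import comb

def partition_count(L, P, m, ell, t, s):
    """Number of partitions of the set N for given t and s:
    C(m, s) * C(L - P - m, ell - t - s)."""
    return comb(m, s) * comb(L - P - m, ell - t - s)

# Hypothetical parameter values, chosen only to illustrate the formula:
# C(6, 3) * C(20 - 3 - 6, 10 - 2 - 3) = 20 * 462 = 9240.
print(partition_count(L=20, P=3, m=6, ell=10, t=2, s=3))
```

Summing such counts over the admissible pairs \((t, e)\) and the range of \(s\) is what makes the overall algorithm polynomial in the sample sizes.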
Proof of Theorem 6. Using the Hoeffding inequality [30], with probability \(\varepsilon \) one can estimate the deviation \(\eta = \eta (\varepsilon ) \) of overfitting from its expectation,
Then the error rate of the classifier selected by the PERM algorithm on the validation sample can be estimated directly via the error rate on the learning sample and the overfitting expectation,
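For reference, the Hoeffding bound invoked above is the standard two-sided inequality for a sum of bounded random variables (a general statement, not the specialized form from the proof):

```latex
\mathsf{P}\!\left( \Bigl| \sum_{i=1}^{n} X_i - \mathsf{E}\sum_{i=1}^{n} X_i \Bigr| \geqslant \eta \right)
\leqslant 2\exp\!\left( -\frac{2\eta^2}{\sum_{i=1}^{n}(b_i - a_i)^2} \right),
\qquad a_i \leqslant X_i \leqslant b_i.
```

Solving the right-hand side for \(\eta\) at a prescribed confidence level yields the deviation \(\eta(\varepsilon)\) used in the proof.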
Ishkina, S.K., Vorontsov, K.V. Sharpness Estimation of Combinatorial Generalization Ability Bounds for Threshold Decision Rules. Autom Remote Control 82, 863–876 (2021). https://doi.org/10.1134/S0005117921050106