Abstract
The optimal margin distribution machine (ODM) is an efficient classification algorithm. ODM optimizes the margin distribution by simultaneously maximizing the margin mean and minimizing the margin variance, which yields better generalization performance. However, it is relatively time-consuming on large-scale problems. In this paper, we propose a hinge loss-based optimal margin distribution machine (Hinge-ODM), which adopts a simplified substitute formulation. It speeds up the solving process without noticeably affecting accuracy. In addition, inspired by the sparsity of its solution, we put forward a multi-parameter safe screening rule (MSSR) for Hinge-ODM, called MSSR-Hinge-ODM. With the MSSR, most non-support vectors can be identified and deleted before training, so the scale of the dual problem is greatly reduced. Moreover, the MSSR is safe: it yields exactly the same optimal solution as the original problem. Furthermore, a fast dual coordinate descent method (DCDM) is introduced to solve the reduced Hinge-ODM. Finally, we integrate the MSSR into the grid search method to accelerate the whole training process. Experimental results on twenty data sets demonstrate the superiority of the proposed methods.
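To make the two quantities named in the abstract concrete, the following minimal sketch computes the margin mean and margin variance of a linear classifier on a toy data set. It is an illustration only and not the Hinge-ODM formulation of the paper: the function names, the toy data, the use of the unnormalized margin y_i * <w, x_i>, and the population form of the variance are all assumptions made here.

```python
from statistics import fmean, pvariance


def margins(w, X, y):
    """Return the margins y_i * <w, x_i> of a linear classifier (no bias term)."""
    return [yi * sum(wj * xj for wj, xj in zip(w, xi)) for xi, yi in zip(X, y)]


def margin_statistics(w, X, y):
    """Return (margin mean, margin variance) over all samples.

    The variance is the population variance (divides by the sample count),
    which is an assumption of this sketch.
    """
    m = margins(w, X, y)
    return fmean(m), pvariance(m)


# Hypothetical toy data: four points in 2-D with labels in {+1, -1}.
X = [[2.0, 0.0], [1.0, 1.0], [-1.0, -1.0], [-2.0, 0.0]]
y = [1, 1, -1, -1]
w = [1.0, 0.0]  # hypothetical weight vector

mean, var = margin_statistics(w, X, y)
print(f"margin mean = {mean}, margin variance = {var}")
```

A classifier with a larger margin mean and a smaller margin variance on the training data is the kind of solution that a margin-distribution method favors, in contrast to one that looks only at the smallest margin.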
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China (Nos. 12071475 and 11671010) and the Beijing Natural Science Foundation (No. 4172035). The authors would like to thank the reviewers for their helpful comments and suggestions, which have improved the presentation.
Cite this article
Ma, M., Xu, Y. Multi-parameter safe screening rule for hinge-optimal margin distribution machine. Appl Intell 51, 2279–2290 (2021). https://doi.org/10.1007/s10489-020-02024-4