Abstract
Many applications require collecting data on a variety of variables or measurements, such as system performance metrics; we refer to these broadly as measures or variables. Data collection along each measure often incurs a cost, so it is desirable to account for the cost of measures in modeling. This is a fairly new class of problems in the area of cost-sensitive learning. A few attempts have been made to incorporate costs when combining and selecting measures, but existing studies either do not strictly enforce a budget constraint or are not the ‘most’ cost-effective. Focusing on classification problems, we propose a computationally efficient approach that finds a near-optimal model under a given budget by exploring the most ‘promising’ part of the solution space. Instead of outputting a single model, we produce a model schedule: a list of models sorted by model cost and expected predictive accuracy. The schedule can be used to choose the model with the best predictive accuracy under a given budget, or to trade off budget against predictive accuracy. Experiments on benchmark datasets show that our approach compares favorably to competing methods.
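To make the model-schedule idea concrete, here is a minimal sketch (not the paper's implementation) of how such a schedule could be consumed downstream. It assumes a hypothetical representation of the schedule as a list of (cost, estimated accuracy, model) tuples; all names and values are illustrative.

```python
# Hypothetical illustration of consuming a model schedule: given
# (cost, estimated_accuracy, model) entries, pick the most accurate
# model whose cost fits within the budget.

from typing import Any, List, Tuple


def best_model_under_budget(
    schedule: List[Tuple[float, float, Any]], budget: float
) -> Any:
    """Return the model with the highest estimated accuracy whose
    cost does not exceed `budget`; raise if none is affordable."""
    affordable = [entry for entry in schedule if entry[0] <= budget]
    if not affordable:
        raise ValueError("no model in the schedule fits the budget")
    # Maximize the estimated accuracy (second field of each entry).
    return max(affordable, key=lambda entry: entry[1])[2]


# Made-up costs and accuracies, purely for illustration.
schedule = [(1.0, 0.71, "model_a"), (2.5, 0.83, "model_b"), (6.0, 0.90, "model_c")]
print(best_model_under_budget(schedule, budget=3.0))  # -> "model_b"
```

The same schedule also supports the budget-accuracy trade-off described above: scanning the sorted list shows how much accuracy each additional unit of budget would buy.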
Acknowledgements
We thank the editors and the anonymous reviewers for their helpful comments and suggestions. This work was partially supported by the University Industry Collaborative Award (R32020110000000) from the University of Massachusetts Dartmouth.
Cite this article
Yan, D., Qin, Z., Gu, S. et al. Cost-sensitive selection of variables by ensemble of model sequences. Knowl Inf Syst 63, 1069–1092 (2021). https://doi.org/10.1007/s10115-021-01551-x