An effective procedure for feature subset selection in logistic regression based on information criteria
Computational Optimization and Applications (IF 1.6), Pub Date: 2021-06-17, DOI: 10.1007/s10589-021-00288-1
Enrico Civitelli, Matteo Lapucci, Fabio Schoen, Alessio Sortino

This paper addresses the problem of best subset selection in logistic regression. In particular, we consider formulations of the problem that arise when information criteria, such as AIC or BIC, are adopted as goodness-of-fit measures. Various methods exist to tackle this problem. Heuristic methods are computationally cheap, but are usually only able to find low-quality solutions. Methods based on local optimization suffer from limitations similar to those of heuristic ones. On the other hand, methods based on mixed-integer reformulations of the problem are much more effective, at the cost of higher computational requirements, which become unsustainable as the problem size grows. We thus propose a new approach, which combines mixed-integer programming and decomposition techniques in order to overcome these scalability issues. We provide a theoretical characterization of the properties of the proposed algorithm. The results of an extensive numerical experiment, performed on widely available datasets, show that the proposed method outperforms state-of-the-art techniques.
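
To make the objective concrete, the following is a minimal sketch of information-criterion-based subset selection. It is not the authors' method. It uses plain exhaustive enumeration, which is only viable for a handful of features, and it scores each subset with the standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where k is the number of fitted parameters. The function names (`neg_log_lik`, `fit_logistic`, `best_subset`), the damped Newton solver, and the choice to always include an intercept are assumptions made for illustration.

```python
import itertools
import numpy as np

def neg_log_lik(X, y, w):
    """Negative log-likelihood of logistic regression, labels y in {0, 1}."""
    z = X @ w
    return float(np.sum(np.logaddexp(0.0, z) - y * z))

def fit_logistic(X, y, n_iter=100, tol=1e-9):
    """Maximum-likelihood fit via Newton's method with step halving.
    Returns (weights, negative log-likelihood at the weights)."""
    w = np.zeros(X.shape[1])
    nll = neg_log_lik(X, y, w)
    for _ in range(n_iter):
        mu = 0.5 * (1.0 + np.tanh(0.5 * (X @ w)))  # numerically stable sigmoid
        grad = X.T @ (mu - y)
        # Tiny ridge term keeps the Hessian invertible (illustrative choice).
        hess = (X * (mu * (1.0 - mu))[:, None]).T @ X + 1e-8 * np.eye(X.shape[1])
        step = np.linalg.solve(hess, grad)
        t = 1.0
        while t > 1e-8 and neg_log_lik(X, y, w - t * step) > nll:
            t *= 0.5  # backtrack until the objective does not increase
        w = w - t * step
        new_nll = neg_log_lik(X, y, w)
        converged = nll - new_nll < tol
        nll = new_nll
        if converged:
            break
    return w, nll

def best_subset(X, y, criterion="bic"):
    """Enumerate all feature subsets and return (best score, best subset).
    An intercept is always included and counted as one parameter."""
    n, p = X.shape
    penalty = 2.0 if criterion == "aic" else np.log(n)
    ones = np.ones((n, 1))
    best = (np.inf, ())
    for k in range(p + 1):
        for subset in itertools.combinations(range(p), k):
            Xs = np.hstack([ones, X[:, list(subset)]])
            _, nll = fit_logistic(Xs, y)
            score = 2.0 * nll + penalty * (k + 1)
            if score < best[0]:
                best = (score, subset)
    return best
```

The loop fits 2^p models, which illustrates why enumeration does not scale and why the abstract turns to mixed-integer programming and decomposition for larger problems.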




Updated: 2021-06-18