Dendrograms, minimum spanning trees and feature selection
European Journal of Operational Research (IF 6.4), Pub Date: 2022-11-24, DOI: 10.1016/j.ejor.2022.11.031
Martine Labbé, Mercedes Landete, Marina Leal

Feature selection is a fundamental process, also applicable to hierarchical clustering, for avoiding overfitting and reducing the size of databases without significant loss of information. Dendrograms are graphical representations of hierarchical clustering algorithms; for single linkage clustering, they can be interpreted as minimum spanning trees of the complete network defined by the database. In this work, we introduce the problem of jointly determining a set of features and a dendrogram according to the single linkage method. We propose different formulations that include both the minimum spanning tree constraints and the feature selection constraints. Different bounds on the objective function are studied. For one of the models, several families of valid inequalities are proposed and the corresponding separation problems are studied. For another formulation, a decomposition algorithm is designed. In an extensive computational study, the effectiveness of the different models is discussed, and the model with valid inequalities is compared with the decomposition algorithm. The computational results also illustrate that integrating feature selection into the optimization model makes it possible to retain a satisfactory percentage of the information.
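The correspondence between single linkage dendrograms and minimum spanning trees can be illustrated with a small sketch. This is not the authors' optimization model: it fixes the feature subset in advance (the `features` argument, a hypothetical input) and assumes Euclidean distance on the selected features as the dissimilarity. It then runs Kruskal's algorithm on the complete network over the database rows; each accepted MST edge is exactly one single linkage merge, and its weight is the merge height in the dendrogram.

```python
from itertools import combinations
import math


def single_linkage_merges(points, features):
    """Return single linkage merges as the edges of a minimum spanning tree.

    points:   list of equal-length numeric tuples (rows of the database).
    features: indices of the selected features (hypothetical, chosen by hand).
    Each returned item is (height, i, j): the MST edge whose acceptance
    merges the clusters containing rows i and j at that height.
    """
    n = len(points)

    def dist(a, b):
        # Assumed dissimilarity: Euclidean distance on the selected features only.
        return math.sqrt(sum((points[a][k] - points[b][k]) ** 2 for k in features))

    # Every pair of rows is an edge of the complete network, sorted by weight.
    edges = sorted((dist(i, j), i, j) for i, j in combinations(range(n), 2))

    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    merges = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            # Kruskal accepts the edge: two clusters merge at height d.
            parent[ri] = rj
            merges.append((d, i, j))
            if len(merges) == n - 1:
                break  # spanning tree complete: one cluster remains
    return merges


if __name__ == "__main__":
    data = [(0.0, 0.0, 5.0), (1.0, 0.0, -5.0), (4.0, 0.0, 0.0), (4.0, 2.0, 9.0)]
    print(single_linkage_merges(data, features=[0, 1]))
    print(single_linkage_merges(data, features=[0, 1, 2]))
```

With `features=[0, 1]` the merges are rows 0 and 1 at height 1.0, rows 2 and 3 at 2.0, then the two clusters at 3.0; adding feature 2 produces a different tree (rows 1 and 2 merge first). This shows why choosing features and building the dendrogram interact, which is the coupling the joint formulations address.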




Updated: 2022-11-24