Efficient nearest neighbors methods for support vector machines in high dimensional feature spaces, Optimization Letters


Optimization Letters (IF 1.3), Pub Date: 2020-07-13, DOI: 10.1007/s11590-020-01616-w
Diana C. Montañés, Adolfo J. Quiroz, Mateo Dulce Rubio, Alvaro J. Riascos Villegas

In the context of support vector machines, identifying the support vectors is a key issue when dealing with large data sets. Camelo et al. (Ann Oper Res 235:85–101, 2015) present a promising approach to finding or approximating most of the support vectors through a procedure based on sub-sampling and enriching the support vector sets with nearest neighbors. This method has been shown to improve the computational efficiency of support vector machines on large data sets with low or intermediate feature space dimension. In the present article we discuss ways of adapting the nearest neighbor enriching methodology to very high dimensional data, such as text data, for which nearest neighbor queries involve, in principle, a high computational cost. Our approach incorporates the proximity preserving order search algorithm of Chavez et al. (MICAI 2005: advances in artificial intelligence, Springer, Berlin, pp 405–414, 2005) into the nearest neighbor enriching method of Camelo et al. (2015) in order to adapt this procedure to the high dimension setting. For the required set of pivots, both random pivots and the base prototype pivot set of Micó et al. (Pattern Recogn Lett 15:9–17, 2015) are considered. The proposed methodology is evaluated on real data sets.
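To make the pivot idea concrete, the following is a minimal sketch of a permutation-style approximate nearest neighbor search using random pivots. It is an illustration only and not the authors' implementation: the class name `PermutationIndex`, the helpers `pivot_ranks` and `footrule`, the Euclidean metric, the footrule comparison of pivot orderings, and the `fraction` parameter are all assumptions introduced here. Each point is summarized by the order in which it sees the pivots; points whose orderings resemble the query's are treated as likely neighbors, and only those candidates are compared with the true distance.

```python
import math
import random


def euclidean(a, b):
    """Plain Euclidean distance (assumed metric for this sketch)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pivot_ranks(point, pivots, dist=euclidean):
    """ranks[i] is the position of pivot i when pivots are sorted by distance to point."""
    order = sorted(range(len(pivots)), key=lambda i: dist(point, pivots[i]))
    ranks = [0] * len(pivots)
    for position, pivot_index in enumerate(order):
        ranks[pivot_index] = position
    return ranks


def footrule(r1, r2):
    """Sum of absolute rank differences between two pivot orderings."""
    return sum(abs(a - b) for a, b in zip(r1, r2))


class PermutationIndex:
    """Hypothetical index: stores one pivot ordering per data point."""

    def __init__(self, data, n_pivots, seed=0, dist=euclidean):
        rng = random.Random(seed)
        self.data = data
        self.dist = dist
        # Random pivots drawn from the data; other pivot choices are possible.
        self.pivots = rng.sample(data, n_pivots)
        self.ranks = [pivot_ranks(p, self.pivots, dist) for p in data]

    def query(self, q, k=1, fraction=0.2):
        """Return indices of k approximate nearest neighbors of q.

        Only the `fraction` of points whose pivot orderings are most
        similar to the query's are compared using the true distance.
        """
        q_ranks = pivot_ranks(q, self.pivots, self.dist)
        candidates = sorted(
            range(len(self.data)),
            key=lambda i: footrule(q_ranks, self.ranks[i]),
        )
        n_check = max(k, int(fraction * len(self.data)))
        checked = candidates[:n_check]
        checked.sort(key=lambda i: self.dist(q, self.data[i]))
        return checked[:k]


if __name__ == "__main__":
    rng = random.Random(1)
    data = [[rng.random() for _ in range(20)] for _ in range(200)]
    index = PermutationIndex(data, n_pivots=8)
    query_point = [rng.random() for _ in range(20)]
    print(index.query(query_point, k=3, fraction=0.1))
```

In this sketch, each query costs `n_pivots` true distance evaluations plus `fraction * len(data)` more for the candidates; comparing pivot orderings involves only integer arithmetic. Setting `fraction=1.0` reduces the search to an exact scan, so the parameter trades accuracy for speed.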


Updated: 2020-07-13
