Accelerating projections to kernel-induced spaces by feature approximation
Pattern Recognition Letters (IF 5.1), Pub Date: 2020-05-26, DOI: 10.1016/j.patrec.2020.05.029
Krzysztof Adamiak, Hyongsuk Kim, Krzysztof Ślot

This paper presents a method for speeding up data projections onto kernel-induced feature spaces (derived using, e.g., kernel Principal Component Analysis, kPCA). The idea is to approximate the derived features, which are implicitly defined by all training samples and by the dominant eigenvectors of problem-specific generalized eigenproblems. Instead of employing the whole training set, we propose to use a small pool of appropriately selected representatives, and we formulate a rule for deriving the corresponding weight vectors that replace the dominant eigenvectors. The representatives are determined by clustering the training data, whereas the weighting coefficients are chosen to minimize the approximation errors of the original features. The concept has been experimentally verified for kPCA using both artificial and real datasets. The presented approach reduces feature-extraction complexity, which implies a proportional increase in data projection speed, by one to two orders of magnitude without sacrificing data analysis accuracy. The approach is therefore well suited for kernel-based, intelligent data analysis applications executed on resource-limited systems, such as embedded or IoT devices, or on systems where processing time is critical.
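
The abstract does not give the weight-derivation rule itself, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes a Gaussian (RBF) kernel, uncentered kPCA, plain k-means for picking representatives, and an ordinary least-squares fit of the new weights against the exact features on the training set. All function names, the kernel width `gamma`, and the sample sizes are hypothetical choices made for illustration.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    # Pairwise Gaussian kernel exp(-gamma * ||a - b||^2) (assumed kernel choice).
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def kpca_coefficients(K, n_components):
    # Uncentered kPCA (a simplification): dominant eigenvectors of the
    # kernel matrix, scaled by 1/sqrt(eigenvalue).
    w, V = np.linalg.eigh(K)
    idx = np.argsort(w)[::-1][:n_components]
    return V[:, idx] / np.sqrt(w[idx])

def kmeans(X, k, n_iter=50, seed=0):
    # Plain Lloyd iterations; stands in for "clustering of training data".
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :])**2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                centers[j] = pts.mean(0)
    return centers

def fit_weights(X, alpha, centers, gamma):
    # Least-squares weights beta so that k(x, centers) @ beta matches the
    # exact features k(x, X) @ alpha on the training samples (assumed rule).
    target = rbf_kernel(X, X, gamma) @ alpha
    Kc = rbf_kernel(X, centers, gamma)
    beta, *_ = np.linalg.lstsq(Kc, target, rcond=None)
    return beta

def project(X_new, basis, coeffs, gamma):
    # One kernel evaluation per basis point and sample: n for the exact
    # projection, k for the approximate one.
    return rbf_kernel(X_new, basis, gamma) @ coeffs

# Synthetic demo data (not from the paper).
rng = np.random.default_rng(0)
gamma = 0.5
X = rng.normal(size=(300, 2))
alpha = kpca_coefficients(rbf_kernel(X, X, gamma), n_components=3)
centers = kmeans(X, k=20)
beta = fit_weights(X, alpha, centers, gamma)

X_test = rng.normal(size=(50, 2))
exact = project(X_test, X, alpha, gamma)         # 300 kernel evaluations per sample
approx = project(X_test, centers, beta, gamma)   # 20 kernel evaluations per sample
rel_err = np.linalg.norm(exact - approx) / np.linalg.norm(exact)
print("relative approximation error on test data:", rel_err)
```

In this sketch the per-sample projection cost drops from 300 kernel evaluations to 20, which illustrates the kind of proportional speed-up the abstract refers to. The achievable accuracy depends on the kernel, the data, and the number of representatives.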




Updated: 2020-05-26