Supervised projection pursuit – A dimensionality reduction technique optimized for probabilistic classification
Chemometrics and Intelligent Laboratory Systems ( IF 3.9 ) Pub Date : 2019-11-01 , DOI: 10.1016/j.chemolab.2019.103867
Andrei Barcaru

Abstract An important step in multivariate analysis is dimensionality reduction, which allows for better classification and easier visualization of the class structures in the data. Techniques like PCA, PLS-DA and LDA are most often used to explore the patterns in the data and to reduce the dimensions. Yet these techniques do not always reveal the structures in the data properly. To this end, a supervised projection pursuit (SuPP) based on the Jensen-Shannon divergence is proposed in this article. The combination of this metric with a powerful Monte Carlo-based optimization algorithm yielded a versatile dimensionality reduction technique capable of working with high-dimensional data and missing observations. Combined with a Naive Bayes (NB) classifier, SuPP proved to be a powerful preprocessing tool for classification. Namely, on the Iris data set, the prediction accuracy of SuPP-NB is significantly higher than the prediction accuracy of PCA-NB (p-value ≤ 4.02E-05 in a 2D latent space, p-value ≤ 3.00E-03 in a 3D latent space) and significantly higher than the prediction accuracy of PLS-DA (p-value ≤ 1.17E-05 in a 2D latent space and p-value ≤ 3.08E-03 in a 3D latent space). The significantly higher accuracy for this particular data set is strong evidence of better class separation in the latent spaces obtained with SuPP.
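
The abstract names two ingredients, a Jensen-Shannon divergence criterion and a Monte Carlo based optimizer, without giving details. The following is only a minimal sketch of that general idea, not the author's algorithm: it assumes a one-dimensional projection, histogram density estimates on shared bins, and plain random search over unit vectors. The function names (`js_divergence`, `projection_index`, `random_search`) and the parameters `bins` and `n_trials` are hypothetical.

```python
import numpy as np


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)  # mixture distribution

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute nothing
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def projection_index(X, y, w, bins=10):
    """Mean pairwise JS divergence between class histograms along direction w."""
    z = X @ np.asarray(w, dtype=float)
    edges = np.linspace(z.min(), z.max(), bins + 1)  # shared bins for all classes
    hists = [np.histogram(z[y == c], bins=edges)[0] for c in np.unique(y)]
    scores = [
        js_divergence(hists[i], hists[j])
        for i in range(len(hists))
        for j in range(i + 1, len(hists))
    ]
    return float(np.mean(scores))


def random_search(X, y, n_trials=500, seed=0):
    """Assumed stand-in optimizer: keep the best of n_trials random unit vectors."""
    rng = np.random.default_rng(seed)
    best_w, best_score = None, -np.inf
    for _ in range(n_trials):
        w = rng.normal(size=X.shape[1])
        w /= np.linalg.norm(w)
        score = projection_index(X, y, w)
        if score > best_score:
            best_w, best_score = w, score
    return best_w, best_score
```

Calling `random_search(X, y)` on a labeled data matrix returns a unit direction and its score; a score near 1 means the class histograms barely overlap along that direction. A fuller treatment would search for 2D or 3D projections and handle missing observations, neither of which this sketch attempts.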

Updated: 2019-11-01