Context-Based Evaluation of Dimensionality Reduction Algorithms—Experiments and Statistical Significance Analysis
ACM Transactions on Knowledge Discovery from Data (IF 4.0), Pub Date: 2021-01-04, DOI: 10.1145/3428077
Aindrila Ghosh 1, Mona Nashaat 1, James Miller 1, Shaikh Quader 2

Dimensionality reduction is a commonly used technique in data analytics. Reducing the dimensionality of datasets helps not only with managing their analytical complexity but also with removing redundancy. Over the years, several such algorithms have been proposed, with aims ranging from generating simple linear projections to complex non-linear transformations of the input data. Subsequently, researchers have defined several quality metrics to evaluate the performance of different algorithms. Hence, given the plethora of dimensionality reduction algorithms and metrics for their quality analysis, there is a long-standing need for guidelines on how to select the most appropriate algorithm in a given scenario. To bridge this gap, in this article we have compiled 12 state-of-the-art quality metrics and categorized them into 5 identified analytical contexts. Furthermore, we assessed the 15 most popular dimensionality reduction algorithms on the chosen quality metrics using a large-scale and systematic experimental study. Then, using a set of robust non-parametric statistical tests, we assessed the generalizability of our evaluation on 40 real-world datasets. Finally, based on our results, we present practitioners’ guidelines for selecting an appropriate dimensionality reduction algorithm in the identified analytical contexts.
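The abstract does not name the non-parametric tests used. As an illustration only, the sketch below computes the Friedman rank statistic, a common non-parametric choice for comparing several algorithms across several datasets. The function names `average_ranks` and `friedman_statistic` are hypothetical, and the score table is made-up toy data, not results from the article.

```python
def average_ranks(scores):
    """Rank one dataset's scores; 1 = best (highest). Ties share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next score ties with the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def friedman_statistic(table):
    """table[d][a] is the quality score of algorithm a on dataset d.

    Returns (mean rank per algorithm, Friedman chi-square statistic).
    No tie correction is applied in this sketch.
    """
    n = len(table)       # number of datasets
    k = len(table[0])    # number of algorithms
    rank_rows = [average_ranks(row) for row in table]
    mean_ranks = [sum(r[a] for r in rank_rows) / n for a in range(k)]
    chi2 = 12 * n / (k * (k + 1)) * (
        sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4
    )
    return mean_ranks, chi2


# Toy scores (illustrative only): 4 datasets x 3 hypothetical algorithms.
toy_scores = [
    [0.90, 0.80, 0.70],
    [0.85, 0.75, 0.80],
    [0.95, 0.90, 0.60],
    [0.70, 0.65, 0.60],
]
mean_ranks, chi2 = friedman_statistic(toy_scores)
print(mean_ranks, chi2)
```

A lower mean rank means an algorithm scored better on average across datasets. In practice the statistic would be compared against a chi-square distribution with k - 1 degrees of freedom, typically followed by a post-hoc test.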

Updated: 2021-01-04