ProSeCo: Visual analysis of class separation measures and dataset characteristics
Computers & Graphics (IF 2.5), Pub Date: 2021-03-28, DOI: 10.1016/j.cag.2021.03.004
Jürgen Bernard, Marco Hutter, Matthias Zeppelzauer, Michael Sedlmair, Tamara Munzner

Class separation is an important concept in machine learning and visual analytics. We address the visual analysis of class separation measures for both high-dimensional data and its corresponding projections into 2D through dimensionality reduction (DR) methods. Although a plethora of separation measures have been proposed, it is difficult to compare class separation between multiple datasets with different characteristics, multiple separation measures, and multiple DR methods. We present ProSeCo, an interactive visualization approach to support comparison between up to 20 class separation measures and up to 4 DR methods, with respect to any of 7 dataset characteristics: dataset size, dataset dimensions, class counts, class size variability, class size skewness, outlieriness, and real-world vs. synthetically generated data. ProSeCo supports (1) comparing across measures, (2) comparing high-dimensional to dimensionally-reduced 2D data across measures, (3) comparing between different DR methods across measures, (4) partitioning with respect to a dataset characteristic, (5) comparing partitions for a selected characteristic across measures, and (6) inspecting individual datasets in detail. We demonstrate the utility of ProSeCo in two usage scenarios, using datasets [1] posted at https://osf.io/epcf9/.
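To make the comparison described in the abstract concrete, the following is a minimal sketch, not ProSeCo's implementation. It computes one simple class separation measure (the fraction of points whose nearest neighbor shares their class) on a high-dimensional toy dataset and on a 2D version of it, along with a few dataset characteristics. The abstract does not name the 20 measures or 4 DR methods, so the measure, the function names, the toy data, and the definition of class size variability (standard deviation of class sizes) are all assumptions made for illustration. The "projection" here simply keeps the first two coordinates and stands in for a real DR method.

```python
import math
from collections import Counter
from statistics import pstdev


def nearest_neighbor_purity(points, labels):
    """Fraction of points whose nearest other point shares their class label.

    Illustrative separation measure; not taken from the paper.
    """
    same = 0
    for i, p in enumerate(points):
        # Index of the closest point other than i (Euclidean distance).
        j = min((k for k in range(len(points)) if k != i),
                key=lambda k: math.dist(p, points[k]))
        same += labels[i] == labels[j]
    return same / len(points)


def dataset_characteristics(points, labels):
    """A few simple characteristics; these definitions are assumptions."""
    sizes = Counter(labels)
    return {
        "size": len(points),
        "dimensions": len(points[0]),
        "class_count": len(sizes),
        "class_size_std": pstdev(list(sizes.values())),
    }


# Toy 3D dataset with two classes (made-up values for illustration).
points = [(0.0, 0.0, 0.0), (0.1, 0.2, 0.0), (0.2, 0.1, 0.1),
          (5.0, 5.0, 5.0), (5.1, 4.9, 5.2), (4.8, 5.2, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]

# Stand-in for a DR method: keep only the first two coordinates.
projected = [p[:2] for p in points]

high_dim_score = nearest_neighbor_purity(points, labels)
projected_score = nearest_neighbor_purity(projected, labels)
stats = dataset_characteristics(points, labels)
```

Comparing `high_dim_score` with `projected_score` corresponds to task (2) in the abstract, and grouping many datasets by an entry of `stats` corresponds to task (4).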




Updated: 2021-04-14