Visualization model validation via inline replication, Information Visualization



Visualization model validation via inline replication
Information Visualization (IF 1.8), Pub Date: 2019-01-25, DOI: 10.1177/1473871618821747
David Gotz₁, Wenyuan Wang₁, Annie T Chen₂, David Borland₃


Data visualizations typically show a representation of a data set with little to no focus on the repeatability or generalizability of the displayed trends and patterns. However, insights gleaned from these visualizations are often used as the basis for decisions about future events. Visualizations of retrospective data therefore often serve as “visual predictive models.” However, this visual predictive model approach can lead to invalid inferences. In this article, we describe an approach to visual model validation called Inline Replication. Inline Replication is closely related to the statistical techniques of bootstrap sampling and cross-validation and, like those methods, provides a non-parametric and broadly applicable technique for assessing the variance of findings from visualizations. This article describes the overall Inline Replication process and outlines how it can be integrated into both traditional and emerging “big data” visualization pipelines. It also provides examples of how Inline Replication can be integrated into common visualization techniques such as bar charts and linear regression lines. Results from an empirical evaluation of the technique and two prototype Inline Replication–based visual analysis systems are also described. The empirical evaluation demonstrates the impact of Inline Replication under different conditions, showing that both (1) the level of partitioning and (2) the approach to aggregation have a major influence over its behavior. The results highlight the trade-offs in choosing Inline Replication parameters but suggest that using n = 5 partitions is a reasonable default.
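The abstract names the ingredients of Inline Replication (partitioning the data, recomputing the visualized statistic, and aggregating the results) but does not spell out the procedure. The sketch below is one plausible reading under stated assumptions, not the authors' implementation: it assumes disjoint random partitions, a bar chart whose heights are per-category means, and aggregation by the mean and standard deviation across partitions. The names `partition`, `bar_heights`, and `replicate` are hypothetical.

```python
import random
import statistics
from collections import defaultdict


def partition(records, n_partitions=5, seed=0):
    """Shuffle the records and split them into disjoint, roughly equal folds."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    # Striding over the shuffled list yields n_partitions disjoint folds.
    return [shuffled[i::n_partitions] for i in range(n_partitions)]


def bar_heights(records):
    """Mean value per category: the statistic a simple bar chart would show."""
    groups = defaultdict(list)
    for category, value in records:
        groups[category].append(value)
    return {category: statistics.mean(values) for category, values in groups.items()}


def replicate(records, n_partitions=5, seed=0):
    """Recompute the bar heights on every fold and summarize their spread.

    Assumption: aggregation is the mean and sample standard deviation of the
    per-fold bar heights; the paper evaluates several aggregation approaches.
    """
    full = bar_heights(records)
    per_fold = [bar_heights(fold) for fold in partition(records, n_partitions, seed)]
    summary = {}
    for category, full_height in full.items():
        # A category may be absent from a fold if it is rare.
        heights = [fold[category] for fold in per_fold if category in fold]
        summary[category] = {
            "full": full_height,
            "fold_mean": statistics.mean(heights),
            "fold_sd": statistics.stdev(heights) if len(heights) > 1 else 0.0,
            "folds": len(heights),
        }
    return summary


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic (category, value) records, made up purely for illustration.
    data = [("A", rng.gauss(10, 3)) for _ in range(200)]
    data += [("B", rng.gauss(11, 3)) for _ in range(20)]
    for category, stats in replicate(data, n_partitions=5).items():
        print(category, stats)
```

In a sketch like this, a bar whose per-fold heights vary widely (a large `fold_sd`) would be flagged as a less repeatable finding than one whose heights agree across folds. The default of five partitions follows the abstract's suggestion that n = 5 is a reasonable default.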


Updated: 2019-01-25
