Data visualization case studies for high‐dimensional data validation


Stat (IF 1.7), Pub Date: 2020-11-26, DOI: 10.1002/sta4.334
Aaron R. Williams


Microsimulation and synthetic data are often high dimensional, requiring extensive validation and exploration to compare results against certain benchmarks. In both cases, validation is necessary to ensure that the many univariate distributions and multivariate relationships in the new data are similar to the many univariate distributions and multivariate relationships in the underlying data. This article illustrates some data visualization techniques for comparing a generated sample or population against a known reference sample or population. For implementation ease, we also outline an iterative workflow built with R Markdown that can be shared publicly on GitHub or privately with Amazon Web Services S3. The lessons learned from this work apply to any analysis that compares multiple data sets, deals with high‐dimensional data, or involves summarizing iterations of analyses.
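As a rough illustration of the kind of comparison the abstract describes, the sketch below computes numeric summaries that a validation plot might be built on: per-variable gaps in the mean and the deciles, and gaps in pairwise correlation, between a generated sample and a reference sample. It is not the article's method or code. The article's workflow uses R Markdown, whereas this sketch uses Python's standard library, and the function names, the toy variables `income` and `tax`, and all parameter values are hypothetical.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def compare_univariate(reference, generated, n=10):
    """Per-variable gap in the mean and largest absolute gap across the n-quantile cut points."""
    report = {}
    for name in reference:
        ref_q = statistics.quantiles(reference[name], n=n)
        gen_q = statistics.quantiles(generated[name], n=n)
        report[name] = {
            "mean_gap": statistics.fmean(generated[name])
            - statistics.fmean(reference[name]),
            "max_quantile_gap": max(abs(g - r) for g, r in zip(gen_q, ref_q)),
        }
    return report


def compare_correlations(reference, generated):
    """Gap in Pearson correlation (generated minus reference) for every variable pair."""
    names = sorted(reference)
    gaps = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            gaps[(a, b)] = pearson(generated[a], generated[b]) - pearson(
                reference[a], reference[b]
            )
    return gaps


def make_sample(rng, size, slope):
    """Hypothetical toy data: 'income' plus a 'tax' variable that depends on it."""
    income = [rng.gauss(50.0, 10.0) for _ in range(size)]
    tax = [slope * v + rng.gauss(0.0, 2.0) for v in income]
    return {"income": income, "tax": tax}


# Both samples come from the same toy process, so the gaps should be small.
rng = random.Random(0)
reference = make_sample(rng, 2000, slope=0.30)
generated = make_sample(rng, 2000, slope=0.30)

univariate = compare_univariate(reference, generated)
correlations = compare_correlations(reference, generated)

for name, row in univariate.items():
    print(name, round(row["mean_gap"], 3), round(row["max_quantile_gap"], 3))
for pair, gap in correlations.items():
    print(pair, round(gap, 3))
```

With many variables, tables like these grow quickly, which is presumably where the visual comparisons the abstract mentions become more practical than reading the numbers directly.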


Updated: 2020-11-26
