Lux: Always-on Visualization Recommendations for Exploratory Data Science
arXiv - CS - Databases, Pub Date: 2021-04-30, DOI: arxiv-2105.00121
Doris Jung-Lin Lee, Dixin Tang, Kunal Agarwal, Thyne Boonmark, Caitlyn Chen, Jake Kang, Ujjaini Mukhopadhyay, Jerry Song, Micah Yong, Marti A. Hearst, Aditya G. Parameswaran

Exploratory data science largely happens in computational notebooks using dataframe APIs, such as pandas, that support flexible means to transform, clean, and analyze data. Yet visually exploring data in dataframes remains tedious: it requires substantial programming effort to create visualizations and mental effort to determine what analysis to perform next. We propose Lux, an always-on framework for accelerating visual insight discovery in data science workflows. When users print a dataframe in their notebooks, Lux recommends visualizations that give a quick overview of patterns and trends and suggests promising analysis directions. Lux features a high-level language for generating visualizations on demand, encouraging rapid visual experimentation with data. We demonstrate that, through careful design and three system optimizations, Lux adds no more than two seconds of overhead on top of pandas for over 98% of datasets in the UCI repository. We evaluate Lux's usability via a controlled first-use study and interviews with early adopters, finding that Lux helps fulfill data scientists' needs for visualization support within their dataframe workflows. Lux has already been embraced by data science practitioners, with over 1.9k stars on GitHub within its first 15 months.
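The abstract describes recommendations that appear whenever a dataframe is printed. What follows is a hypothetical, stdlib-only sketch of that idea. It is not Lux's implementation or API. `AlwaysOnFrame`, `recommend_scatterplots`, and the use of absolute Pearson correlation as the ranking score are all assumptions made for illustration, and a plain dict of column lists stands in for a pandas dataframe.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no correlation signal
    return cov / (sx * sy)


def recommend_scatterplots(columns, top_k=3):
    """Hypothetical recommender: rank numeric column pairs by |Pearson r|."""
    # Keep only columns whose values are all numbers (drops e.g. string columns).
    numeric = {name: vals for name, vals in columns.items()
               if all(isinstance(v, (int, float)) for v in vals)}
    scored = [(a, b, pearson(numeric[a], numeric[b]))
              for a, b in combinations(sorted(numeric), 2)]
    # Strongest linear relationships (positive or negative) first.
    scored.sort(key=lambda t: abs(t[2]), reverse=True)
    return scored[:top_k]


class AlwaysOnFrame:
    """Toy stand-in for a dataframe whose printed form includes recommendations."""

    def __init__(self, columns):
        self.columns = columns  # dict: column name -> list of values

    def __repr__(self):
        lines = ["columns: " + ", ".join(self.columns)]
        lines.append("Recommended scatterplots (by |Pearson r|):")
        for a, b, r in recommend_scatterplots(self.columns):
            lines.append(f"  {a} vs {b}: r = {r:+.2f}")
        return "\n".join(lines)


# Usage: printing the object triggers the recommendations, no extra call needed.
df = AlwaysOnFrame({
    "age": [23, 35, 47, 59],
    "income": [30, 45, 61, 80],
    "height": [170, 165, 180, 172],
    "city": ["A", "B", "A", "C"],
})
print(df)
```

Overriding `__repr__` is the simplest way to make such output appear whenever the object is displayed, because notebooks fall back to an object's repr when rendering the last expression of a cell. Scoring every column pair grows quadratically with the number of columns. This is one reason an always-on tool has to keep its overhead in check.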

Updated: 2021-05-04