Hypothesis Formalization: Empirical Findings, Software Limitations, and Design Implications
arXiv - CS - Other Computer Science. Pub Date: 2021-04-06, DOI: arxiv-2104.02712
Eunice Jun, Melissa Birchfield, Nicole de Moura, Jeffrey Heer, Rene Just

Data analysis requires translating higher-level questions and hypotheses into computable statistical models. We present a mixed-methods study aimed at identifying the steps, considerations, and challenges involved in operationalizing hypotheses into statistical models, a process we refer to as hypothesis formalization. In a formative content analysis of research papers, we find that researchers highlight decomposing a hypothesis into sub-hypotheses, selecting proxy variables, and formulating statistical models based on data collection design as key steps. In a lab study, we find that analysts fixated on implementation and shaped their analysis to fit familiar approaches, even if suboptimal. In an analysis of software tools, we find that tools provide inconsistent, low-level abstractions that may limit the statistical models analysts use to formalize hypotheses. Based on these observations, we characterize hypothesis formalization as a dual-search process balancing conceptual and statistical considerations constrained by data and computation, and discuss implications for future tools.

Updated: 2021-04-08