Hypothesis Formalization: Empirical Findings, Software Limitations, and Design Implications,arXiv - CS - Other Computer Science


arXiv - CS - Other Computer Science. Pub Date: 2021-04-06, DOI: arxiv-2104.02712
Eunice Jun, Melissa Birchfield, Nicole de Moura, Jeffrey Heer, Rene Just

Data analysis requires translating higher-level questions and hypotheses into computable statistical models. We present a mixed-methods study aimed at identifying the steps, considerations, and challenges involved in operationalizing hypotheses into statistical models, a process we refer to as hypothesis formalization. In a formative content analysis of research papers, we find that researchers highlight three key steps: decomposing a hypothesis into sub-hypotheses, selecting proxy variables, and formulating statistical models based on data collection design. In a lab study, we find that analysts fixated on implementation and shaped their analysis to fit familiar approaches, even when those approaches were suboptimal. In an analysis of software tools, we find that tools provide inconsistent, low-level abstractions that may limit the statistical models analysts use to formalize hypotheses. Based on these observations, we characterize hypothesis formalization as a dual-search process that balances conceptual and statistical considerations, constrained by data and computation, and we discuss implications for future tools.


Updated: 2021-04-08
