Inferactive data analysis
Scandinavian Journal of Statistics, published 2019-12-10, DOI: 10.1111/sjos.12425
Nan Bi, Jelena Markovic, Lucy Xia, Jonathan Taylor

We describe inferactive data analysis. The name denotes an interactive approach to data analysis that emphasizes inference after the analysis has been carried out. Our approach is a compromise between Tukey's exploratory data analysis (roughly speaking, "model free") and confirmatory data analysis (roughly speaking, classical and "model based"). It also allows for Bayesian data analysis. We view this approach as close in spirit to the current practice of applied statisticians and data scientists. At the same time, it allows frequentist guarantees for results reported in the scientific literature. It also allows Bayesian results in which the data scientist chooses the statistical model (and hence the prior) after some initial exploratory analysis. This approach does not cover every scenario, or every algorithm that data scientists may use. Even so, we see it as a useful step toward providing concrete tools, with frequentist statistical guarantees, for current data scientists. The basis of inference we use is selective inference [Lee et al., 2016, Fithian et al., 2014], in particular its randomized form [Tian and Taylor, 2015a]. The randomized framework provides additional power and shorter confidence intervals. It also provides explicit forms for the relevant reference distributions (up to normalization) through the selective sampler of Tian et al. [2016]. The reference distributions are constructed from a particular conditional distribution formed from what we call a DAG-DAG (a Data Analysis Generative DAG). Sampling from conditional distributions in DAGs is generally complex, so the selective sampler is crucial to any practical implementation of inferactive data analysis. Our principal goal is to review recent developments in selective inference and to describe the general philosophy of selective inference.
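The sketch below makes the randomized selective-inference idea concrete with a toy "file drawer" model. This model is an assumption of the illustration, not an example taken from the paper.

In the model, a single statistic Z ~ N(μ, 1) is reported only if it clears a threshold c.

- **Without randomization**, the reference distribution of Z given selection is a truncated Gaussian.
- **With randomization**, selection is based on Z + ω, where ω ~ N(0, γ²). The conditional density of Z given selection is then explicit up to normalization: f(z) ∝ φ(z - μ) · P(ω > c - z).

All function names and parameters (`c`, `gamma`, the integration grid) are hypothetical. The random-walk Metropolis sampler is a generic MCMC stand-in, not the selective sampler of Tian et al. [2016].

```python
import math
import random


def norm_sf(x):
    """Upper tail probability P(N(0, 1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def naive_pvalue(z_obs):
    """One-sided p-value for H0: mu = 0 that ignores the selection step."""
    return norm_sf(z_obs)


def selective_pvalue_hard(z_obs, c):
    """P(Z >= z_obs | Z > c) under H0: truncated-Gaussian p-value (no randomization).

    Assumes z_obs > c, i.e. the observed statistic was actually selected.
    """
    return norm_sf(z_obs) / norm_sf(c)


def selection_prob(z, c, gamma):
    """P(z + omega > c) for omega ~ N(0, gamma^2): the randomized selection weight."""
    return norm_sf((c - z) / gamma)


def unnormalized_density(z, mu, c, gamma):
    """Density of Z given randomized selection, known only up to normalization."""
    return norm_pdf(z - mu) * selection_prob(z, c, gamma)


def selective_pvalue_randomized(z_obs, c, gamma, lo=-10.0, hi=10.0, n=20000):
    """P(Z >= z_obs | Z + omega > c) under H0, by trapezoid integration on [lo, hi]."""
    h = (hi - lo) / n
    total = tail = 0.0
    for i in range(n + 1):
        z = lo + i * h
        weight = 0.5 if i in (0, n) else 1.0  # trapezoid end-point weights
        f = weight * unnormalized_density(z, 0.0, c, gamma)
        total += f
        if z >= z_obs:
            tail += f
    return tail / total  # the grid spacing h cancels in the ratio


def metropolis_sample(mu, c, gamma, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis draws from the unnormalized conditional density."""
    rng = random.Random(seed)

    def log_f(x):
        # Log density up to a constant; floor the weight to avoid log(0).
        return -0.5 * (x - mu) ** 2 + math.log(max(selection_prob(x, c, gamma), 1e-300))

    z = c  # start where selection is plausible
    current = log_f(z)
    samples = []
    for _ in range(n_samples):
        proposal = z + rng.gauss(0.0, step)
        candidate = log_f(proposal)
        # Accept with probability min(1, f(proposal) / f(z)); the constant cancels.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            z, current = proposal, candidate
        samples.append(z)
    return samples


if __name__ == "__main__":
    z_obs, c = 2.1, 2.0
    print("naive:      ", naive_pvalue(z_obs))
    print("truncated:  ", selective_pvalue_hard(z_obs, c))
    print("randomized: ", selective_pvalue_randomized(z_obs, c, gamma=1.0))
```

With z_obs = 2.1 and c = 2.0, the three p-values behave differently:

- The naive p-value ignores selection and is far too small.
- The truncated-Gaussian p-value is large.
- The randomized p-value falls between the two.

Varying γ moves the randomized p-value between the extremes. As γ shrinks, it approaches the truncated-Gaussian value; as γ grows, it approaches the naive one. Because only the unnormalized density is needed, the same function can be handed to a sampler when numerical integration is impractical.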

Updated: 2019-12-10
