Valid post-selection inference in model-free linear regression
Annals of Statistics (IF 3.2), Pub Date: 2020-10-01, DOI: 10.1214/19-aos1917
Arun K. Kuchibhotla, Lawrence D. Brown, Andreas Buja, Junhui Cai, Edward I. George, Linda H. Zhao

S.1. Simulations Continued. The simulation setting in this section is the same as in Section 9. We first describe the reason for using the null situation β0 = 0_p in the model. If β0 is an arbitrary non-zero vector, then, for fixed covariates, the observations (Xi, Yi) cannot be identically distributed and hence only (asymptotically) conservative inference is possible. In simulations this conservativeness is confounded with the simultaneity, so that the coverage becomes close to 1 (if not exactly 1).

In the main manuscript we have shown plots comparing our method with Berk et al. (2013) and with selective inference. We label our confidence region R̂_{n,M} (12) as "UPoSI," the projected confidence region B̂_{n,M} (28) as "UPoSI Box," and Berk et al. (2013) as "PoSI." Tables 1, 2, and 3 show the exact numbers for the comparison of our method with Berk et al. (2013). Note that the size of each dot in the row plot of Figure 9 indicates the proportion of confidence regions of that volume among models of the same size. In Settings A and B, the confidence region volumes of same-sized models are the same. In Setting C, the volumes of the Berk and PoSI Box confidence regions enlarge (hence smaller log(Vol)/|M|) if the last covariate is included.

Tables 4 and 5 show the numbers for the comparison of our method with selective inference when the selection procedure is forward stepwise and LARS, respectively. Sample splitting is a simple procedure that provides valid inference after selection, as discussed in Section 1.3. We stress here that it is valid only for independent observations and that the model selected on the first half of the split could differ from the one selected on the full data. The comparison results with n = 1000, p = 500 and selection methods forward stepwise, LARS, and BIC are summarized in Figure S.1. For sample splitting we have used the Bonferroni correction to obtain simultaneous inference for all coefficients in a model; a minimal sketch of this baseline is given below. Table 6 shows the comparison of our method with sample splitting.
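To make the sample-splitting baseline concrete, the following is a minimal sketch (not the authors' code) of the procedure described above: data are generated under the null setting β0 = 0_p, a model is selected by a simple greedy forward-stepwise rule on one half of the sample, and Bonferroni-adjusted OLS confidence intervals for all coefficients of the selected model are computed on the other half. The sample size, dimension, model-size cap, and the particular selector are illustrative assumptions, not the settings used in the reported simulations.

    # Sample-splitting sketch: select on half 1, infer on half 2 with Bonferroni.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n, p, k_max, alpha = 200, 50, 5, 0.05

    # Null setting: beta0 = 0_p, so Y is pure noise given X.
    X = rng.standard_normal((n, p))
    y = rng.standard_normal(n)

    # Split the sample into two halves.
    idx = rng.permutation(n)
    half1, half2 = idx[: n // 2], idx[n // 2:]

    # Greedy forward stepwise on the first half (selection only).
    selected = []
    residual = y[half1].copy()
    for _ in range(k_max):
        remaining = [j for j in range(p) if j not in selected]
        scores = [abs(X[half1, j] @ residual) for j in remaining]
        selected.append(remaining[int(np.argmax(scores))])
        XM = X[np.ix_(half1, selected)]
        residual = y[half1] - XM @ np.linalg.lstsq(XM, y[half1], rcond=None)[0]

    # OLS inference on the held-out half, Bonferroni over the |M| coefficients.
    XM = X[np.ix_(half2, selected)]
    beta_hat = np.linalg.lstsq(XM, y[half2], rcond=None)[0]
    df = len(half2) - len(selected)
    resid = y[half2] - XM @ beta_hat
    sigma2 = resid @ resid / df
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(XM.T @ XM)))
    tcrit = stats.t.ppf(1 - alpha / (2 * len(selected)), df)  # Bonferroni cutoff
    for j, b, s in zip(selected, beta_hat, se):
        print(f"beta_{j}: {b:+.3f} +/- {tcrit * s:.3f}")

Because the intervals are computed on data not used for selection, their simultaneous coverage over the selected coefficients relies only on the usual OLS assumptions for the second half and on independence across observations, which is exactly the caveat noted above.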
