A modified information criterion for tuning parameter selection in 1d fused LASSO for inference on multiple change points
Journal of Statistical Computation and Simulation (IF 1.1), Pub Date: 2020-03-03, DOI: 10.1080/00949655.2020.1732379
J. Lee, J. Chen

ABSTRACT Inference about multiple change points has long been a topic of interest in the statistics literature. High-throughput technologies have recently become the most widely used tools in genomic studies and have yielded massive data sets. Searching for heterogeneous segments in such a data set is a problem in statistical change point analysis: one tries to determine whether multiple change points separate the data into different parts. Such data have a ‘sparsity’ feature (within each part, the data points are homogeneous), and hence penalized regression, such as the 1d fused LASSO, has recently been used to detect multiple change points in high-throughput data. One of the main challenges in detecting change points is estimating their number, which becomes the problem of selecting an optimal tuning parameter in LASSO methods for change point problems. In this work, we therefore propose a modified Bayesian information criterion to estimate the optimal tuning parameter in the 1d fused LASSO for multiple change point detection. We show theoretically that the proposed JMIC consistently identifies the true number of change points by providing the optimal tuning parameter for the 1d fused LASSO. Simulation studies and an application to next-generation sequencing data from a breast cancer tumour cell line illustrate the usefulness of the proposed method.
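
As a rough illustration of the workflow the abstract describes (fit the 1d fused LASSO over a grid of tuning parameters, then pick the one minimizing an information criterion), here is a minimal Python sketch. Everything in it is an assumption rather than the authors' method: the abstract does not give the JMIC formula, so a generic BIC-style score (with the number of fitted segments as degrees of freedom) stands in for it, and the objective is the textbook form 0.5 * sum((y_i - theta_i)^2) + lam * sum(|theta_{i+1} - theta_i|), solved by a simple projected-gradient iteration on its dual. The function names `fused_lasso_1d`, `count_change_points`, `generic_bic`, and `select_lambda` are hypothetical.

```python
import math


def fused_lasso_1d(y, lam, n_iter=5000):
    """Approximately minimize 0.5*sum((y_i - t_i)^2) + lam*sum(|t_{i+1} - t_i|).

    Projected gradient ascent on the dual: one dual variable u_i per adjacent
    pair, constrained to [-lam, lam]; the primal fit is t = y - D^T u, where
    (D t)_i = t_{i+1} - t_i. Step 0.25 is safe because ||D D^T|| <= 4.
    """
    n = len(y)
    u = [0.0] * (n - 1)

    def primal(u):
        # (D^T u)_i = u_{i-1} - u_i, with out-of-range entries taken as 0.
        return [
            y[i] - ((u[i - 1] if i > 0 else 0.0) - (u[i] if i < n - 1 else 0.0))
            for i in range(n)
        ]

    for _ in range(n_iter):
        theta = primal(u)
        # Gradient of the dual objective is D theta; clip back into the box.
        u = [
            min(lam, max(-lam, u[i] + 0.25 * (theta[i + 1] - theta[i])))
            for i in range(n - 1)
        ]
    return primal(u)


def count_change_points(theta, tol=1e-6):
    """Number of positions where the fitted mean jumps by more than tol."""
    return sum(1 for a, b in zip(theta, theta[1:]) if abs(b - a) > tol)


def generic_bic(y, theta, tol=1e-6):
    """Generic BIC-style score (NOT the paper's JMIC): n*log(RSS/n) + df*log(n),
    with df taken as the number of fitted constant segments."""
    n = len(y)
    rss = sum((a - b) ** 2 for a, b in zip(y, theta))
    segments = count_change_points(theta, tol) + 1
    return n * math.log(max(rss, 1e-12) / n) + segments * math.log(n)


def select_lambda(y, grid):
    """Return the grid value whose fit has the smallest generic BIC score."""
    return min(grid, key=lambda lam: generic_bic(y, fused_lasso_1d(y, lam)))


# Toy series (assumed for illustration): one mean shift from 0 to 3 plus a
# small deterministic wiggle standing in for noise.
y_demo = [0.1 if i % 2 == 0 else -0.1 for i in range(10)] + \
         [3.1 if i % 2 == 0 else 2.9 for i in range(10)]
best_lam = select_lambda(y_demo, [0.02, 0.5, 20.0])
print(best_lam, count_change_points(fused_lasso_1d(y_demo, best_lam)))
```

The dual iteration is chosen only because it fits in a few lines without external libraries; for real sequencing data a dedicated exact solver would be preferable, and the criterion would need to be replaced by the JMIC as defined in the paper.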

Updated: 2020-03-03