Scalable Algorithms for Large Competing Risks Data, Journal of Computational and Graphical Statistics


Scalable Algorithms for Large Competing Risks Data
Journal of Computational and Graphical Statistics (IF 2.4), Pub Date: 2020-12-11, DOI: 10.1080/10618600.2020.1841650
Eric S Kawaguchi $^{1}$, Jenny I Shen $^{2}$, Marc A Suchard $^{3,4,5}$, Gang Li $^{3,4}$


Abstract

This article develops two orthogonal contributions to scalable sparse regression for competing risks time-to-event data. First, we study and accelerate the broken adaptive ridge method (BAR), a surrogate $l_{0}$-based iteratively reweighted $l_{2}$-penalization algorithm that achieves sparsity in its limit, in the context of the Fine-Gray (1999) proportional subdistributional hazards (PSH) model. In particular, we derive a new algorithm for BAR regression, named cycBAR, that performs cyclic updates of each coordinate using an explicit thresholding formula. The new cycBAR algorithm effectively avoids fitting multiple reweighted $l_{2}$-penalizations and thus yields impressive speedups over the original BAR algorithm. Second, we address a pivotal computational issue related to fitting the PSH model. Specifically, the cost of computing the log-pseudo likelihood and its derivatives grows at the rate of $O(n^{2})$ with the sample size $n$ in current implementations. We propose a novel forward-backward scan algorithm that reduces the computation costs to $O(n)$. The proposed method applies to both unpenalized and penalized estimation for the PSH model and has exhibited drastic speedups over current implementations. Finally, combining the two algorithms can yield more than 1000-fold speedups over the original BAR algorithm. Illustrations of the impressive scalability of our proposed algorithm for large competing risks data are given using both simulations and data from the United States Renal Data System. Supplementary materials for this article are available online.
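The abstract does not spell out the forward-backward scan, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes distinct observation times and assumes that a subject with an earlier competing event stays in the risk set with a weight that factors as $G(t_i)/G(t_j)$, where $G$ is an estimate of the censoring survival function. Under those assumptions each weighted risk-set sum splits into a backward cumulative sum and a forward cumulative sum. The function names, the `status` coding (0 = censored, 1 = event of interest, 2 = competing event), and the array `G` are hypothetical choices made for illustration.

```python
import numpy as np

def risk_sums_naive(time, status, eta, G):
    """O(n^2) reference: visit every pair (i, j)."""
    n = len(time)
    w = np.exp(eta)
    out = np.zeros(n)
    for i in range(n):
        for j in range(n):
            if time[j] >= time[i]:
                # subject j is still under observation at t_i
                out[i] += w[j]
            elif status[j] == 2:
                # earlier competing event: kept with assumed weight G(t_i)/G(t_j)
                out[i] += w[j] * G[i] / G[j]
    return out

def risk_sums_scan(time, status, eta, G):
    """Same sums with one backward and one forward scan after sorting."""
    order = np.argsort(time)          # ascending times (assumed distinct)
    w = np.exp(eta[order])
    g = G[order]
    competing = status[order] == 2
    # backward scan: sum of w over subjects with time >= t_i
    back = np.cumsum(w[::-1])[::-1]
    # forward scan: sum of w / G over competing-event subjects with time < t_i
    contrib = np.where(competing, w / g, 0.0)
    fwd = np.cumsum(contrib) - contrib  # exclusive prefix sum
    out = np.empty(len(time))
    out[order] = back + g * fwd         # map back to the original subject order
    return out

# Small check on synthetic data (all values are made up for illustration).
rng = np.random.default_rng(0)
n = 200
time = rng.permutation(n).astype(float) + 1.0
status = rng.integers(0, 3, size=n)
eta = rng.normal(size=n)
G = np.exp(-0.01 * time)              # stand-in for a censoring survival estimate
print(np.allclose(risk_sums_naive(time, status, eta, G),
                  risk_sums_scan(time, status, eta, G)))
```

The scan version is linear in $n$ once the data are sorted (the sort itself is $O(n \log n)$ and is done once), whereas the naive version does $n^{2}$ comparisons at every likelihood evaluation. Only rows with `status == 1` would enter a log-pseudo likelihood; the sketch computes the sum for every subject to keep the comparison simple.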


Updated: 2020-12-11
