Scalable proximal methods for cause-specific hazard modeling with time-varying coefficients
Lifetime Data Analysis (IF 1.2), Pub Date: 2022-01-29, DOI: 10.1007/s10985-021-09544-2
Wenbo Wu 1, Jeremy M G Taylor 1, Andrew F Brouwer 2, Lingfeng Luo 1, Jian Kang 1, Hui Jiang 1, Kevin He 1

Survival modeling with time-varying coefficients has proven useful in analyzing time-to-event data with one or more distinct failure types. When studying the cause-specific etiology of breast and prostate cancers using the large-scale data from the Surveillance, Epidemiology, and End Results (SEER) Program, we encountered two major challenges that existing methods for estimating time-varying coefficients cannot tackle. First, these methods, dependent on expanding the original data in a repeated measurement format, result in formidable time and memory consumption as the sample size escalates to over one million. In this case, even a well-configured workstation cannot accommodate their implementations. Second, when the large-scale data under analysis include binary predictors with near-zero variance (e.g., only 0.6% of patients in our SEER prostate cancer data had tumors regional to the lymph nodes), existing methods suffer from numerical instability due to ill-conditioned second-order information. The estimation accuracy deteriorates further with multiple competing risks. To address these issues, we propose a proximal Newton algorithm with a shared-memory parallelization scheme and tests of significance and nonproportionality for the time-varying effects. A simulation study shows that our scalable approach reduces the time and memory costs by orders of magnitude and enjoys improved estimation accuracy compared with alternative approaches. Applications to the SEER cancer data demonstrate the real-world performance of the proximal Newton algorithm.
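The instability attributed above to ill-conditioned second-order information can be illustrated with a toy calculation. The sketch below is not the authors' algorithm: it assumes a generic quadratic proximal term with a hypothetical tuning parameter `lam`, and a made-up 2 x 2 Hessian in which one direction carries almost no information, as might happen with a very rare binary predictor. The function name `proximal_newton_step` is likewise illustrative.

```python
import numpy as np


def proximal_newton_step(grad, hess, lam):
    """Return a damped Newton step for minimizing an objective.

    Assumption (not taken from the paper): the proximal term is
    (1 / (2 * lam)) * ||d||^2, so the step solves
    (hess + I / lam) d = -grad.
    """
    damped = hess + np.eye(hess.shape[0]) / lam
    step = np.linalg.solve(damped, -grad)
    return step, damped


# Toy Hessian of a negative log-likelihood: the second direction is
# nearly flat, mimicking a predictor with near-zero variance.
hess = np.diag([1.0e4, 1.0e-8])
grad = np.array([1.0, 1.0])

step, damped = proximal_newton_step(grad, hess, lam=10.0)

cond_plain = np.linalg.cond(hess)     # condition number without damping
cond_damped = np.linalg.cond(damped)  # condition number with the proximal term

print("condition number, plain Newton:   ", cond_plain)
print("condition number, proximal Newton:", cond_damped)
print("step:", step)
```

For a symmetric positive semidefinite Hessian, adding a positive multiple of the identity always lowers the condition number, which is why the damped linear system is easier to solve reliably. The price is that each step is shorter than a full Newton step, so more iterations may be needed.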




Updated: 2022-01-29