Learning from vertically distributed data across multiple sites: An efficient privacy-preserving algorithm for Cox proportional hazards model with variable selection

Journal of Biomedical Informatics (IF 4.0), Pub Date: 2023-12-23, DOI: 10.1016/j.jbi.2023.104581
Guanhong Miao, Lei Yu, Jingyun Yang, David A Bennett, Jinying Zhao, Samuel S Wu

Objective

To develop a lossless distributed algorithm for fitting the regularized Cox proportional hazards model with variable selection, in support of federated learning on vertically distributed data.
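For orientation, the kind of objective such an algorithm targets can be written out. The abstract does not state which penalty is used, so the lasso form below is an assumption, and all symbols (n subjects, event indicator \delta_i, risk set R(t_i), K sites, tuning parameter \lambda) are introduced here for illustration only.

```latex
\hat{\beta} \;=\; \arg\min_{\beta}\;
  -\frac{1}{n}\sum_{i:\,\delta_i = 1}
    \Big[\, x_i^{\top}\beta \;-\; \log \sum_{j \in R(t_i)} \exp\!\big(x_j^{\top}\beta\big) \Big]
  \;+\; \lambda \,\lVert \beta \rVert_1,
\qquad
x_i^{\top}\beta \;=\; \sum_{k=1}^{K} \big(x_i^{(k)}\big)^{\top}\beta^{(k)}
```

In the vertically distributed setting, site k holds only the covariate block x_i^{(k)} for every subject, so the linear predictor is a sum of per-site contributions. Coefficients driven exactly to zero by the penalty correspond to variables that are not selected.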

Methods

We propose a novel distributed algorithm for fitting a regularized Cox proportional hazards model when data sharing among data providers is restricted. The algorithm is based on cyclical coordinate descent: each site computes intermediate statistics and exchanges them so that model parameters at the other sites can be updated without access to individual patient-level data. We evaluate the algorithm with (1) a simulation study and (2) a real-world analysis predicting the risk of Alzheimer’s dementia using data from the Religious Orders Study and Rush Memory and Aging Project (ROSMAP). We also compare its performance with existing privacy-preserving models.
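The general idea of coordinate descent over vertically partitioned covariates can be illustrated with a small sketch. This is not the authors' protocol, whose details the abstract does not give. It assumes a lasso penalty, no tied event times, and a hypothetical `Coordinator` that holds only the outcomes, while each hypothetical `Site` holds its own covariate columns. The sites exchange a per-subject gradient vector and linear-predictor increments, which is weaker than the privacy protection the paper describes, and each update uses a conservative curvature bound (Popoviciu's inequality) chosen for simplicity. All names (`Coordinator`, `Site`, `fit`, `simulate`, `lam`) are illustrative.

```python
import numpy as np


def soft_threshold(value, threshold):
    """Soft-thresholding operator used in lasso coordinate updates."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)


class Coordinator:
    """Hypothetical party that holds only the outcomes (time, event)."""

    def __init__(self, time, event):
        self.order = np.argsort(-time)       # latest time first; assumes no tied times
        self.event = event.astype(float)[self.order]
        self.n = len(time)

    def _risk_sums(self, eta):
        exp_eta = np.exp(eta[self.order])
        return exp_eta, np.cumsum(exp_eta)   # running sum = sum over each risk set

    def gradient(self, eta):
        """Gradient of the log partial likelihood with respect to eta."""
        exp_eta, risk = self._risk_sums(eta)
        tail = np.cumsum((self.event / risk)[::-1])[::-1]
        grad = np.empty(self.n)
        grad[self.order] = self.event - exp_eta * tail
        return grad

    def objective(self, eta, l1_norm, lam):
        """Negative mean log partial likelihood plus the lasso penalty."""
        _, risk = self._risk_sums(eta)
        loglik = np.sum(self.event * (eta[self.order] - np.log(risk)))
        return -loglik / self.n + lam * l1_norm


class Site:
    """Hypothetical data provider; its covariate block X stays local."""

    def __init__(self, X):
        self.X = X
        self.beta = np.zeros(X.shape[1])
        self.range_sq = (X.max(axis=0) - X.min(axis=0)) ** 2

    def update(self, j, grad, lam, n_events):
        """Update local coefficient j; return the change in the linear predictor."""
        n = self.X.shape[0]
        x = self.X[:, j]
        # Upper bound on the coordinate-wise curvature (assumed, conservative).
        curvature = n_events * self.range_sq[j] / (4.0 * n)
        if curvature == 0.0:
            return np.zeros(n)
        new = soft_threshold(curvature * self.beta[j] + x @ grad / n, lam) / curvature
        delta = x * (new - self.beta[j])
        self.beta[j] = new
        return delta


def fit(sites, coordinator, lam, sweeps=200):
    """Cycle through every coefficient at every site for a fixed number of sweeps."""
    n_events = coordinator.event.sum()
    eta = np.zeros(coordinator.n)
    history = []
    for _ in range(sweeps):
        for site in sites:
            for j in range(site.X.shape[1]):
                grad = coordinator.gradient(eta)          # sent to the site
                eta = eta + site.update(j, grad, lam, n_events)
        l1_norm = sum(np.abs(s.beta).sum() for s in sites)
        history.append(coordinator.objective(eta, l1_norm, lam))
    return np.concatenate([s.beta for s in sites]), history


def simulate(n=200, seed=0):
    """Toy survival data: six covariates, three of them truly null."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 6))
    hazard = np.exp(X @ np.array([1.0, -1.0, 0.5, 0.0, 0.0, 0.0]))
    t_event = rng.exponential(1.0 / hazard)
    t_censor = rng.exponential(2.0, size=n)
    return X, np.minimum(t_event, t_censor), t_event <= t_censor


if __name__ == "__main__":
    X, time, event = simulate()
    coord = Coordinator(time, event)
    central, _ = fit([Site(X)], coord, lam=0.05)
    split, _ = fit([Site(X[:, :3]), Site(X[:, 3:])], coord, lam=0.05)
    print("centralized:", np.round(central, 3))
    print("two sites:  ", np.round(split, 3))
```

Because the two-site run performs the same coordinate updates in the same order as the single-site run, the two fits should agree up to floating-point error, which is the sense of "lossless" illustrated here. A production method would also need safeguards for the exchanged vectors, handling of tied event times, and a convergence check in place of a fixed number of sweeps.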

Results

Our algorithm achieves privacy-preserving variable selection for time-to-event data in the vertically distributed setting, with no loss of accuracy compared with a centralized approach. Simulation demonstrates that our algorithm is highly efficient in analyzing high-dimensional datasets. Real-world data analysis reveals that our distributed Cox model predicts the risk of Alzheimer’s dementia more accurately than conventional Cox models built separately by each data provider without data sharing. Moreover, our algorithm is computationally more efficient than existing privacy-preserving Cox models, with or without a regularization term.

Conclusion

The proposed algorithm is lossless, privacy-preserving, and highly efficient for fitting regularized Cox models to vertically distributed data. It provides a suitable and convenient approach for modeling time-to-event data in a distributed manner.
