

A Deep Variational Approach to Clustering Survival Data
arXiv - CS - Machine Learning. Pub Date: 2021-06-10. DOI: arxiv-2106.05763
Laura Manduchi, Ričards Marcinkevičs, Michela C. Massi, Verena Gotta, Timothy Müller, Flavio Vasella, Marian C. Neidert, Marc Pfister, Julia E. Vogt

Survival analysis has gained significant attention in the medical domain and has many far-reaching applications. Although a variety of machine learning methods have been introduced for tackling time-to-event prediction in unstructured data with complex dependencies, clustering of survival data remains an under-explored problem. The latter is particularly helpful in discovering patient subpopulations whose survival is regulated by different generative mechanisms, a critical problem in precision medicine. To this end, we introduce a novel probabilistic approach to cluster survival data in a variational deep clustering setting. Our proposed method employs a deep generative model to uncover the underlying distribution of both the explanatory variables and the potentially censored survival times. We compare our model to the related work on survival clustering in comprehensive experiments on a range of synthetic, semi-synthetic, and real-world datasets. Our proposed method performs better at identifying clusters and is competitive at predicting survival times in terms of the concordance index and relative absolute error. To further demonstrate the usefulness of our approach, we show that our method identifies meaningful clusters from an observational cohort of hemodialysis patients that are consistent with previous clinical findings.
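The abstract names the concordance index as one of its evaluation metrics. As a rough illustration of how that metric treats censored observations, here is a minimal Python sketch of a Harrell-style concordance index computed on predicted survival times. It is not taken from the paper: the function name, the argument names, and the convention of giving tied predictions half credit are assumptions made for this example.

```python
def concordance_index(times, events, predicted_times):
    """Harrell-style concordance index on predicted survival times.

    times           -- observed times (event time or censoring time)
    events          -- 1 if the event was observed, 0 if censored
    predicted_times -- model-predicted survival times

    Hypothetical helper for illustration only; not the paper's code.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        # A pair is only comparable when the earlier time is an observed event;
        # a censored subject tells us nothing about who failed first.
        if not events[i]:
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if predicted_times[i] < predicted_times[j]:
                    concordant += 1.0   # predicted order matches observed order
                elif predicted_times[i] == predicted_times[j]:
                    concordant += 0.5   # assumed convention: ties get half credit
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: the second subject is censored at time 4.
times = [2, 4, 6, 8]
events = [1, 0, 1, 1]
perfect = concordance_index(times, events, [1, 5, 3, 9])
one_swap = concordance_index(times, events, [1, 5, 9, 3])
```

A value of 1.0 means every comparable pair is ordered correctly, and 0.5 corresponds to random ordering. In the toy data, swapping the predictions for the last two subjects breaks one of the four comparable pairs.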


Updated: 2021-06-11
