Semi-supervised approach to event time annotation using longitudinal electronic health records
Lifetime Data Analysis (IF 1.3), Pub Date: 2022-06-26, DOI: 10.1007/s10985-022-09557-5
Liang Liang 1 , Jue Hou 1 , Hajime Uno 2 , Kelly Cho 3, 4 , Yanyuan Ma 5 , Tianxi Cai 1, 6

Large clinical datasets derived from insurance claims and electronic health record (EHR) systems are valuable sources for precision medicine research. These datasets can be used to develop models for personalized prediction of risk or treatment response. Efficiently deriving prediction models from real-world data, however, faces practical and methodological challenges. Precise information on important clinical outcomes, such as time to cancer progression, is not readily available in these databases. The true clinical event times typically cannot be approximated well from simple extracts of billing or procedure codes, and annotating event times manually is prohibitively time- and resource-intensive. In this paper, we propose a two-step semi-supervised multi-modal automated time annotation (MATA) method that leverages multi-dimensional longitudinal EHR encounter records. In step I, we employ a functional principal component analysis approach to estimate the underlying intensity functions based on the observed point processes from the unlabeled patients. In step II, using the labeled data, we fit a penalized proportional odds model to the event time outcomes with the features derived in step I, approximating the non-parametric baseline function with B-splines. Under regularity conditions, the resulting estimator of the feature effect vector is shown to be root-n consistent. We demonstrate the superiority of our approach relative to existing approaches through simulations and a real data example on annotating lung cancer recurrence in an EHR cohort of lung cancer patients from the Veterans Health Administration.
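
The idea behind step I can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a simple Gaussian-kernel smoother for each patient's encounter times and replaces functional principal component analysis with an ordinary PCA (via SVD) of the smoothed curves on a fixed time grid. The function names `smoothed_intensity` and `fpca_scores`, the bandwidth, the grid, and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np


def smoothed_intensity(event_times, grid, bandwidth):
    """Gaussian-kernel estimate of one patient's encounter intensity on a grid.

    Illustrative assumption: a plain kernel smoother, not the paper's estimator.
    """
    event_times = np.asarray(event_times, dtype=float)
    if event_times.size == 0:
        return np.zeros_like(grid)
    z = (grid[:, None] - event_times[None, :]) / bandwidth
    # Sum one Gaussian bump per encounter; each bump integrates to 1.
    return np.exp(-0.5 * z**2).sum(axis=1) / (bandwidth * np.sqrt(2.0 * np.pi))


def fpca_scores(curves, n_components):
    """Discretized stand-in for FPCA: center the curves, take the SVD,
    and project each curve onto the leading right singular vectors."""
    centered = curves - curves.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]          # rows are orthonormal
    scores = centered @ components.T        # one feature vector per patient
    explained = (s**2 / np.sum(s**2))[:n_components]
    return scores, components, explained


# Toy unlabeled cohort: encounters cluster around a patient-specific time.
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 10.0, 101)
patients = [
    rng.normal(loc=rng.uniform(2.0, 8.0), scale=1.0, size=rng.integers(5, 30))
    for _ in range(50)
]
curves = np.vstack([smoothed_intensity(t, grid, bandwidth=0.5) for t in patients])
scores, components, explained = fpca_scores(curves, n_components=3)
```

In this sketch, each row of `scores` plays the role of the step I features that would be passed to the step II model for labeled patients; the penalized proportional odds fit with a B-spline baseline is not shown.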




Updated: 2022-06-27