Sample-weighted semiparametric estimation of cause-specific cumulative risk and incidence using left- or interval-censored data from electronic health records.


Statistics in Medicine (IF 1.8). Pub Date: 2020-05-10. DOI: 10.1002/sim.8544
Noorie Hyun, Hormuzd A Katki, Barry I Graubard

Electronic health records (EHRs) can be a cost‐effective data source for forming cohorts and developing risk models in the context of disease screening. However, several issues must be handled: competing outcomes, left‐censoring of prevalent disease, interval‐censoring of incident disease, and uncertainty about prevalent disease when disease is not accurately ascertained at baseline. Furthermore, novel tests that are costly and limited in availability can be conducted on stored biospecimens that are sampled from EHRs with different sampling fractions. We extend sample‐weighted semiparametric marginal mixture models to the estimation of competing risks. For flexible modeling of relative risks, a general transformation of the subdistribution hazard function and regression parameters is used. We propose a numerical algorithm for nonparametrically calculating the maximum likelihood estimates of the subdistribution hazard functions and regression parameters. Methods for calculating consistent confidence intervals for relative and absolute risk estimates are presented. The proposed algorithm and methods show reliable finite‐sample performance in simulation studies. We apply our methods to a cohort assembled from EHRs at a health maintenance organization, where we estimate the cumulative risk of cervical precancer/cancer and the incidence of infection‐clearance by HPV genotype among human papillomavirus (HPV) positive women. There is no significant difference in 3‐year HPV‐clearance rates across HPV types, but the 3‐year cumulative risk of progression‐to‐precancer/cancer from HPV‐16 is higher than that from the other HPV genotypes.
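
To make the data structure described in the abstract concrete, the following is a minimal sketch of how left-censored, interval-censored, and right-censored records could enter a sample-weighted log-likelihood with two competing outcomes. It is not the authors' semiparametric estimator: it assumes constant cause-specific hazards, a single baseline prevalence probability for cause 0, and weights equal to inverse sampling fractions. All function names, record fields, and numeric values are hypothetical and chosen only for illustration.

```python
import math


def cumulative_incidence(t, hazards, cause):
    """Cumulative incidence of `cause` by time t.

    Assumption: constant cause-specific hazards (a parametric stand-in
    for the nonparametric subdistribution hazards in the paper).
    """
    total = sum(hazards)
    return hazards[cause] / total * (1.0 - math.exp(-total * t))


def log_contribution(rec, prevalence, hazards):
    """Log-likelihood contribution of one record.

    rec["cause"] is 0 or 1 for an observed outcome, None if right-censored.
    rec["left"], rec["right"] bound the event time; for a right-censored
    record, rec["left"] is the last event-free visit.
    """
    cause, left, right = rec["cause"], rec["left"], rec["right"]
    if cause is None:
        # Right-censored: not prevalent and no event of either cause by `left`.
        event_free = 1.0 - sum(
            cumulative_incidence(left, hazards, k) for k in range(len(hazards))
        )
        return math.log((1.0 - prevalence) * event_free)
    incident = (cumulative_incidence(right, hazards, cause)
                - cumulative_incidence(left, hazards, cause))
    if cause == 0 and left == 0.0:
        # Left-censored: disease found at the first follow-up visit, so it
        # may have been prevalent at baseline or incident before `right`.
        return math.log(prevalence + (1.0 - prevalence) * incident)
    # Interval-censored incident event in (left, right].
    return math.log((1.0 - prevalence) * incident)


def weighted_loglik(records, prevalence, hazards):
    """Sample-weighted log-likelihood: each record counts `weight` times."""
    return sum(r["weight"] * log_contribution(r, prevalence, hazards)
               for r in records)


# Hypothetical records; weights are inverse sampling fractions.
records = [
    {"cause": 0, "left": 0.0, "right": 1.0, "weight": 2.0},     # left-censored
    {"cause": 1, "left": 1.0, "right": 2.5, "weight": 5.0},     # interval-censored
    {"cause": None, "left": 3.0, "right": None, "weight": 5.0},  # right-censored
]
print(weighted_loglik(records, prevalence=0.2, hazards=(0.1, 0.3)))
```

Under these assumptions, estimates of the prevalence and hazards would be obtained by maximizing `weighted_loglik` with a numerical optimizer. The weighting means that a record sampled with fraction 1/5 stands in for five cohort members, which is the role that different sampling fractions play in the abstract.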


Updated: 2020-07-02
