Case contamination in electronic health records-based case-control studies
Biometrics (IF 1.9), Pub Date: 2020-05-08, DOI: 10.1111/biom.13264
Lu Wang 1, Jill Schnall 1, Aeron Small 2, Rebecca A Hubbard 1, Jason H Moore 1,3, Scott M Damrauer 4, Jinbo Chen 1

Clinically relevant information from electronic health records (EHRs) permits derivation of a rich collection of phenotypes. Unlike traditionally designed studies, where scientific hypotheses are specified a priori before data collection, the true phenotype status of any given individual in EHR-based studies is not directly available. Structured and unstructured data elements need to be queried through pre-constructed rules to identify case and control groups. A sufficient number of controls can usually be identified with high accuracy by making the selection criteria stringent, but more relaxed criteria are often necessary to identify cases more thoroughly and so ensure adequate statistical power. The resulting pool of candidate cases consists of genuine cases contaminated with non-case patients who do not satisfy the control definition. The presence of patients who are neither true cases nor controls among the identified cases is a challenge unique to EHR-based case-control studies. Ignoring case contamination leads to biased estimates of odds ratio association parameters. We propose an estimating equation approach to bias correction, study its large-sample properties, and evaluate its performance through extensive simulation studies and an application to a pilot study of aortic stenosis in the Penn Medicine EHR. Our method holds the promise of facilitating more efficient EHR studies by accommodating enlarged, albeit contaminated, case pools.
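
To make the bias concrete, the following is a minimal arithmetic sketch and not the estimating equation method proposed in the paper. It considers a single binary exposure and assumes, purely for illustration, that the non-case patients mixed into the case pool have the same exposure prevalence as the controls. The function name `observed_odds_ratio` and its parameters are hypothetical.

```python
def observed_odds_ratio(p_control, true_or, contamination):
    """Odds ratio seen when a fraction of the case pool are non-cases.

    p_control     -- exposure prevalence among controls (0 < p < 1)
    true_or       -- odds ratio comparing genuine cases with controls
    contamination -- fraction of the candidate case pool that are non-cases

    Illustrative assumption (not from the paper): contaminating patients
    have the same exposure prevalence as controls.
    """
    odds_control = p_control / (1.0 - p_control)

    # Exposure prevalence among genuine cases implied by the true odds ratio.
    odds_case = true_or * odds_control
    p_case = odds_case / (1.0 + odds_case)

    # The candidate case pool is a mixture of genuine cases and non-cases.
    p_pool = (1.0 - contamination) * p_case + contamination * p_control
    odds_pool = p_pool / (1.0 - p_pool)

    return odds_pool / odds_control


if __name__ == "__main__":
    for c in (0.0, 0.1, 0.2, 0.4):
        print(c, round(observed_odds_ratio(0.3, 2.0, c), 3))
```

Under this assumption the naive odds ratio moves from the true value toward 1 as the contamination fraction grows. If the contaminating patients differ from controls in exposure, the bias need not be a simple attenuation.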

Updated: 2020-05-08