High-throughput phenotyping with electronic medical record data using a common semi-supervised approach (PheCAP). Nature Protocols



Nature Protocols (IF 14.8), Pub Date: 2019-11-20, DOI: 10.1038/s41596-019-0227-6
Yichi Zhang (1), Tianrun Cai (2), Sheng Yu (3,4), Kelly Cho (5,6), Chuan Hong (1), Jiehuan Sun (1), Jie Huang (2), Yuk-Lam Ho (5), Ashwin N Ananthakrishnan (7), Zongqi Xia (8), Stanley Y Shaw (9), Vivian Gainer (10), Victor Castro (10), Nicholas Link (5), Jacqueline Honerlaw (5), Sicong Huang (2), David Gagnon (5,11), Elizabeth W Karlson (2), Robert M Plenge (2), Peter Szolovits (12), Guergana Savova (13), Susanne Churchill (14), Christopher O'Donnell (5,15), Shawn N Murphy (10,14,16), J Michael Gaziano (5,6), Isaac Kohane (14), Tianxi Cai (1,14), Katherine P Liao (2,5,14)

Affiliations

1. Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
2. Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, Boston, MA, USA.
3. Center for Statistical Science, Tsinghua University, Beijing, China.
4. Department of Industrial Engineering, Tsinghua University, Beijing, China.
5. Division of Data Sciences, VA Boston Healthcare System, Boston, MA, USA.
6. Division of Aging, Brigham and Women's Hospital, Boston, MA, USA.
7. Department of Gastroenterology, Massachusetts General Hospital, Boston, MA, USA.
8. Department of Neurology, University of Pittsburgh, Pittsburgh, PA, USA.
9. Division of Cardiovascular Medicine, Brigham and Women's Hospital, Boston, MA, USA.
10. Research Information Science and Computing, Partners Healthcare, Boston, MA, USA.
11. Department of Biostatistics, Boston University, Boston, MA, USA.
12. Department of Electrical Engineering and Computer Science, MIT, Cambridge, MA, USA.
13. Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA.
14. Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
15. Division of Cardiology, VA Boston Healthcare System, Boston, MA, USA.
16. Department of Neurology, Massachusetts General Hospital, Boston, MA, USA.

Phenotypes are the foundation for clinical and genetic studies of disease risk and outcomes. The growth of biobanks linked to electronic medical record (EMR) data has both facilitated and increased the demand for efficient, accurate, and robust approaches for phenotyping millions of patients. Challenges to phenotyping with EMR data include variation in the accuracy of codes, as well as the high level of manual input required to identify features for the algorithm and to obtain gold standard labels. To address these challenges, we developed PheCAP, a high-throughput semi-supervised phenotyping pipeline. PheCAP begins with data from the EMR, including structured data and information extracted from the narrative notes using natural language processing (NLP). The standardized steps integrate automated procedures, which reduce the level of manual input, and machine learning approaches for algorithm training. PheCAP itself can be executed in 1-2 d if all data are available; however, the timing is largely dependent on the chart review stage, which typically requires at least 2 weeks. The final products of PheCAP include a phenotype algorithm, the probability of the phenotype for all patients, and a phenotype classification (yes or no).
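
The three final products named in the abstract (a phenotype algorithm, a probability for every patient, and a yes/no classification) can be illustrated with a toy example. The sketch below is not the PheCAP implementation and does not reproduce its semi-supervised or automated feature-selection steps. The synthetic cohort, the two features (log-transformed diagnosis-code and NLP-mention counts), the plain L2-penalized logistic regression, the labeled-subset size of 80, and the 0.5 threshold are all assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(rows, labels, lr=0.5, epochs=3000, l2=0.01):
    """L2-penalized logistic regression fitted by batch gradient descent."""
    weights, bias, n = [0.0] * len(rows[0]), 0.0, len(rows)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            grad_w = [g + err * v for g, v in zip(grad_w, x)]
            grad_b += err
        weights = [w - lr * (g / n + l2 * w) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_proba(model, x):
    """Predicted probability of the phenotype for one patient."""
    weights, bias = model
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


# Synthetic cohort (hypothetical): cases tend to have more codes and mentions.
rng = random.Random(0)
truth, features = [], []
for _ in range(300):
    is_case = rng.random() < 0.5
    n_codes = rng.randint(3, 10) if is_case else rng.randint(0, 2)
    n_mentions = rng.randint(2, 8) if is_case else rng.randint(0, 1)
    truth.append(1 if is_case else 0)
    features.append([math.log1p(n_codes), math.log1p(n_mentions)])

# Only a small random subset gets "chart review" labels (assumed size: 80).
labeled = rng.sample(range(len(features)), 80)
model = fit_logistic([features[i] for i in labeled], [truth[i] for i in labeled])

# The three outputs: the fitted model, probabilities, and yes/no calls.
probabilities = [predict_proba(model, x) for x in features]
classifications = ["yes" if p >= 0.5 else "no" for p in probabilities]
accuracy = sum(
    (c == "yes") == bool(t) for c, t in zip(classifications, truth)
) / len(truth)
print(f"agreement with simulated truth: {accuracy:.2f}")
```

Training on a small labeled subset and then scoring the whole cohort mirrors the abstract's point that chart review, not computation, is the expensive step. In practice the classification threshold would be chosen to meet a target specificity or positive predictive value rather than fixed at 0.5.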


Updated: 2019-11-21
