Using electronic health records to identify candidates for human immunodeficiency virus pre-exposure prophylaxis: An application of super learning to risk prediction when the outcome is rare.
Statistics in Medicine (IF 2), Pub Date: 2020-06-24, DOI: 10.1002/sim.8591
Susan Gruber 1 , Douglas Krakower 2, 3, 4, 5 , John T Menchaca 5 , Katherine Hsu 6, 7 , Rebecca Hawrusik 6 , Judith C Maro 5 , Noelle M Cocoros 5 , Benjamin A Kruskal 8 , Ira B Wilson 9 , Kenneth H Mayer 2, 3, 4 , Michael Klompas 5, 10

Human immunodeficiency virus (HIV) pre-exposure prophylaxis (PrEP) protects high-risk patients from becoming infected with HIV. Clinicians need help identifying candidates for PrEP based on information routinely collected in electronic health records (EHRs). The greatest statistical challenge in developing a risk prediction model is that HIV acquisition is extremely rare. Methods: Data consisted of 180 covariates (demographics, diagnoses, treatments, prescriptions) extracted from the records of 399 385 patients (150 cases) seen at Atrius Health (2007-2015), a clinical network in Massachusetts. Super learner is an ensemble machine learning algorithm that uses k-fold cross validation to evaluate and combine predictions from a collection of algorithms. We trained 42 variants of sophisticated algorithms, using different sampling schemes that more evenly balanced the ratio of cases to controls. We compared the super learner's cross-validated area under the receiver operating characteristic curve (cv-AUC) with that of each individual algorithm. Results: The least absolute shrinkage and selection operator (lasso) using a 1:20 class ratio outperformed the super learner (cv-AUC = 0.86 vs 0.84). A traditional logistic regression model restricted to 23 clinician-selected main terms was slightly inferior (cv-AUC = 0.81). Conclusion: Machine learning was successful at developing a model to predict 1-year risk of acquiring HIV based on a physician-curated set of predictors extracted from EHRs.
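Two ideas in the abstract can be illustrated with a short sketch: downsampling controls to a fixed case-to-control ratio (such as 1:20), and scoring predictions by AUC. This is not the authors' code. The function names `subsample_controls` and `auc` are hypothetical, and the abstract does not say how the sampling was actually implemented. AUC is computed here from its pairwise definition, the probability that a randomly chosen case receives a higher score than a randomly chosen control.

```python
import random


def subsample_controls(labels, controls_per_case, seed=0):
    """Keep every case (label 1) and a random subset of controls (label 0).

    Returns sorted row indices so that roughly `controls_per_case`
    controls are retained for each case. Hypothetical helper.
    """
    rng = random.Random(seed)
    cases = [i for i, y in enumerate(labels) if y == 1]
    controls = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(controls), controls_per_case * len(cases))
    kept = rng.sample(controls, n_keep)
    return sorted(cases + kept)


def auc(scores, labels):
    """AUC as P(case score > control score), counting ties as 1/2.

    Quadratic in the number of case/control pairs; fine for a sketch,
    too slow for very large data sets.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data: 5 cases among 1000 patients (not the study data).
    labels = [1] * 5 + [0] * 995
    idx = subsample_controls(labels, controls_per_case=20)
    print(len(idx))  # 5 cases + 100 controls
    print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

In a cross-validated setting, one reasonable design is to apply the downsampling only to the training folds and compute AUC on the untouched validation fold, so that the reported cv-AUC reflects the original case rate. Whether the study did exactly this is not stated in the abstract.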

Updated: 2020-06-24