Detecting rare diseases in electronic health records using machine learning and knowledge engineering: Case study of acute hepatic porphyria.
PLOS ONE (IF 2.9), Pub Date: 2020-07-02, DOI: 10.1371/journal.pone.0235574
Aaron M Cohen 1, Steven Chamberlin 1, Thomas Deloughery 1, Michelle Nguyen 1, Steven Bedrick 1, Stephen Meninger 2, John J Ko 2, Jigar J Amin 2, Alex J Wei 2, William Hersh 1

Background

With the growing adoption of the electronic health record (EHR) worldwide over the last decade, new opportunities exist for leveraging EHR data to detect rare diseases. Rare diseases are often undiagnosed, or diagnosed late, by clinicians who encounter them infrequently. One such rare disease that may be amenable to EHR-based detection is acute hepatic porphyria (AHP). AHP is a family of rare metabolic diseases characterized by potentially life-threatening acute attacks and chronic debilitating symptoms. The goal of this study was to apply machine learning and knowledge engineering to a large extract of EHR data to determine whether they could be effective in identifying patients not previously tested for AHP who should receive a proper diagnostic workup for AHP.

Methods and findings

We used an extract of the complete EHR data of 200,000 patients from an academic medical center and enriched it with records from an additional 5,571 patients containing any mention of porphyria in the record. After manually reviewing the records of all 47 unique patients with the ICD-10-CM code E80.21 (Acute intermittent [hepatic] porphyria), we identified 30 patients who were positive cases for our machine learning models, with the rest of the patients used as negative cases. We parsed the record into features, which were scored by frequency of appearance and filtered using univariate feature analysis. We manually chose features not directly tied to provider attributes or suspicion of the patient having AHP. We trained on the full dataset, with the best cross-validation performance coming from a support vector machine (SVM) using a radial basis function (RBF) kernel. The trained model was applied back to the full dataset and patients were ranked by margin distance. The top 100 ranked negative cases were manually reviewed for symptom complexes similar to AHP, finding four patients where AHP diagnostic testing was likely indicated and 18 patients where AHP diagnostic testing was possibly indicated. From the top 100 ranked cases of patients with mention of porphyria in their record, we identified four patients for whom AHP diagnostic testing was possibly indicated and had not been previously performed. Based solely on the reported prevalence of AHP, we would have expected only 0.002 cases out of the 200 patients manually reviewed.
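The ranking step described above can be illustrated with a minimal sketch. It assumes that "margin distance" corresponds to the signed decision value of an RBF-kernel SVM, f(x) = Σ cᵢ · exp(−γ‖x − xᵢ‖²) + b, and that patients are sorted by that value in descending order. The support vectors, coefficients, intercept, γ, patient identifiers, and two-feature vectors below are all made-up toy values for illustration; none of them come from the study, and the training step is not shown.

```python
import math


def rbf_kernel(u, v, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance between u and v)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def decision_value(x, support_vectors, dual_coefs, intercept, gamma):
    """Signed decision value of a kernel SVM: sum_i coef_i * K(sv_i, x) + intercept."""
    kernel_sum = sum(
        coef * rbf_kernel(sv, x, gamma)
        for sv, coef in zip(support_vectors, dual_coefs)
    )
    return kernel_sum + intercept


def rank_by_margin(patients, support_vectors, dual_coefs, intercept, gamma):
    """Return (patient_id, decision_value) pairs, most positive first."""
    scored = [
        (pid, decision_value(x, support_vectors, dual_coefs, intercept, gamma))
        for pid, x in patients.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical "trained" model: one positive and one negative support vector.
SUPPORT_VECTORS = [(1.0, 1.0), (0.0, 0.0)]
DUAL_COEFS = [1.0, -1.0]  # label times dual weight for each support vector
INTERCEPT = 0.0
GAMMA = 0.5

# Hypothetical patients, each described by two frequency-based feature scores.
PATIENTS = {
    "p1": (0.9, 1.1),  # close to the positive support vector
    "p2": (0.1, 0.0),  # close to the negative support vector
    "p3": (0.5, 0.5),  # equidistant from both
}

ranking = rank_by_margin(PATIENTS, SUPPORT_VECTORS, DUAL_COEFS, INTERCEPT, GAMMA)

for pid, value in ranking:
    print(f"{pid}\t{value:+.3f}")
```

In a review workflow like the one described, the top of such a ranked list (for example, the first 100 patients) would be the records passed to manual chart review.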

Conclusions

The application of machine learning and knowledge engineering to EHR data may facilitate the diagnosis of rare diseases such as AHP. Further work will recommend clinical investigation to identified patients’ clinicians, evaluate more patients, assess additional feature selection and machine learning algorithms, and apply this methodology to other rare diseases. This work provides strong evidence that population-level informatics can be applied to rare diseases, greatly improving our ability to identify undiagnosed patients and, in the future, improving the care of these patients and our ability to study these diseases. The next step is to learn how best to apply these EHR-based machine learning approaches to benefit individual patients, through a clinical study that provides diagnostic testing and clinical follow-up for those identified as possibly having undiagnosed AHP.




Updated: 2020-07-03