Automated grouping of medical codes via multiview banded spectral clustering.
Journal of Biomedical Informatics (IF 4.0), Pub Date: 2019-10-28, DOI: 10.1016/j.jbi.2019.103322
Luwan Zhang 1 , Yichi Zhang 2 , Tianrun Cai 3 , Yuri Ahuja 1 , Zeling He 3 , Yuk-Lam Ho 4 , Andrew Beam 5 , Kelly Cho 6 , Robert Carroll 7 , Joshua Denny 7 , Isaac Kohane 5 , Katherine Liao 8 , Tianxi Cai 9

OBJECTIVE With their increasingly widespread adoption, electronic health records (EHRs) have enabled phenotypic information extraction at unprecedented granularity and scale. However, a medical concept (e.g., a diagnosis, prescription, or symptom) is often described by various synonyms across different EHR systems, which hinders data integration for signal enhancement and complicates dimensionality reduction for knowledge discovery. Despite existing ontologies and hierarchies, curating and maintaining them requires tremendous human effort, a process that is both unscalable and susceptible to subjective biases. This paper aims to develop a data-driven approach that automatically groups medical terms into clinically relevant concepts by combining multiple up-to-date data sources in an unbiased manner.

METHODS We present a novel data-driven grouping approach, multi-view banded spectral clustering (mvBSC), which combines summary data from multiple healthcare systems. The method consists of a banding step, which leverages prior knowledge from the existing coding hierarchy, and a combining step, which performs spectral clustering on an optimally weighted matrix.

RESULTS We apply the proposed method to group ICD-9 and ICD-10-CM codes together by integrating data from two healthcare systems, and we show grouping results and hierarchies for 13 representative disease categories. The quality of each grouping was evaluated using normalized mutual information, adjusted Rand index, and F1-measure, and was consistently found to be highly similar to the existing manual grouping. The resulting ICD groupings also offer comparable interpretability and are well aligned with the current ICD hierarchy.

CONCLUSION By systematically leveraging multiple data sources, the proposed approach can overcome bias while maximizing consensus to achieve generalizability. It is efficient, scalable, and adaptive to the evolving human knowledge reflected in the data, representing a significant step toward automating medical knowledge integration.
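The abstract names the two steps of mvBSC (banding, then spectral clustering on a weighted combination of views) without giving their exact form. The Python sketch below illustrates the general idea under stated assumptions, not the authors' implementation: each healthcare system contributes one code-by-code similarity matrix (a "view"); banding is modeled as zeroing similarities between codes whose distance in the existing hierarchy exceeds a bandwidth; and the views are combined with fixed, user-supplied weights instead of the optimal weights the paper estimates. Clustering uses the standard normalized-affinity recipe (top-k eigenvectors of D^{-1/2} W D^{-1/2}, row normalization, k-means). The helper names (`band`, `combine_views`, `mvbsc_sketch`), the hierarchy distance, and the toy data are hypothetical.

```python
import numpy as np


def band(sim, hier_dist, bandwidth):
    """ASSUMED banding: keep a similarity only when the two codes are within
    `bandwidth` of each other in the (hypothetical) coding-hierarchy distance."""
    return np.where(hier_dist <= bandwidth, sim, 0.0)


def combine_views(views, weights):
    """Weighted sum of per-system similarity matrices. Weights are fixed inputs
    here; mvBSC chooses them optimally, which this sketch does not attempt."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * v for wi, v in zip(w, views))


def spectral_embedding(W, k):
    """Top-k eigenvectors of D^{-1/2} W D^{-1/2}, rows scaled to unit length."""
    d = W.sum(axis=1)
    d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(d), 0.0)
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(M)  # eigenvalues come back in ascending order
    U = vecs[:, -k:]             # so the last k columns are the top-k eigenvectors
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    return U / np.where(norms > 0, norms, 1.0)


def kmeans(X, k, n_iter=100):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    C = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2), axis=1)
        new_C = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                          for j in range(k)])
        if np.allclose(new_C, C):
            break
        C = new_C
    return labels


def mvbsc_sketch(views, weights, hier_dist, bandwidth, k):
    """Band each view, combine the views, then cluster the combined matrix."""
    banded = [band(v, hier_dist, bandwidth) for v in views]
    W = combine_views(banded, weights)
    return kmeans(spectral_embedding(W, k), k)


# Toy data: 6 codes in two hypothetical top-level chapters.
chapter = np.array([0, 0, 0, 1, 1, 1])
same = chapter[:, None] == chapter[None, :]
hier_dist = np.where(same, 0, 2)      # hypothetical hierarchy distance
base = np.where(same, 0.8, 0.1)
np.fill_diagonal(base, 1.0)
view_a = base.copy()
view_a[2, 3] = view_a[3, 2] = 0.95    # spurious cross-chapter similarity in one system
view_b = base.copy()

labels = mvbsc_sketch([view_a, view_b], [0.5, 0.5], hier_dist, bandwidth=1, k=2)
```

In this toy example, banding removes the spurious cross-chapter similarity between codes 2 and 3, so the two hypothetical chapters separate cleanly. Banding each view before combining gives the same matrix as banding the combined one, because masking is elementwise and commutes with a weighted sum.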

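The evaluation metrics named in the abstract (normalized mutual information, adjusted Rand index, F1-measure) compare a data-driven grouping with a manual one. A minimal, dependency-free Python sketch might look like this. It assumes NMI is normalized by the arithmetic mean of the two entropies (other normalizations exist, and the abstract does not say which was used) and that F1 is computed over pairs of codes, with a pair counted as a true positive when both groupings place the two codes together; the paper's exact F1 definition may differ. All function names are hypothetical.

```python
from collections import Counter
from math import comb, log


def _pair_counts(truth, pred):
    """Counts of code pairs grouped together in both, in truth, and in pred."""
    joint = Counter(zip(truth, pred))
    both = sum(comb(c, 2) for c in joint.values())
    together_truth = sum(comb(c, 2) for c in Counter(truth).values())
    together_pred = sum(comb(c, 2) for c in Counter(pred).values())
    return both, together_truth, together_pred


def adjusted_rand_index(truth, pred):
    both, t, p = _pair_counts(truth, pred)
    expected = t * p / comb(len(truth), 2)
    max_index = 0.5 * (t + p)
    if max_index == expected:  # only when both are all-singletons or both one group
        return 1.0
    return (both - expected) / (max_index - expected)


def pairwise_f1(truth, pred):
    both, t, p = _pair_counts(truth, pred)
    if both == 0:
        return 0.0
    precision, recall = both / p, both / t
    return 2 * precision * recall / (precision + recall)


def normalized_mutual_info(truth, pred):
    n = len(truth)
    a, b = Counter(truth), Counter(pred)
    joint = Counter(zip(truth, pred))
    mi = sum(c / n * log(n * c / (a[i] * b[j])) for (i, j), c in joint.items())
    h_a = -sum(c / n * log(c / n) for c in a.values())
    h_b = -sum(c / n * log(c / n) for c in b.values())
    denom = 0.5 * (h_a + h_b)  # ASSUMED arithmetic-mean normalization
    return mi / denom if denom > 0 else 1.0


manual = [0, 0, 1, 1]   # hypothetical manual grouping of four codes
learned = [0, 0, 1, 2]  # hypothetical data-driven grouping of the same codes
ari = adjusted_rand_index(manual, learned)
f1 = pairwise_f1(manual, learned)
```

All three scores ignore group names, so a grouping that matches the manual one up to relabeling scores 1.0 on each.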
Updated: 2019-10-28