Distributed representations of diseases based on co-occurrence relationship
Expert Systems with Applications ( IF 7.5 ) Pub Date : 2021-06-20 , DOI: 10.1016/j.eswa.2021.115418
Haoqing Wang , Huiyu Mai , Zhi-hong Deng , Chao Yang , Luxia Zhang , Huai-yu Wang

The co-occurrence relationship among diseases facilitates knowledge discovery in the medical field. However, because data are limited, previous research has relied mainly on clinician experience and simple statistics, which makes it difficult to discover deep associations among diseases. Treating the diagnoses in an electronic medical record (EMR) as interrelated random variables, we use Markov random fields to model the co-occurrence relationship among diseases and propose Di2Vec to learn distributed representations of diseases. Diseases with a high co-occurrence frequency lie close to each other in the embedding space. To account for the hierarchical structure of each diagnosis code, we introduce subword embeddings, in which the embedding of each diagnosis is expressed as the sum of its subword embeddings, and explore their impact on embedding quality. Qualitative and quantitative experiments show that Di2Vec places the embeddings of diseases with a high co-occurrence frequency close to each other, and that it outperforms Skip-gram and CBOW when these embeddings are used as feature representations for medical expense prediction. Using subword embeddings gives the disease embeddings better clustering properties but, to a certain extent, loses some of the co-occurrence information they contain.
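
The idea that a diagnosis embedding is the sum of its subword embeddings can be illustrated with a minimal sketch. The abstract does not say how subwords are derived from a diagnosis code, so the prefix-based split below, the ICD-style example codes, the random initialization, and the names `code_prefixes` and `SubwordEmbedder` are all assumptions made for illustration, not the authors' method. No training is shown; the sketch only covers the composition step.

```python
from typing import Dict, List

import numpy as np


def code_prefixes(code: str) -> List[str]:
    """Hypothetical subword scheme: every prefix of the code, ignoring the dot."""
    flat = code.replace(".", "")
    return [flat[:i] for i in range(1, len(flat) + 1)]


class SubwordEmbedder:
    """Composes a diagnosis embedding as the sum of its subword embeddings."""

    def __init__(self, dim: int = 8, seed: int = 0) -> None:
        self.dim = dim
        self.rng = np.random.default_rng(seed)
        # Subword vectors are created lazily with random values (untrained).
        self.table: Dict[str, np.ndarray] = {}

    def subword_vector(self, subword: str) -> np.ndarray:
        if subword not in self.table:
            self.table[subword] = self.rng.normal(scale=0.1, size=self.dim)
        return self.table[subword]

    def embed(self, code: str) -> np.ndarray:
        # Sum of all subword vectors; codes sharing a prefix share those terms.
        vectors = [self.subword_vector(s) for s in code_prefixes(code)]
        return np.sum(vectors, axis=0)


embedder = SubwordEmbedder(dim=8, seed=0)
e_a = embedder.embed("E11.9")  # assumed ICD-style code
e_b = embedder.embed("E11.2")  # shares the prefixes "E", "E1", "E11"
```

Under this assumed scheme, two codes in the same branch of the hierarchy differ only in the vectors of their non-shared subwords, which is one plausible reading of why subword embeddings would encourage clustering by code hierarchy.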




Updated: 2021-06-23