Understanding the stability of medical concept embeddings
Journal of the Association for Information Science and Technology (IF 2.8), Pub Date: 2020-10-02, DOI: 10.1002/asi.24411
Grace E. Lee 1 , Aixin Sun 1

Frequency is one of the major factors in training quality word embeddings. Several studies have recently discussed the stability of word embeddings in the general domain and suggested factors that influence it. In this work, we conduct a detailed analysis of the stability of concept embeddings in the medical domain, particularly its relation to concept frequency. The analysis reveals the surprisingly high stability of low-frequency concepts: low-frequency (<100) concepts have the same high stability as high-frequency (>1000) concepts. To develop a deeper understanding of this finding, we propose a new factor, the noisiness of context words, which influences the stability of medical concept embeddings regardless of frequency. We evaluate the proposed factor by showing its linear correlation with the stability of medical concept embeddings. The correlations are clear and consistent across various groups of medical concepts. Based on the linear relations, we suggest ways to adjust the noisiness of context words to improve stability. Finally, we demonstrate that the proposed factor extends to word embedding stability in the general domain.

Updated: 2020-10-02