ETM: Enrichment by topic modeling for automated clinical sentence classification to detect patients’ disease history
Journal of Intelligent Information Systems (IF 2.3), Pub Date: 2020-04-28, DOI: 10.1007/s10844-020-00605-w
Ayoub Bagheri , Arjan Sammani , Peter G. M. van der Heijden , Folkert W. Asselbergs , Daniel L. Oberski

Given the rapid rate at which text data are being digitally gathered in the medical domain, there is a growing need for automated tools that can analyze clinical notes and classify their sentences in electronic health records (EHRs). This study uses EHR texts to detect patients’ disease history from clinical sentences. However, sentences in EHRs are shorter and less topic-focused than those in the general domain, which leads to sparse co-occurrence patterns and a lack of semantic features. To tackle this challenge, current approaches to clinical sentence classification rely on external information to improve classification performance. However, this is often impractical owing to the lack of universal medical dictionaries. This study proposes the ETM (enrichment by topic modeling) algorithm, based on latent Dirichlet allocation, to smooth the semantic representations of short sentences. ETM enriches the text representation by incorporating into it the probability distributions generated by an unsupervised algorithm. It takes the length of the original texts into account and enhances the representation through an internal knowledge acquisition procedure. In clinical predictive modeling, interpretability improves the acceptance of a model. Thus, for clinical sentence classification, the ETM approach starts from an initial TF-IDF (term frequency-inverse document frequency) representation, and we use support vector machine and neural network algorithms for the classification task. We conducted three sets of experiments on a data set of clinical cardiovascular notes from the Netherlands to compare the sentence classification performance of the proposed method with that of prevalent approaches. The results show that the proposed ETM approach outperformed state-of-the-art baselines.
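The abstract does not give ETM's exact formulas. The sketch below is a rough illustration under stated assumptions of one way to enrich sparse TF-IDF vectors with LDA output. Each sentence's topic mixture (theta) is projected back into word space through the topic-word matrix (phi) and added to its TF-IDF vector. The weight on that addition shrinks as the sentence gets longer. Several choices here are hypothetical and are not taken from the paper: the 1/length weight, the additive combination, the hyperparameters (`k`, `alpha`, `beta`, `iters`), and the function names `tfidf`, `lda_gibbs` and `etm_enrich`. LDA is fitted with a small collapsed Gibbs sampler written against the standard library only.

```python
import math
import random
from collections import Counter


def tfidf(docs):
    """L2-normalized TF-IDF with smoothed IDF; one {word: weight} dict per sentence."""
    vocab = sorted({w for doc in docs for w in doc})
    n = len(docs)
    df = Counter(w for doc in docs for w in set(doc))
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1.0 for w in vocab}
    vecs = []
    for doc in docs:
        raw = {w: c * idf[w] for w, c in Counter(doc).items()}
        norm = math.sqrt(sum(x * x for x in raw.values())) or 1.0
        vecs.append({w: x / norm for w, x in raw.items()})
    return vocab, vecs


def lda_gibbs(docs, vocab, k=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns theta (sentence-topic) and phi (topic-word)."""
    rng = random.Random(seed)
    widx = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * k for _ in docs]      # topic counts per sentence
    nkw = [[0] * V for _ in range(k)]  # word counts per topic
    nk = [0] * k                       # total tokens per topic
    z = []                             # current topic of every token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            t = rng.randrange(k)
            z[d].append(t)
            ndk[d][t] += 1
            nkw[t][widx[w]] += 1
            nk[t] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, t = widx[w], z[d][i]
                # Remove the token's current assignment, then resample it.
                ndk[d][t] -= 1
                nkw[t][v] -= 1
                nk[t] -= 1
                weights = [(ndk[d][j] + alpha) * (nkw[j][v] + beta) / (nk[j] + V * beta)
                           for j in range(k)]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][i] = t
                ndk[d][t] += 1
                nkw[t][v] += 1
                nk[t] += 1
    theta = [[(ndk[d][j] + alpha) / (len(doc) + k * alpha) for j in range(k)]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[j][v] + beta) / (nk[j] + V * beta) for v in range(V)] for j in range(k)]
    return theta, phi


def etm_enrich(sentences, k=2, seed=0):
    """Dense enriched vectors: TF-IDF plus length-weighted topic smoothing (theta @ phi)."""
    docs = [s.lower().split() for s in sentences]
    vocab, vecs = tfidf(docs)
    theta, phi = lda_gibbs(docs, vocab, k=k, seed=seed)
    enriched = []
    for d, doc in enumerate(docs):
        lam = 1.0 / max(len(doc), 1)  # hypothetical: shorter sentences get more smoothing
        row = [vecs[d].get(w, 0.0) + lam * sum(theta[d][j] * phi[j][v] for j in range(k))
               for v, w in enumerate(vocab)]
        enriched.append(row)
    return vocab, enriched


if __name__ == "__main__":
    # Hypothetical English stand-ins; the study itself used Dutch cardiovascular notes.
    sentences = [
        "patient has history of myocardial infarction",
        "no history of heart failure",
        "echo shows normal ejection fraction",
    ]
    vocab, X = etm_enrich(sentences, k=2)
    for s, row in zip(sentences, X):
        print(s, "->", [round(x, 3) for x in row[:5]])
```

Every enriched row still has one column per vocabulary word. Words that are absent from a short sentence receive a small positive weight through its topics, and the features stay interpretable as words. These rows could then be passed to an SVM or a neural network. The paper's own classifier setup is not reproduced here.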

Updated: 2020-04-28