Multigranularity Label Prediction Model for Automatic International Classification of Diseases Coding in Clinical Text.
Journal of Computational Biology (IF 1.7), Pub Date: 2023-07-31, DOI: 10.1089/cmb.2023.0096
Ying Yu 1, 2 , Tian Qiu 1 , Junwen Duan 1 , Jianxin Wang 1

The International Classification of Diseases (ICD) serves as the foundation for generating comparable global disease statistics across regions and over time. ICD coding assigns codes to diseases based on clinical notes, so that a patient's condition is described in a standard way. However, this process is complicated by the vast number of codes and by the intricate taxonomy of ICD, in which codes are hierarchically organized into levels including chapter, category, subcategory, and further subdivisions. Many existing studies focus solely on predicting subcategory codes and ignore the hierarchical relationships among codes. To address this limitation, we propose a multitask learning model that trains multiple classifiers for different code levels, while also capturing the relations between coarser and finer-grained labels through a reinforcement mechanism. We evaluate our approach on both English and Chinese benchmark datasets and demonstrate that it achieves performance competitive with baseline models, particularly in terms of macro-F1. These findings suggest that our approach effectively leverages the hierarchical structure of ICD codes to improve disease code prediction accuracy. Analysis of the attention mechanism shows that the multigranularity attention of our model captures crucial features of the input text at different granularity levels, which can provide reasonable explanations for the prediction results.
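To make the idea of multigranularity labels and the macro-F1 metric concrete, here is a minimal sketch. It is not the authors' model or evaluation code. It assumes ICD-9-CM style codes written with a decimal point (for example `428.0`), and it takes the part before the point as the coarser category label. The function names `category_of`, `multigranularity_labels`, and `macro_f1` are hypothetical, and the macro-F1 shown is the plain per-label average, which may differ from the exact variant used in the paper.

```python
from typing import Dict, List, Set


def category_of(code: str) -> str:
    """Return the category-level label of a dotted ICD-9-CM style code.

    Assumption: codes contain a decimal point, so the category is
    everything before it ("428.0" -> "428", "E849.7" -> "E849").
    """
    return code.strip().upper().split(".")[0]


def multigranularity_labels(codes: List[str]) -> Dict[str, Set[str]]:
    """Build one label set per granularity level for a single document."""
    fine = {c.strip().upper() for c in codes}
    coarse = {category_of(c) for c in fine}
    return {"category": coarse, "subcategory": fine}


def macro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Average the per-label F1 over every label seen in gold or pred."""
    labels = set().union(*gold, *pred)
    if not labels:
        return 0.0
    total = 0.0
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if label in g and label in p)
        fp = sum(1 for g, p in zip(gold, pred) if label not in g and label in p)
        fn = sum(1 for g, p in zip(gold, pred) if label in g and label not in p)
        denom = 2 * tp + fp + fn
        # A label with no true positives contributes an F1 of zero.
        total += (2 * tp / denom) if denom else 0.0
    return total / len(labels)


if __name__ == "__main__":
    doc_labels = multigranularity_labels(["428.0", "428.21", "E849.7"])
    print(sorted(doc_labels["category"]))

    gold = [{"428.0", "401.9"}, {"428.0"}]
    pred = [{"428.0"}, {"428.0", "250.00"}]
    print(round(macro_f1(gold, pred), 4))
```

Because macro-F1 weights every label equally, rare codes count as much as frequent ones. This is one reason the metric is informative for ICD coding, where most codes appear in only a few documents. A multitask setup of the kind the abstract describes would train one classifier per key of the dictionary returned by `multigranularity_labels`.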


