Automated ICD coding for primary diagnosis via clinically interpretable machine learning
International Journal of Medical Informatics (IF 4.9), Pub Date: 2021-07-27, DOI: 10.1016/j.ijmedinf.2021.104543
Xiaolin Diao 1 , Yanni Huo 1 , Shuai Zhao 1 , Jing Yuan 2 , Meng Cui 3 , Yuxin Wang 1 , Xiaodan Lian 1 , Wei Zhao 4

Background

Computer-assisted clinical coding (CAC) based on automated coding algorithms is expected to improve the quality and productivity of International Classification of Diseases, Tenth Revision (ICD-10) coding, but studies on primary diagnosis auto-coding in the Chinese context are limited.

Objective

This study aims to develop a machine learning (ML) model for automated primary diagnosis ICD-10 coding.

Methods

A total of 71,709 admissions at Fuwai Hospital, corresponding to 168 primary diagnosis ICD-10 codes, were included in this study. Guided by clinical implications, two feature engineering methods were used to process discharge diagnosis and procedure texts into sequential features and sequential grouping features, respectively, and a model was built on each feature set for comparison. A baseline model using one-hot encoding features was also considered. Light Gradient Boosting Machine (LightGBM) was adopted as the classifier, with grid search and cross-validation used to select the optimal hyperparameters. SHapley Additive exPlanations (SHAP) values were used to interpret the models.
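The abstract does not spell out how the three feature sets are constructed, so the following is only a minimal sketch of one plausible reading: one-hot features ignore the order of diagnosis terms, sequential features keep each term's position, and sequential grouping features first map terms to coarser clinical groups. The vocabulary, the group map, the slot count, and all function names are hypothetical and are not taken from the study.

```python
# Illustrative sketch only. The encodings, vocabulary, group map, and
# slot count are assumptions; the abstract does not specify them.

def one_hot(terms, vocab):
    """Order-free baseline: 1 if the term appears anywhere in the record."""
    present = set(terms)
    return [1 if t in present else 0 for t in vocab]

def sequential(terms, vocab, n_slots):
    """Order-aware: slot i holds the 1-based vocabulary index of the i-th
    term, or 0 when the slot is empty or the term is unknown."""
    index = {t: i + 1 for i, t in enumerate(vocab)}
    ids = [index.get(t, 0) for t in terms[:n_slots]]
    return ids + [0] * (n_slots - len(ids))

def sequential_grouping(terms, group_of, groups, n_slots):
    """Order-aware and coarser: map each term to a group, then encode
    the sequence of groups position by position."""
    grouped = [group_of.get(t, "other") for t in terms]
    return sequential(grouped, groups, n_slots)

# Hypothetical vocabulary and grouping, for demonstration only.
VOCAB = ["coronary heart disease", "hypertension", "diabetes", "heart failure"]
GROUP_OF = {
    "coronary heart disease": "cardiac",
    "heart failure": "cardiac",
    "hypertension": "vascular",
    "diabetes": "metabolic",
}
GROUPS = ["cardiac", "vascular", "metabolic"]

# Two records with the same diagnoses listed in a different order.
record_a = ["coronary heart disease", "hypertension"]
record_b = ["hypertension", "coronary heart disease"]

onehot_a, onehot_b = one_hot(record_a, VOCAB), one_hot(record_b, VOCAB)
seq_a, seq_b = sequential(record_a, VOCAB, 3), sequential(record_b, VOCAB, 3)
grp_a = sequential_grouping(record_a, GROUP_OF, GROUPS, 3)
```

In this toy setup the one-hot vectors of the two records are identical, while the sequential vectors differ, which is the kind of ordering signal a primary-diagnosis classifier could exploit. Vectors of this form could then be passed to a gradient-boosting classifier such as LightGBM.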

Results

Our best prediction model was built on sequential grouping features. In the test phase it achieved an accuracy of 95.2% and a macro-averaged F1 (Macro-F1) of 88.3%. Comparison of the models demonstrated that both the sequential information and the grouping strategy improved model performance (P-value < 0.01). Subgroup analysis of the best model on each individual code showed that 91.1% of the codes achieved an F1 above 70.0%.
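To make the reported metrics concrete, here is a small sketch of how accuracy, per-code F1, and Macro-F1 are computed under their standard definitions. The labels and predictions are made-up toy data (the two ICD-10 codes are placeholders, not results from the study), and the function names are illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of admissions whose predicted code equals the true code."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def per_class_f1(y_true, y_pred):
    """F1 per code: 2*TP / (2*TP + FP + FN), defined as 0.0 when undefined."""
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-code F1, so rare codes count as much as common ones."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)

# Toy data: one frequent code and one rare code (placeholders).
y_true = ["I25.1"] * 8 + ["I50.9"] * 2
y_pred = ["I25.1"] * 8 + ["I25.1", "I50.9"]

acc = accuracy(y_true, y_pred)
f1_by_code = per_class_f1(y_true, y_pred)
macro = macro_f1(y_true, y_pred)

# Share of codes whose F1 exceeds 70%, analogous to the subgroup analysis.
share_above = sum(f > 0.70 for f in f1_by_code.values()) / len(f1_by_code)
```

Because Macro-F1 weights every code equally, a single error on a rare code lowers it much more than it lowers accuracy. This is one reason a Macro-F1 below the accuracy is typical for imbalanced code distributions.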

Conclusions

Our model has demonstrated its effectiveness for automated primary diagnosis coding in the Chinese context, and its results are interpretable. Hence, it has the potential to help clinical coders improve coding efficiency and quality in Chinese inpatient settings.

Updated: 2021-08-12