Comparisons of different classification algorithms while using text mining to screen psychiatric inpatients with suicidal behaviors
Journal of Psychiatric Research ( IF 4.8 ) Pub Date : 2020-02-22 , DOI: 10.1016/j.jpsychires.2020.02.019
H. Zhu , X. Xia , J. Yao , H. Fan , Q. Wang , Q. Gao


This study aimed to compare the performance of text-mining methods for screening suicidal behaviors from the chief complaints of psychiatric inpatients.


Electronic medical records of inpatients with mental disorders were collected, and text mining was used to screen for suicidal behaviors. Combinations of six classification algorithms and two term-weighting factors were compared across training sets of various sizes, with performance assessed by precision, recall, F1 value, and accuracy.
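The four evaluation measures named above can be computed from the counts of true and false positives and negatives. The following is a minimal sketch in plain Python, not the authors' code: the function name `classification_metrics`, the label encoding (1 = suicidal behavior, 0 = none), and the toy labels are all assumptions made for illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return precision, recall, F1 and accuracy for binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0

    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Toy labels invented for illustration only; they are not data from the study.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

With balanced classes, as in the study's 1800/1800 design, accuracy and F1 tend to move together, so reporting all four measures mainly helps expose any asymmetry between false positives and false negatives.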


A total of 3600 psychiatric inpatients (1800 with suicidal behaviors and 1800 without) were included in this study. In the chief complaints of suicidal inpatients, “suicide”, “notion” and “suspicion” were the most common terms, appearing 1228, 705 and 638 times, respectively. In contrast, “excitement”, “instability” and “impulsion” appeared more frequently in the chief complaints of patients without suicidal behaviors (599, 599 and 534 times, respectively). The performance of each algorithm generally improved as the training set grew and tended to stabilize once the number of training cases reached 1000, at which point most algorithms achieved satisfactory accuracy (>0.95). Results on the testing set showed that SVM, Random Forest and AdaBoost weighted by term frequency (TF) had better generalization ability, with F1 values of 0.9889, 0.9838 and 0.9828, respectively.
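Per-class term counts like those reported above can be obtained by tallying tokens separately for each label. The sketch below assumes the chief complaints have already been split into terms; the abstract does not say how segmentation was done, so that step is left out. The function name `term_counts_by_class` and the toy documents are hypothetical and only reuse terms quoted in the abstract.

```python
from collections import Counter


def term_counts_by_class(docs, labels):
    """Tally how often each term occurs within each class label."""
    counts = {}
    for tokens, label in zip(docs, labels):
        counts.setdefault(label, Counter()).update(tokens)
    return counts


# Toy tokenized complaints invented for illustration (1 = suicidal, 0 = not).
docs = [
    ["suicide", "notion", "insomnia"],
    ["suicide", "suspicion"],
    ["excitement", "instability"],
    ["impulsion", "excitement"],
]
labels = [1, 1, 0, 0]

counts = term_counts_by_class(docs, labels)
top_suicidal = counts[1].most_common(1)      # most frequent term, label 1
top_non_suicidal = counts[0].most_common(1)  # most frequent term, label 0
```

The raw counts in each `Counter` are also the simplest form of TF weighting, which is one reason such tallies are a natural first look at the data before training a classifier.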


This study confirmed the feasibility of identifying suicidal inpatients from a small number of representative terms. SVM, Random Forest and AdaBoost weighted by TF performed better in this task. These findings provide a practical way to automatically classify patients with or without suicidal behaviors before hospital admission, which could yield considerable savings in time and human resources for identifying high-risk patients and preventing suicide.










