Comparisons of different classification algorithms while using text mining to screen psychiatric inpatients with suicidal behaviors,Journal of Psychiatric Research



Comparisons of different classification algorithms while using text mining to screen psychiatric inpatients with suicidal behaviors
Journal of Psychiatric Research (IF 4.8), Pub Date: 2020-02-22, DOI: 10.1016/j.jpsychires.2020.02.019
H. Zhu, X. Xia, J. Yao, H. Fan, Q. Wang, Q. Gao

Objective

To compare the performance of text-mining-based methods for screening suicidal behaviors from the chief complaints of psychiatric inpatients.

Methods

Electronic medical records of inpatients with mental disorders were collected, and text mining was used to screen for suicidal behaviors. Different combinations of six algorithms and two term weighting factors were compared under various training set sizes, with performance assessed by precision, recall, F1 value and accuracy.
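
As an illustration of the quantities named above, the following is a minimal Python sketch, not the authors' implementation. It assumes chief complaints are already split into tokens, and it assumes the second weighting factor is TF-IDF (the abstract names only TF explicitly). The function names `tf_weights`, `tfidf_weights` and `binary_metrics` are hypothetical, and the smoothed IDF formula is one commonly used variant chosen for illustration.

```python
import math
from collections import Counter


def tf_weights(tokens):
    """Raw term frequency: how often each term occurs in one complaint."""
    return dict(Counter(tokens))


def tfidf_weights(tokens, corpus):
    """TF times a smoothed inverse document frequency (assumed variant)."""
    n_docs = len(corpus)
    weights = {}
    for term, tf in Counter(tokens).items():
        # Document frequency: number of complaints containing the term.
        df = sum(1 for doc in corpus if term in doc)
        weights[term] = tf * (math.log((1 + n_docs) / (1 + df)) + 1)
    return weights


def binary_metrics(y_true, y_pred, positive=1):
    """Precision, recall, F1 and accuracy for a two-class screening task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the study.
    corpus = [["suicide", "notion"], ["excitement"], ["suicide"]]
    print(tf_weights(corpus[0]))
    print(tfidf_weights(corpus[0], corpus))
    print(binary_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0]))
```

Under TF, every occurrence of a term counts equally; under the assumed TF-IDF variant, terms that appear in fewer complaints receive a larger weight per occurrence.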

Results

A total of 3600 psychiatric inpatients (1800 with suicidal behaviors and 1800 without) were included in this study. In the chief complaints of suicidal inpatients, “suicide”, “notion” and “suspicion” were the most common terms, appearing 1228, 705 and 638 times, respectively. In contrast, “excitement”, “instability” and “impulsion” appeared more frequently in the chief complaints of patients without suicidal behaviors (599, 599 and 534 times, respectively). The performance of each algorithm generally improved as the training set size increased and tended to stabilize once the number of training cases reached 1000, at which point most algorithms achieved satisfactory accuracy (>0.95). Results on the testing set showed that SVM, Random Forest and AdaBoost weighted by TF had better generalization ability, with F1 values of 0.9889, 0.9838 and 0.9828, respectively.
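
Term counts of the kind reported above can be obtained with a simple tally per group. The sketch below is illustrative only: it assumes each chief complaint has already been segmented into a list of terms, the helper name `top_terms` is hypothetical, and the sample complaints are invented toy data rather than records from the study.

```python
from collections import Counter


def top_terms(complaints, k=3):
    """Return the k most frequent terms across a group of tokenized complaints."""
    counts = Counter()
    for tokens in complaints:
        counts.update(tokens)  # add each term occurrence to the running tally
    return counts.most_common(k)


if __name__ == "__main__":
    # Invented toy examples, one token list per chief complaint.
    suicidal = [["suicide", "notion"], ["suicide"],
                ["suicide", "notion", "suspicion"]]
    non_suicidal = [["excitement", "instability"], ["excitement"]]
    print(top_terms(suicidal, k=2))
    print(top_terms(non_suicidal, k=1))
```

Running the same tally separately for patients with and without suicidal behaviors shows which terms distinguish the two groups.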

Conclusion

This study confirmed the feasibility of filtering suicidal inpatients using a small number of representative terms. SVM, Random Forest and AdaBoost weighted by TF performed better in this task. Our findings provide a practical way to automatically classify patients with or without suicidal behaviors before admission to hospital, which could lead to considerable savings in time and human resources for the identification of high-risk patients and suicide prevention.
