Topic Modeling on Triage Notes With Semiorthogonal Nonnegative Matrix Factorization, Journal of the American Statistical Association

Journal of the American Statistical Association (IF 3.0), Pub Date: 2021-02-03, DOI: 10.1080/01621459.2020.1862667
Yutong Li¹, Ruoqing Zhu¹, Annie Qu², Han Ye³, Zhankun Sun⁴

Abstract

Emergency department (ED) crowding is a universal health issue that affects the efficiency of hospital management and the quality of patient care. ED crowding frequently occurs when the request for a ward bed for a patient is delayed until a doctor makes an admission decision. In this case study, we build a classifier to predict patient disposition using manually typed nurse notes collected during triage, as provided by the Alberta Medical Center. These predictions could potentially be incorporated into early bed coordination and fast-track streaming strategies to alleviate overcrowding and reduce waiting times in the ED. However, the triage notes are high-dimensional, noisy, and sparse text data, which makes model fitting and interpretation difficult. To address this issue, we propose a novel semiorthogonal nonnegative matrix factorization for both continuous and binary predictors that reduces dimensionality and derives word topics. Each triage note can then be interpreted as a nonsubtractive linear combination of orthogonal basis topic vectors. Our real data analysis shows that the triage notes contain strong predictive information for classifying the disposition of patients with certain medical complaints, such as altered consciousness or stroke. Additionally, we show that the document-topic vectors generated by our method can be used as features to further improve classification accuracy by up to 1 percentage point across different medical complaints, for example, from 74.3% to 75.3% for patients with stroke symptoms. This improvement could be clinically impactful for certain patients, especially when the hospital serves a large number of patients. Furthermore, because of the orthogonal formulation, the generated word-topic vectors provide a bi-clustering interpretation under each topic, which can help hospitals better understand the symptoms and reasons behind patients' visits. Supplementary materials for this article are available online.
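To make the factorization idea concrete, here is a minimal NumPy sketch; it is not the authors' estimator. It assumes a nonnegative document-by-word matrix `X` (for example, term counts or TF-IDF weights from the triage notes; the weighting is an assumption), handles only the continuous case, and places the orthogonality constraint on the word-topic factor `W`, so that `X ≈ H @ W.T` with `W.T @ W ≈ I`. The updates are a generic square-root multiplicative heuristic for orthogonal NMF, followed by a hard projection that keeps each word's largest topic weight. Because two nonnegative vectors are orthogonal only when their supports are disjoint, the projected `W` assigns every word to at most one topic, which is one plausible reading of the word clustering the abstract describes. The function name `semiorthogonal_nmf` and all of its parameters are hypothetical.

```python
import numpy as np


def semiorthogonal_nmf(X, k, n_iter=300, eps=1e-10, seed=0):
    """Hypothetical sketch (not the paper's algorithm).

    Approximates a nonnegative X (n_docs x n_words) as H @ W.T with
    H >= 0 (document-topic) and W >= 0 (word-topic), pushing W toward
    orthonormal columns. Continuous predictors only.
    """
    X = np.asarray(X, dtype=float)
    if np.any(X < 0):
        raise ValueError("X must be nonnegative")
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Strictly positive start so multiplicative updates can move every entry.
    H = rng.random((n, k)) + eps
    W = rng.random((p, k)) + eps

    for _ in range(n_iter):
        # Orthogonality-aware update for W (square root damps the scale):
        # W <- W * sqrt((X^T H) / (W W^T X^T H))
        XtH = X.T @ H
        W *= np.sqrt(XtH / (W @ (W.T @ XtH) + eps))
        # Standard least-squares NMF update for H given W:
        # H <- H * (X W) / (H W^T W)
        H *= (X @ W) / (H @ (W.T @ W) + eps)

    # Hard projection: each word keeps only its largest topic weight, so
    # the columns of W have disjoint supports and are exactly orthogonal.
    W_hard = np.zeros_like(W)
    rows = np.arange(p)
    best = W.argmax(axis=1)
    W_hard[rows, best] = W[rows, best]
    norms = np.linalg.norm(W_hard, axis=0)
    nonzero = norms > 0
    W_hard[:, nonzero] /= norms[nonzero]  # unit-norm topic columns

    # Refit H for the projected W; multiplicative updates keep H >= 0.
    for _ in range(50):
        H *= (X @ W_hard) / (H @ (W_hard.T @ W_hard) + eps)
    return H, W_hard
```

In a pipeline like the one the abstract outlines, the rows of `H` would serve as document-topic features for a disposition classifier, and the nonzero entries in each column of `W` would list that topic's words. The hard projection trades some reconstruction accuracy for exact orthogonality; the binary-predictor variant is not covered here.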

Updated: 2021-02-03
