Human intelligence versus Chat-GPT: who performs better in correctly classifying patients in triage?
The American Journal of Emergency Medicine (IF 3.6), Pub Date: 2024-02-07, DOI: 10.1016/j.ajem.2024.02.008
Arian Zaboli, Francesco Brigo, Serena Sibilio, Michael Mian, Gianni Turcato

Chat-GPT is rapidly emerging as a promising and potentially revolutionary tool in medicine. One of its possible applications is the stratification of patients according to the severity of clinical conditions and prognosis during the triage evaluation in the emergency department (ED). Using a randomly selected sample of 30 vignettes recreated from real clinical cases, we compared the concordance in risk stratification of ED patients between healthcare personnel and Chat-GPT. Concordance was assessed with Cohen's kappa, and performance was evaluated with the area under the receiver operating characteristic curve (AUROC). The outcomes considered were mortality within 72 h, the need for hospitalization, and the presence of a severe or time-dependent condition. The concordance in triage code assignment between triage nurses and Chat-GPT was 0.278 (unweighted Cohen's kappa; 95% confidence interval: 0.231–0.388). For all outcomes, the AUROC values were higher for the triage nurses. The largest difference was found for 72-h mortality, where triage nurses achieved an AUROC of 0.910 (0.757–1.000) compared with only 0.669 (0.153–1.000) for Chat-GPT. The current level of Chat-GPT reliability is insufficient to make it a valid substitute for the expertise of triage nurses in prioritizing ED patients. Further developments are required to enhance the safety and effectiveness of AI for risk stratification of ED patients.
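
The abstract does not include the authors' analysis code; the following is a minimal sketch of how the two reported statistics (unweighted Cohen's kappa for rater agreement on triage codes, and AUROC of triage priority against a binary outcome such as 72-h mortality) could be computed in Python with scikit-learn. The variable names and toy data are illustrative assumptions, not data from the study.

from sklearn.metrics import cohen_kappa_score, roc_auc_score

# Hypothetical triage codes (e.g., 1 = most urgent ... 4 = least urgent)
# assigned to the same vignettes by the triage nurses and by Chat-GPT.
nurse_codes = [1, 2, 2, 3, 4, 1, 3, 2, 4, 3]
chatgpt_codes = [1, 3, 2, 3, 4, 2, 3, 3, 4, 2]

# Unweighted Cohen's kappa: chance-corrected agreement between the two raters.
kappa = cohen_kappa_score(nurse_codes, chatgpt_codes)

# Hypothetical binary outcome (e.g., 72-h mortality) for each vignette.
outcome = [1, 0, 0, 0, 1, 0, 0, 0, 1, 0]

# AUROC of each rater's priority against the outcome. Lower triage codes mean
# higher urgency, so the codes are negated to serve as a risk score.
auroc_nurse = roc_auc_score(outcome, [-c for c in nurse_codes])
auroc_chatgpt = roc_auc_score(outcome, [-c for c in chatgpt_codes])

print(f"kappa={kappa:.3f}  AUROC nurse={auroc_nurse:.3f}  AUROC Chat-GPT={auroc_chatgpt:.3f}")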

Updated: 2024-02-07