The Medical Scribe: Corpus Development and Model Performance Analyses, arXiv - CS - Computation and Language


arXiv - CS - Computation and Language, Pub Date: 2020-03-12, DOI: arxiv-2003.11531
Izhak Shafran, Nan Du, Linh Tran, Amanda Perry, Lauren Keyes, Mark Knichel, Ashley Domin, Lei Huang, Yuhui Chen, Gang Li, Mingqiu Wang, Laurent El Shafey, Hagen Soltau, and Justin S. Paul

There is growing interest in creating tools that assist in clinical note generation using the audio of provider-patient encounters. Motivated by this goal, and with the help of providers and medical scribes, we developed an annotation scheme to extract relevant clinical concepts. We used this annotation scheme to label a corpus of about 6k clinical encounters, and used the labeled corpus to train a state-of-the-art tagging model. We report ontologies, labeling results, model performances, and detailed analyses of the results. Our results show that entities related to medications can be extracted with a relatively high accuracy of 0.90 F-score, followed by symptoms at 0.72 F-score and conditions at 0.57 F-score. In our task, we not only identify where the symptoms are mentioned but also map them to canonical forms as they appear in the clinical notes. Of the different types of errors, in about 19-38% of the cases we find that the model output was actually correct, and about 17-32% of the errors do not impact the clinical note. Taken together, the models developed in this work are more useful than the F-scores reflect, making this a promising approach for practical applications.
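
For readers unfamiliar with the metric, the F-scores quoted above are the harmonic mean of precision and recall over extracted entities. The sketch below is only an illustration under stated assumptions: it scores exact-match spans, which may differ from the matching criterion the authors used, and the `Span` type, the `span_f_score` helper, and the toy spans are all hypothetical, not taken from the paper.

```python
from typing import Set, Tuple

# Hypothetical span representation: (start_token, end_token, label).
Span = Tuple[int, int, str]


def span_f_score(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F-score) using exact span-and-label matches.

    Assumption: a prediction counts as correct only if its boundaries and
    label both match a gold span exactly.
    """
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Toy data, invented purely for illustration.
gold = {(0, 1, "medication"), (5, 6, "symptom"), (9, 11, "condition")}
predicted = {
    (0, 1, "medication"),
    (5, 6, "symptom"),
    (9, 10, "condition"),  # boundary error: counted as wrong under exact match
    (14, 15, "symptom"),   # spurious prediction
}

precision, recall, f_score = span_f_score(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} F={f_score:.3f}")
```

Under exact matching, the near-miss on the condition span is penalized in full even though it overlaps the gold span, which is one way a raw F-score can understate how useful a model's output is.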


Updated: 2020-03-26
