Pre-training phenotyping classifiers
Journal of Biomedical Informatics (IF 4.0), Pub Date: 2020-11-28, DOI: 10.1016/j.jbi.2020.103626
Dmitriy Dligach, Majid Afshar, Timothy Miller

Recent transformer-based pre-trained language models have become a de facto standard for many text classification tasks. Nevertheless, their utility in the clinical domain, where classification is often performed at the encounter or patient level, remains uncertain because of the limit on maximum input length. In this work, we introduce a self-supervised pre-training method that relies on a masked token objective and is free from the maximum input length limitation. We compare the proposed method with supervised pre-training that uses billing codes as the source of supervision. We evaluate the proposed method on one publicly available and three in-house datasets using standard evaluation metrics such as the area under the ROC curve and the F1 score. We find that, surprisingly, even though self-supervised pre-training performs slightly worse than supervised pre-training, it still preserves most of the gains from pre-training.
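The abstract does not give implementation details, but the core idea it describes — pre-training a length-unconstrained document encoder with a masked-token, multi-label reconstruction objective — can be sketched. The PyTorch sketch below is a hypothetical illustration, not the authors' code; the class and variable names (MaskedTokenPretrainer, pretrain_step, vocab_size) are assumptions. Averaging token embeddings yields one fixed-size vector per note regardless of its length, and the decoder is trained to predict which tokens were masked out. The supervised baseline mentioned in the abstract would use the same setup with billing codes as the multi-label targets instead of masked tokens.

```python
# Illustrative sketch only: a masked-token pre-training objective over a
# bag-of-words document encoder with no maximum input length.
import torch
import torch.nn as nn


class MaskedTokenPretrainer(nn.Module):  # hypothetical name
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=512):
        super().__init__()
        # EmbeddingBag averages token embeddings, so a note of any length
        # is mapped to a single fixed-size document vector.
        self.embed = nn.EmbeddingBag(vocab_size, embed_dim, mode="mean")
        self.encoder = nn.Sequential(nn.Linear(embed_dim, hidden_dim), nn.ReLU())
        # Multi-label head over the vocabulary: predict which tokens were masked.
        self.decoder = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids, offsets):
        doc_vec = self.embed(token_ids, offsets)     # (batch, embed_dim)
        return self.decoder(self.encoder(doc_vec))   # (batch, vocab_size) logits


def pretrain_step(model, token_ids, offsets, masked_targets, optimizer):
    """One step: reconstruct the held-out (masked) tokens from the rest of the note."""
    logits = model(token_ids, offsets)
    loss = nn.functional.binary_cross_entropy_with_logits(logits, masked_targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    vocab_size = 10000
    model = MaskedTokenPretrainer(vocab_size)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    # Two toy "notes" flattened into one id tensor; offsets mark where each note starts.
    token_ids = torch.randint(0, vocab_size, (50,))
    offsets = torch.tensor([0, 30])
    # Multi-hot targets: 1 for vocabulary items masked out of each note.
    masked_targets = torch.zeros(2, vocab_size)
    masked_targets[0, torch.randint(0, vocab_size, (5,))] = 1.0
    masked_targets[1, torch.randint(0, vocab_size, (5,))] = 1.0
    print(pretrain_step(model, token_ids, offsets, masked_targets, optimizer))
```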



Updated: 2020-12-01