COVID-19: Comparative Analysis of Methods for Identifying Articles Related to Therapeutics and Vaccines without Using Labeled Data
arXiv - CS - Information Retrieval. Pub Date: 2021-01-05, DOI: arxiv-2101.02017
Mihir Parmar, Ashwin Karthik Ambalavanan, Hong Guan, Rishab Banerjee, Jitesh Pabla, Murthy Devarakonda

Here we propose an approach to analyzing text classification methods based on the presence or absence of task-specific terms (and their synonyms) in the text. We applied this approach to study six different transfer-learning and unsupervised methods for screening articles relevant to COVID-19 vaccines and therapeutics. The analysis revealed that while a BERT model trained on search-engine results generally performed well, it misclassified relevant abstracts that did not contain task-specific terms. We used this insight to create a more effective unsupervised ensemble.
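
As a rough illustration of the kind of analysis the abstract describes, the sketch below splits relevant abstracts by whether they contain a task-specific term and reports a classifier's recall on each group. It is not the authors' implementation: the term list `TASK_TERMS`, the helper names, and the choice of recall as the metric are all assumptions made for this example.

```python
import re
from collections import defaultdict

# Hypothetical task-specific terms and synonyms (assumption: the paper's
# actual term list is not given in this abstract).
TASK_TERMS = ["vaccine", "vaccination", "therapeutic", "therapy",
              "therapies", "treatment", "drug"]
TERM_PATTERN = re.compile(
    r"\b(?:" + "|".join(map(re.escape, TASK_TERMS)) + r")s?\b",
    re.IGNORECASE,
)


def contains_task_term(text):
    """Return True if any task-specific term appears in the text."""
    return TERM_PATTERN.search(text) is not None


def recall_by_term_presence(abstracts, gold, predicted):
    """Recall on relevant abstracts, split by task-term presence.

    gold and predicted are sequences of 0/1 relevance labels.
    Groups with no relevant abstracts are omitted from the result.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for text, y, y_hat in zip(abstracts, gold, predicted):
        if not y:
            continue  # only relevant abstracts count toward recall
        group = "with_terms" if contains_task_term(text) else "without_terms"
        totals[group] += 1
        hits[group] += int(bool(y_hat))
    return {group: hits[group] / totals[group] for group in totals}
```

A large gap between the `with_terms` and `without_terms` recall of a given method would correspond to the failure mode reported above for the BERT model, and could suggest which methods are worth combining in an ensemble.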

Updated: 2021-01-07