Writer’s uncertainty identification in scientific biomedical articles: a tool for automatic if-clause tagging
Language Resources and Evaluation (IF 2.7), Pub Date: 2020-06-11, DOI: 10.1007/s10579-020-09491-8
Paolo Omero, Massimiliano Valotto, Riccardo Bellana, Ramona Bongelli, Ilaria Riccioni, Andrzej Zuczkowski, Carlo Tasso

In a previous study, we manually identified seven categories of Uncertainty Markers (UMs): verbs, non-verbs, modal verbs in the simple present, modal verbs in the conditional mood, if, uncertain questions, and epistemic future. The corpus comprised 80 articles from the British Medical Journal, randomly sampled from a 167-year period (1840–2007). The UMs, detected on the basis of an epistemic stance approach, were those referring only to the authors of the articles and only to the present. We also performed preliminary experiments to assess the manually annotated corpus and to establish a baseline for automatic UM detection. The experiments showed that most UMs could be recognized with good accuracy, except for the if-category, which includes four subcategories: if-clauses in a narrow sense; if-less clauses; as if/as though; and if and whether introducing embedded questions. The unsatisfactory results for the if-category were probably due both to its complexity and to the inadequacy of the detection rules, which were only lexical, not grammatical. In the current article, we describe a different approach, which combines grammatical and syntactic rules. The experiments show that the identification of uncertainty in the if-category has improved about twofold compared with our previous results. The complex overall process of uncertainty detection can greatly profit from a hybrid approach that combines supervised machine learning techniques with a knowledge-based approach, consisting of a rule-based inference engine devoted to the if-clause case and designed on the basis of the above-mentioned epistemic stance approach.
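To make the four if-subcategories concrete, the following is a minimal Python sketch of a purely pattern-based tagger. It is not the tool described in the article: the abstract does not give the actual grammatical and syntactic rules, so the patterns, the label names, and the function `tag_if_category` are all hypothetical. The sketch also ignores the epistemic stance restriction (uncertainty attributed to the authors, in the present), which would need grammatical analysis beyond regular expressions.

```python
import re
from typing import Optional

# All patterns below are illustrative assumptions, not the authors' rules.

# "as if" / "as though"
AS_IF = re.compile(r"\bas\s+(?:if|though)\b", re.IGNORECASE)

# Embedded questions: any "whether", or a verb of asking/knowing followed by "if".
EMBEDDED = re.compile(
    r"\bwhether\b|\b(?:ask|wonder|know|doubt|determine|see)\w*\s+if\b",
    re.IGNORECASE,
)

# If-less conditionals: sentence-initial inverted auxiliary ("Had the dose ...").
# This also matches direct questions such as "Should we operate?", a known
# false positive of such a lexical shortcut.
IF_LESS = re.compile(r"^\s*(?:Had|Were|Should)\s+\w+")

# Any remaining "if" is treated as an if-clause in the narrow sense.
IF_NARROW = re.compile(r"\bif\b", re.IGNORECASE)


def tag_if_category(sentence: str) -> Optional[str]:
    """Return a hypothetical subcategory label, or None if nothing matches.

    More specific patterns are checked first so that "as if" and
    "know if" are not swallowed by the generic "if" pattern.
    """
    if AS_IF.search(sentence):
        return "as_if"
    if EMBEDDED.search(sentence):
        return "embedded_question"
    if IF_LESS.match(sentence):
        return "if_less"
    if IF_NARROW.search(sentence):
        return "if_clause"
    return None


if __name__ == "__main__":
    examples = [
        "If the fever persists, the drug may be ineffective.",
        "Had the dose been lower, the outcome might have differed.",
        "The wound looked as if it were infected.",
        "We do not know whether the treatment works.",
        "The patient recovered.",
    ]
    for s in examples:
        print(tag_if_category(s), "<-", s)
```

The ordering of the checks is the main design choice here: a lexical matcher that looks only for the word "if" cannot separate the subcategories, which is consistent with the abstract's remark that purely lexical rules were inadequate for this category.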




Updated: 2020-07-24