Annotador: a temporal tagger for Spanish
Journal of Intelligent & Fuzzy Systems (IF 1.7), Pub Date: 2020-06-29, DOI: 10.3233/jifs-179865
María Navas-Loro, Víctor Rodríguez-Doncel

Temporal information is crucial in knowledge extraction. Being able to locate events on a timeline is necessary to understand the narrative behind a text. To this end, several temporal taggers have been proposed in the literature; nevertheless, not all languages have received the same attention. Most taggers work only on English texts, and few have been developed for other languages. The scarcity of annotated corpora in other languages also notably hinders the task. In this paper we present a new rule-based tagger called Annotador (Añotador in Spanish), able to process texts in both Spanish and English. Furthermore, a new corpus of more than 300 short texts containing common temporal expressions, called the HourGlass corpus, has been built in order to test the tagger and to facilitate the development of new resources and tools. Professionals from different domains took part in gathering the texts, which makes the corpus heterogeneous, and the tags added to each entry make it easy to use. Finally, we analyze the main challenges in the time expression extraction task.

Updated: 2020-06-30