Medieval Spanish (12th–15th centuries) named entity recognition and attribute annotation system based on contextual information
Journal of the Association for Information Science and Technology ( IF 3.5 ) Pub Date : 2020-08-19 , DOI: 10.1002/asi.24399
Mª Luisa Díez Platas 1 , Salvador Ros Muñoz 1 , Elena González‐Blanco 2 , Pablo Ruiz Fabo 1 , Elena Álvarez Mellado 1

Recognizing named entities in medieval Spanish texts is complex and involves three specific challenges: first, the complex morphosyntactic characteristics of proper-noun use in medieval texts; second, the lack of strict orthographic standards; and finally, diachronic and geographical variation in Spanish from the 12th to the 15th century. In this period, named entities usually appear as complex textual structures. For example, it was common to add nicknames and information about a person's role in society and geographic origin. To tackle this complexity, a named entity recognition and classification system has been implemented. The system uses contextual cues based on semantics to detect entities and assign a type. Because entities often occur with attached attributes, entity contexts are also parsed to determine entity-type-specific dependencies for these attributes. Moreover, the system uses a variant generator to handle the diachronic evolution of medieval Spanish terms from a phonetic and morphosyntactic viewpoint. The tool iteratively enriches its own lexica, dictionaries, and gazetteers. The system was evaluated on a corpus of over 3,000 manually annotated entities of different types and periods, obtaining F1 scores between 0.74 and 0.87. Attribute annotation was evaluated for person and role name attributes, with an overall F1 of 0.75.
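
As a rough illustration of two ideas mentioned in the abstract (a spelling-variant generator and cue-based entity typing that feeds a gazetteer), the following minimal Python sketch may help. It is not the authors' implementation: the alternation rules, the cue list, and the function names `variants` and `tag_entities` are all hypothetical, and the matching is purely lexical rather than semantic.

```python
import re

# Hypothetical modern-to-medieval grapheme alternations (illustrative only,
# not the rule set used by the system described in the abstract).
ALTERNATIONS = [("h", "f"), ("ñ", "nn"), ("z", "ç")]

# Hypothetical trigger words that precede an entity and suggest its type.
CUES = {"don": "PERSON", "donna": "PERSON", "villa": "PLACE"}


def variants(term):
    """Return the term plus every spelling reachable by applying the alternations."""
    seen = {term}
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for src, dst in ALTERNATIONS:
            if src in current:
                candidate = current.replace(src, dst)
                if candidate not in seen:
                    seen.add(candidate)
                    frontier.append(candidate)
    return seen


def tag_entities(text, gazetteer):
    """Tag the token following a cue word and add its variants to the gazetteer."""
    tokens = re.findall(r"\w+", text.lower())
    found = []
    i = 0
    while i < len(tokens) - 1:
        label = CUES.get(tokens[i])
        if label:
            j = i + 1
            # Skip the preposition in patterns such as "villa de <name>".
            if tokens[j] == "de" and j + 1 < len(tokens):
                j += 1
            name = tokens[j]
            found.append((name, label))
            # Enrich the gazetteer with the name and its spelling variants.
            gazetteer.setdefault(label, set()).update(variants(name))
            i = j
        i += 1
    return found


gazetteer = {}
entities = tag_entities("don Gonzalo moraba en la villa de Burgos", gazetteer)
print(entities)
print(gazetteer)
```

Each tagged name is stored together with its generated variants, so a later occurrence spelled differently could be matched directly against the gazetteer. A real system would need far richer rules and context handling than this sketch assumes.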

Updated: 2020-08-19