Extracting Time Expressions and Named Entities with Constituent-Based Tagging Schemes
Cognitive Computation ( IF 5.4 ) Pub Date : 2020-05-10 , DOI: 10.1007/s12559-020-09714-8
Xiaoshi Zhong , Erik Cambria , Amir Hussain

Time expressions and named entities play important roles in data mining, information retrieval, and natural language processing. However, the conventional position-based tagging schemes (e.g., the BIO and BILOU schemes) that previous research used to model time expressions and named entities suffer from the problem of inconsistent tag assignment. To overcome this problem, we designed a new type of tagging scheme that models time expressions and named entities based on their constituents. Specifically, to model time expressions, we defined a constituent-based tagging scheme termed the TOMN scheme, with four tags, T, O, M, and N, that indicate the defined constituents of time expressions: time token (T), modifier (M), numeral (N), and the words outside time expressions (O). To model named entities, we defined a constituent-based tagging scheme termed the UGTO scheme, with four tags, U, G, T, and O, that indicate the defined constituents of named entities: uncommon word (U), general modifier (G), trigger word (T), and the words outside named entities (O). In modeling, our TOMN and UGTO schemes model time expressions and named entities under conditional random fields with minimal features, chosen according to an in-depth analysis of the characteristics of time expressions and named entities. Experiments on diverse datasets demonstrate that our proposed methods perform on par with or better than representative state-of-the-art methods on both time expression extraction and named entity extraction.
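To make "inconsistent tag assignment" concrete, the following is a minimal sketch, not the authors' implementation. It assumes a tiny hand-written lexicon (`TIME_TOKENS`, `MODIFIERS`, `NUMERALS` are hypothetical stand-ins for the paper's constituent definitions) and assigns TOMN-style tags by plain lookup, whereas the paper assigns them with conditional random fields. The point it illustrates is that under BIO the same word can receive different tags depending on its position in an expression, while a constituent-based tag stays the same.

```python
# Toy lexicons: hypothetical stand-ins for the paper's constituent definitions.
TIME_TOKENS = {"september", "monday", "days"}
MODIFIERS = {"last", "next", "ago"}
NUMERALS = {"two", "three", "10"}


def bio_tags(tokens, spans):
    """Position-based BIO tags from (start, end) spans, end exclusive."""
    tags = ["O"] * len(tokens)
    for start, end in spans:
        tags[start] = "B"
        for i in range(start + 1, end):
            tags[i] = "I"
    return tags


def tomn_tags(tokens):
    """Constituent-style tags by lexicon lookup (assumption: context is ignored)."""
    tags = []
    for token in tokens:
        word = token.lower()
        if word in TIME_TOKENS:
            tags.append("T")
        elif word in MODIFIERS:
            tags.append("M")
        elif word in NUMERALS:
            tags.append("N")
        else:
            tags.append("O")
    return tags


def tags_for_word(word, tagged_sentences):
    """Collect every tag that a given word receives across sentences."""
    return {
        tag
        for tokens, tags in tagged_sentences
        for token, tag in zip(tokens, tags)
        if token.lower() == word.lower()
    }


s1 = ["He", "left", "last", "September"]  # time expression: "last September"
s2 = ["September", "was", "warm"]         # time expression: "September"

bio = [(s1, bio_tags(s1, [(2, 4)])), (s2, bio_tags(s2, [(0, 1)]))]
tomn = [(s1, tomn_tags(s1)), (s2, tomn_tags(s2))]

print(sorted(tags_for_word("September", bio)))   # ['B', 'I']
print(sorted(tags_for_word("September", tomn)))  # ['T']
```

In this toy setting "September" is tagged I inside "last September" but B when it stands alone, so a tagger trained on BIO labels sees conflicting targets for the same word. The lookup-based tagger here is only for illustration: it would also tag "last" as M in "last chance", a case a context-aware model is needed to resolve.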

Updated: 2020-05-10