A Finnish news corpus for named entity recognition
Language Resources and Evaluation (IF 2.7), Pub Date: 2019-08-01, DOI: 10.1007/s10579-019-09471-7
Teemu Ruokolainen, Pekka Kauppinen, Miikka Silfverberg, Krister Lindén

We present a corpus of Finnish news articles with manually prepared named entity annotations. The corpus consists of 953 articles (193,742 word tokens) annotated with six named entity classes (organization, location, person, product, event, and date). The articles are extracted from the archives of Digitoday, a Finnish online technology news source. The corpus is available for research purposes. We present baseline experiments on the corpus using one rule-based system and two deep learning systems, evaluated on two test sets, one in-domain and one out-of-domain.

Updated: 2019-08-01