TED Multilingual Discourse Bank (TED-MDB): a parallel corpus annotated in the PDTB style
Language Resources and Evaluation (IF 2.7), Pub Date: 2019-04-06, DOI: 10.1007/s10579-019-09445-9
Deniz Zeyrek, Amália Mendes, Yulia Grishina, Murathan Kurfalı, Samuel Gibbon, Maciej Ogrodniczuk

TED-Multilingual Discourse Bank, or TED-MDB, is a multilingual resource in which TED talks are annotated at the discourse level in six languages (English, Polish, German, Russian, European Portuguese, and Turkish) following the aims and principles of PDTB. We explain the corpus design criteria, which have three main features: the linguistic characteristics of the languages involved, the interactive nature of TED talks (which led us to annotate Hypophora), and the decision to avoid projection. We report our annotation consistency and post-annotation alignment experiments, and provide a cross-lingual comparison based on corpus statistics.

Updated: 2019-04-06