Domain bias in distinguishing Flemish and Dutch subtitles
Natural Language Engineering (IF 2.3), Pub Date: 2019-08-15, DOI: 10.1017/s1351324919000445
Hans van Halteren

This paper describes experiments in which I tried to distinguish between Flemish and Netherlandic Dutch subtitles, as originally proposed in the VarDial 2018 Dutch–Flemish Subtitle task. However, rather than using all data as a monolithic block, I divided them into two non-overlapping domains and then investigated how the relation between training and test domains influences the recognition quality. I show that the best estimate of the level of recognizability of the language varieties is derived when training on one domain and testing on another. Apart from the quantitative results, I also present a qualitative analysis, by investigating in detail the most distinguishing features in the various scenarios. Here too, it is with the out-of-domain recognition that some genuine differences between Flemish and Netherlandic Dutch can be found.
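
The train-on-one-domain, test-on-another setup described in the abstract could be sketched roughly as follows. This is a minimal illustration, not the paper's method: the abstract does not say which classifier or features were used, so the small Naive Bayes model, the function names (`train`, `predict`, `accuracy`, `domain_matrix`) and the dictionary-of-domains data layout are all assumptions made for the example.

```python
from collections import Counter
from itertools import product
import math


def train(docs):
    """Fit a tiny multinomial Naive Bayes model on (text, label) pairs.

    Assumption: whitespace tokens as features; the paper's features may differ.
    """
    counts = {}         # label -> Counter of token frequencies
    priors = Counter()  # label -> number of training documents
    for text, label in docs:
        counts.setdefault(label, Counter()).update(text.lower().split())
        priors[label] += 1
    vocab = {tok for c in counts.values() for tok in c}
    return counts, priors, vocab


def predict(model, text):
    """Return the most probable label, using add-one smoothing."""
    counts, priors, vocab = model
    total_docs = sum(priors.values())
    best_label, best_score = None, -math.inf
    for label, tok_counts in counts.items():
        denom = sum(tok_counts.values()) + len(vocab)
        score = math.log(priors[label] / total_docs)
        for tok in text.lower().split():
            if tok in vocab:  # tokens never seen in training are skipped
                score += math.log((tok_counts[tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, docs):
    """Fraction of (text, label) pairs that the model labels correctly."""
    return sum(predict(model, text) == label for text, label in docs) / len(docs)


def domain_matrix(train_sets, test_sets):
    """Accuracy for every (training domain, test domain) pair.

    Both arguments map a domain name to a list of (text, label) pairs.
    """
    results = {}
    for train_dom, test_dom in product(train_sets, test_sets):
        model = train(train_sets[train_dom])
        results[(train_dom, test_dom)] = accuracy(model, test_sets[test_dom])
    return results
```

In such a matrix, the cells where the training and test domains match give the in-domain scores, while the off-diagonal cells give the out-of-domain scores that the abstract argues are the better estimate of how recognizable the two varieties are.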

Updated: 2019-08-15