The B-Subtle framework: tailoring subtitles to your needs
Language Resources and Evaluation (IF 1.7), Pub Date: 2020-10-11, DOI: 10.1007/s10579-020-09507-3
Miguel Ventura, Jessica Veiga, Luisa Coheur, Sandra Gama

Large numbers of subtitles from movies and TV shows can easily be found on the web, for free, in almost every language. Several corpora built from subtitles, with different annotations and purposes, are currently available. Considering that new sets of subtitles are constantly being released, we propose B-Subtle, an open source framework that allows the automatic creation of corpora made up of sequential pairs of dialogue turns gathered from subtitles. With the help of a configuration file, B-Subtle makes it possible to enrich subtitles and dialogue turns with extra information (such as movie genre or the polarity of an utterance); in addition, it allows different types of filtering to be applied to both subtitle files and dialogue turns. With B-Subtle, users can therefore create their own corpus, tailored to their needs. Moreover, to replicate the process in a future experiment, the user only needs to save the configuration file. In this paper, we describe B-Subtle and demonstrate how to build different corpora with it.
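The idea of a configuration-driven corpus builder can be illustrated with a minimal Python sketch. This is not B-Subtle's actual code, API, or configuration format: the `Subtitle` class, the `build_pairs` function, and the `genre` and `max_words` keys are all hypothetical names chosen for illustration. The sketch only shows the general pattern the abstract describes: filter subtitle files by metadata, pair consecutive dialogue turns, filter the pairs, and attach extra information to each pair.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Subtitle:
    """Hypothetical in-memory subtitle file: ordered lines plus metadata."""
    lines: list[str]
    metadata: dict[str, str] = field(default_factory=dict)


def build_pairs(subtitles: list[Subtitle], config: dict) -> list[dict]:
    """Build sequential (trigger, answer) pairs of dialogue turns.

    The config keys are assumptions made for this sketch:
      - "genre": keep only subtitle files with this genre (file-level filter)
      - "max_words": drop pairs where either turn is longer (turn-level filter)
    """
    wanted_genre = config.get("genre")
    max_words = config.get("max_words")
    pairs = []
    for sub in subtitles:
        genre = sub.metadata.get("genre")
        # File-level filter: skip whole subtitle files that do not match.
        if wanted_genre is not None and genre != wanted_genre:
            continue
        # Pair each line with the line that follows it.
        for trigger, answer in zip(sub.lines, sub.lines[1:]):
            too_long = max_words is not None and (
                len(trigger.split()) > max_words
                or len(answer.split()) > max_words
            )
            # Turn-level filter: skip pairs containing an over-long turn.
            if too_long:
                continue
            # Enrichment: copy file-level metadata onto the pair.
            pairs.append({"trigger": trigger, "answer": answer, "genre": genre})
    return pairs


subtitles = [
    Subtitle(
        ["Hi.", "Hello there.", "How are you doing today my friend?"],
        {"genre": "Comedy"},
    ),
    Subtitle(["Run!", "Where?"], {"genre": "Horror"}),
]
config = {"genre": "Comedy", "max_words": 3}
corpus = build_pairs(subtitles, config)
```

Because the whole selection is captured in `config`, rerunning the same experiment later only requires keeping that dictionary (or a file it was loaded from), which mirrors the replication argument made in the abstract.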




Updated: 2020-10-11