The NTT DCASE2020 Challenge Task 6 system: Automated Audio Captioning with Keywords and Sentence Length Estimation

arXiv - CS - Sound. Pub Date: 2020-07-01, DOI: arxiv-2007.00225
Yuma Koizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

This technical report describes the system participating in the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 Challenge, Task 6: automated audio captioning. Our submission focuses on two indeterminacy problems in automated audio captioning: word selection indeterminacy and sentence length indeterminacy. We address the main caption generation problem and these two indeterminacy sub-problems simultaneously by estimating keywords and sentence length through multi-task learning. We tested a simplified model of our submission on the development-testing dataset. Our model achieved a SPIDEr score of 20.7, whereas the baseline system scored 5.4.
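
The abstract does not say how the caption, keyword, and sentence-length objectives are combined. As a rough illustration of multi-task learning in this setting, the sketch below assumes a weighted sum of a caption cross-entropy, a multi-label keyword loss, and a sentence-length classification loss. All function names, the choice of loss for each task, and the weights are hypothetical and are not taken from the report.

```python
import math


def cross_entropy(probs, target_index):
    """Negative log-likelihood of the target class under a probability vector."""
    return -math.log(probs[target_index])


def binary_cross_entropy(probs, labels):
    """Mean binary cross-entropy over independent (multi-label) keyword targets."""
    total = 0.0
    for p, y in zip(probs, labels):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


def multitask_loss(caption_loss, keyword_loss, length_loss,
                   keyword_weight=1.0, length_weight=1.0):
    """Weighted sum of the main loss and two auxiliary losses.

    Assumption: the weighted-sum form and both weights are illustrative only.
    """
    return caption_loss + keyword_weight * keyword_loss + length_weight * length_loss


# Toy values: one predicted word, two candidate keywords, three length classes.
caption = cross_entropy([0.7, 0.2, 0.1], target_index=0)
keywords = binary_cross_entropy([0.9, 0.2], labels=[1, 0])
length = cross_entropy([0.1, 0.6, 0.3], target_index=1)
total = multitask_loss(caption, keywords, length, keyword_weight=0.5, length_weight=0.5)
```

In a sketch like this, the auxiliary weights control how strongly keyword and sentence-length estimation influence the shared model relative to caption generation; suitable values would have to be tuned on held-out data.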


Updated: 2020-07-02
