Search on speech from spoken queries: the Multi-domain International ALBAYZIN 2018 Query-by-Example Spoken Term Detection Evaluation, EURASIP Journal on Audio, Speech, and Music Processing

EURASIP Journal on Audio, Speech, and Music Processing (IF 1.7), Pub Date: 2019-07-19, DOI: 10.1186/s13636-019-0156-x
Javier Tejedor, Doroteo T. Toledano, Paula Lopez-Otero, Laura Docio-Fernandez, Mikel Peñagarikano, Luis Javier Rodriguez-Fuentes, Antonio Moreno-Sandoval

The huge amount of information stored in audio and video repositories makes search on speech (SoS) a priority area nowadays. Within SoS, Query-by-Example Spoken Term Detection (QbE STD) aims to retrieve data from a speech repository given a spoken query. Research in this area is continuously fostered by the organization of QbE STD evaluations. This paper presents a multi-domain, internationally open evaluation for QbE STD in Spanish. The task is to retrieve the speech files that contain each query, providing the start and end times of every detection together with a score that reflects the confidence assigned to it. Three Spanish speech databases covering different domains were employed in the evaluation: the MAVIR database, which comprises a set of talks from workshops; the RTVE database, which includes broadcast television (TV) shows; and the COREMAH database, which contains two-person spontaneous speech conversations about different topics. The evaluation was designed carefully so that several analyses of the main results can be carried out. We present the evaluation itself, the three databases, the evaluation metrics, the systems submitted to the evaluation, the results, and detailed post-evaluation analyses based on some query properties (within-vocabulary/out-of-vocabulary queries, single-word/multi-word queries, and native/foreign queries). Fusion results of the primary systems submitted to the evaluation are also presented. Three teams took part in the evaluation, and ten different systems were submitted. The results suggest that QbE STD is still a work in progress and that the performance of these systems is highly sensitive to changes in the data domain. Nevertheless, QbE STD strategies are able to outperform text-based STD in unseen data domains.

Updated: 2019-07-19
