EfficientNet-Absolute Zero for Continuous Speech Keyword Spotting
arXiv - CS - Sound. Pub Date: 2020-12-31, DOI: arxiv-2012.15695
Amir Mohammad Rostami, Ali Karimi, Mohammad Ali Akhaee

Keyword spotting is the process of automatically finding specific words or phrases in recorded speech. Deep neural networks can handle this problem if they are trained on an appropriate dataset. To this end, the football keyword dataset (FKD), a new keyword spotting dataset in Persian, was collected through crowdsourcing. The dataset contains nearly 31,000 samples in 18 classes. A continuous speech synthesis method is proposed to make FKD usable in practical applications that work with continuous speech. In addition, we propose a lightweight architecture called EfficientNet-A0 (absolute zero), obtained by applying the compound scaling method to EfficientNet-B0 for the keyword spotting task. Finally, the proposed architecture is evaluated against various models. EfficientNet-A0 and ResNet models are found to outperform the other models on this dataset.
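The abstract does not say how compound scaling is applied to reach EfficientNet-A0. As a rough illustration only, the sketch below shows the general compound scaling rule from the original EfficientNet work, in which depth, width, and input resolution are scaled jointly by a single coefficient `phi`. The constants 1.2, 1.1, and 1.15 are the values reported for EfficientNet. The function names, the base resolution of 224, and the use of a negative `phi` to shrink the B0 baseline are assumptions made for this example and are not taken from the paper.

```python
# Compound scaling sketch (assumption: the general EfficientNet rule,
# not the authors' exact procedure for EfficientNet-A0).

# Coefficients reported for EfficientNet: depth, width, resolution.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def compound_scale(phi, base_resolution=224):
    """Return (depth multiplier, width multiplier, input resolution).

    phi = 0 reproduces the baseline (B0). A negative phi, assumed here,
    scales the baseline down instead of up.
    base_resolution=224 is the image-model default and is hypothetical
    for a keyword spotting input such as a spectrogram.
    """
    depth_mult = ALPHA ** phi
    width_mult = BETA ** phi
    resolution = int(round(base_resolution * GAMMA ** phi))
    return depth_mult, width_mult, resolution


def flops_ratio(phi):
    """Approximate FLOPs relative to the baseline.

    FLOPs grow roughly with depth * width^2 * resolution^2, and the
    coefficients are chosen so that this product is close to 2 per
    unit of phi.
    """
    return (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi


if __name__ == "__main__":
    for phi in (-1, 0, 1):
        d, w, r = compound_scale(phi)
        print(f"phi={phi:+d}: depth x{d:.3f}, width x{w:.3f}, "
              f"resolution {r}, FLOPs x{flops_ratio(phi):.3f}")
```

Under these assumptions, each unit decrease in `phi` roughly halves the compute cost, which is one plausible way a model smaller than B0 could be derived.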

Updated: 2021-01-01