Generalized Operating Procedure for Deep Learning: an Unconstrained Optimal Design Perspective
arXiv - CS - Sound, Pub Date: 2020-12-31, DOI: arxiv-2012.15391
Shen Chen, Mingwei Zhang, Jiamin Cui, Wei Yao

Deep learning (DL) has brought about remarkable breakthroughs in processing images, video, and speech, owing to its efficacy in extracting highly abstract representations and learning very complex functions. However, operating procedures for applying it to real use cases are seldom reported. In this paper, we address this problem by presenting a generalized operating procedure for DL from the perspective of unconstrained optimal design, motivated by a simple intention: to remove the barrier to using DL, especially for scientists and engineers who are new to it but eager to use it. The proposed procedure contains seven steps: project/problem statement, data collection, architecture design, initialization of parameters, defining the loss function, computing the optimal parameters, and inference. Following this procedure, we build a multi-stream end-to-end speaker verification system in which the input speech utterance is processed by multiple parallel streams covering different frequency ranges, so that the acoustic modeling is more robust thanks to the diversity of features. With systems trained on the VoxCeleb dataset, our experiments verify the effectiveness of the proposed operating procedure and show that the multi-stream framework outperforms the single-stream baseline with a 20% relative reduction in the minimum decision cost function (minDCF).
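The seven steps named in the abstract can be illustrated on a deliberately tiny problem. The sketch below is not the authors' speaker verification system. It assumes a one-dimensional linear regression task with synthetic data, a mean-squared-error loss, and plain gradient descent as the unconstrained optimizer. The function name `run_procedure` and all hyperparameters are hypothetical choices made for illustration only.

```python
import random


def run_procedure(seed=0, n_samples=200, lr=0.1, epochs=500):
    """Walk through the seven steps on a toy regression task (illustrative only)."""
    rng = random.Random(seed)

    # Step 1: project/problem statement.
    # Assumed toy task: recover y = 2x - 1 from noisy samples.
    true_w, true_b = 2.0, -1.0

    # Step 2: data collection (synthetic here; a real project gathers real data).
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]
    ys = [true_w * x + true_b + rng.gauss(0.0, 0.01) for x in xs]

    # Step 3: architecture design (a single linear unit stands in for a network).
    def model(x, w, b):
        return w * x + b

    # Step 4: initialization of parameters.
    w, b = rng.gauss(0.0, 0.1), 0.0

    # Step 5: defining the loss function (mean squared error).
    def loss(w, b):
        return sum((model(x, w, b) - y) ** 2 for x, y in zip(xs, ys)) / n_samples

    # Step 6: computing optimal parameters by unconstrained minimization
    # of the loss, here with full-batch gradient descent.
    for _ in range(epochs):
        grad_w = sum(2.0 * (model(x, w, b) - y) * x for x, y in zip(xs, ys)) / n_samples
        grad_b = sum(2.0 * (model(x, w, b) - y) for x, y in zip(xs, ys)) / n_samples
        w -= lr * grad_w
        b -= lr * grad_b

    # Step 7: inference on an unseen input.
    prediction = model(0.5, w, b)
    return w, b, loss(w, b), prediction


if __name__ == "__main__":
    w, b, final_loss, prediction = run_procedure()
    print(f"w={w:.3f} b={b:.3f} loss={final_loss:.5f} prediction(0.5)={prediction:.3f}")
```

In a real DL project, each step would be far larger (for example, a neural network in Step 3 and a stochastic optimizer in Step 6), but the ordering of the steps would plausibly stay the same.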

Updated: 2021-01-01