Attention-based multi-task learning for speech-enhancement and speaker-identification in multi-speaker dialogue scenario
arXiv - CS - Sound Pub Date : 2021-01-07 , DOI: arxiv-2101.02550 Chiang-Jen Peng, Yun-Ju Chan, Cheng Yu, Syu-Siang Wang, Yu Tsao, Tai-Shih Chi
Multi-task learning (MTL) and attention techniques have been shown to
effectively extract robust acoustic features for various speech-related
applications in noisy environments. In this study, we integrate MTL with an
attention-weighting mechanism and propose an attention-based MTL (ATM) approach
that realizes a multi-model learning structure and improves speech
enhancement (SE) and speaker identification (SI) systems simultaneously. The
proposed ATM comprises three subsystems: SE, SI, and an attention network (AttNet).
In the proposed system, a long short-term memory (LSTM) network performs SE,
while deep neural network (DNN) models implement SI and AttNet.
The overall ATM system first extracts representative features,
then enhances the speech spectra with the LSTM-based SE module and classifies
speaker identity with the DNN-based SI module. We conducted our experiments on the
Taiwan Mandarin Hearing in Noise Test database. The evaluation results indicate
that the proposed ATM system not only
increases the quality and intelligibility of noisy speech input but also
improves the accuracy of the SI system compared to conventional MTL
approaches.
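The data flow described in the abstract can be sketched as follows. This is a minimal numpy forward-pass illustration, not the authors' implementation: the dense layers standing in for the LSTM-SE and DNN-SI modules, the layer sizes, and the choice of per-frame softmax attention over feature bins are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical dimensions: T frames of D-dimensional noisy spectral features.
T, D = 100, 257
n_speakers = 20
X = rng.standard_normal((T, D))           # noisy speech features

# AttNet (sketched as a single dense layer): per-frame attention
# weights over the D feature bins; each row sums to 1.
W_att = rng.standard_normal((D, D)) * 0.1
A = softmax(X @ W_att, axis=-1)
X_att = A * X                             # attention-weighted features shared by both tasks

# SE head (a dense stand-in for the LSTM): maps weighted features
# to an enhanced spectral estimate of the same shape.
W_se = rng.standard_normal((D, D)) * 0.1
enhanced = X_att @ W_se

# SI head (a dense stand-in for the DNN): pools frames to an
# utterance-level vector and outputs a speaker posterior.
W_si = rng.standard_normal((D, n_speakers)) * 0.1
speaker_posterior = softmax(X_att.mean(axis=0) @ W_si)

print(enhanced.shape, speaker_posterior.shape)
```

Because both heads consume the same attention-weighted features, gradients from the SE loss and the SI loss would jointly shape AttNet during training, which is the multi-task coupling the paper exploits.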
Updated: 2021-01-08