Time-Domain Multi-modal Bone/air Conducted Speech Enhancement
IEEE Signal Processing Letters (IF 3.2), Pub Date: 2020-01-01, DOI: 10.1109/lsp.2020.3000968
Cheng Yu, Kuo-Hsuan Hung, Syu-Siang Wang, Yu Tsao, Jeih-weih Hung

Previous studies have shown that integrating video signals as a complementary modality can improve speech enhancement (SE) performance. However, video clips usually contain large amounts of data, incur a high computational cost, and may therefore complicate the SE system. As an alternative source, a bone-conducted speech signal has a moderate data size while still manifesting speech-phoneme structures, and thus complements its air-conducted counterpart. In this study, we propose a novel multi-modal SE structure in the time domain that leverages both bone- and air-conducted signals. In addition, we examine two ensemble-learning-based strategies, early fusion (EF) and late fusion (LF), to integrate the two types of speech signals, and adopt a deep-learning-based fully convolutional network to conduct the enhancement. Experimental results on a Mandarin corpus indicate that the proposed multi-modal SE structure (integrating bone- and air-conducted signals) significantly outperforms its single-source SE counterparts (using a bone- or air-conducted signal only) across various speech evaluation metrics. Moreover, adopting the LF strategy rather than the EF strategy in this multi-modal SE structure yields better results.
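The difference between the two fusion strategies can be illustrated with a minimal sketch. It is not the authors' architecture: the abstract does not specify layer counts, kernel sizes, or how the late-fusion outputs are merged, so `tiny_fcn`, `init_layers`, the channel widths, and the random (untrained) weights below are all hypothetical. The sketch only assumes that EF stacks the air- and bone-conducted waveforms as input channels of a single fully convolutional network, whereas LF processes each waveform in its own branch and merges the branch outputs with further convolutional layers.

```python
import random

def conv1d_same(x, w):
    """Zero-padded 1-D cross-correlation.
    x: list of C_in channels, each a list of T samples.
    w: w[o][c][j] kernel weights with odd kernel length.
    Returns C_out channels, each of length T."""
    k = len(w[0][0])
    pad = k // 2
    padded = [[0.0] * pad + list(ch) + [0.0] * pad for ch in x]
    return [
        [sum(w_o[c][j] * padded[c][t + j]
             for c in range(len(x)) for j in range(k))
         for t in range(len(x[0]))]
        for w_o in w
    ]

def tiny_fcn(x, layers):
    """Hypothetical fully convolutional stack: ReLU after every layer but the last."""
    h = x
    for i, w in enumerate(layers):
        h = conv1d_same(h, w)
        if i < len(layers) - 1:
            h = [[max(v, 0.0) for v in ch] for ch in h]
    return h

def init_layers(rng, channels, k=5):
    """Random, untrained weights; `channels` lists the channel width at each stage."""
    return [
        [[[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(c_in)]
         for _ in range(c_out)]
        for c_in, c_out in zip(channels[:-1], channels[1:])
    ]

def early_fusion(air, bone, layers):
    # EF (assumed form): both waveforms enter one network as a 2-channel input.
    return tiny_fcn([air, bone], layers)[0]

def late_fusion(air, bone, air_layers, bone_layers, fuse_layers):
    # LF (assumed form): one branch per modality, then a fusion stage on the
    # concatenated branch outputs.
    air_feat = tiny_fcn([air], air_layers)
    bone_feat = tiny_fcn([bone], bone_layers)
    return tiny_fcn(air_feat + bone_feat, fuse_layers)[0]

rng = random.Random(0)
T = 64  # number of waveform samples in this toy example
air = [rng.gauss(0.0, 1.0) for _ in range(T)]   # stand-in for noisy air-conducted speech
bone = [rng.gauss(0.0, 1.0) for _ in range(T)]  # stand-in for bone-conducted speech

ef_out = early_fusion(air, bone, init_layers(rng, [2, 4, 1]))
lf_out = late_fusion(
    air, bone,
    init_layers(rng, [1, 4, 2]),   # air branch
    init_layers(rng, [1, 4, 2]),   # bone branch
    init_layers(rng, [4, 1]),      # fusion stage over 2 + 2 channels
)
print(len(ef_out), len(lf_out))  # prints: 64 64
```

Both paths map waveforms to a waveform of the same length, which is what operating in the time domain implies here. In a trained system the weights would be learned against clean reference speech; this sketch only shows where the two modalities are combined.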

Updated: 2020-01-01