Alias-and-Separate: Wideband Speech Coding Using Sub-Nyquist Sampling and Speech Separation, IEEE Signal Processing Letters



IEEE Signal Processing Letters (IF 3.9), Pub Date: 2022-09-16, DOI: 10.1109/lsp.2022.3207381
Soojoong Hwang ₁ , Eunkyun Lee ₁ , Inseon Jang ₂ , Jong Won Shin ₁


Decimating a discrete-time signal below the Nyquist rate without first applying an appropriate lowpass filter causes a distortion called aliasing. If wideband speech sampled at 16 kHz is decimated by 2 to yield an aliased signal sampled at 8 kHz, the decimated signal is the sum of two speech-like signals: the narrowband speech covering 0-4 kHz and the spectrally flipped aliasing component originating from 4-8 kHz. Deep learning-based approaches have recently improved speech separation performance markedly, which suggests that the narrowband and aliasing components may be separable. In this letter, we propose a novel method for low-rate wideband speech coding that uses a standard narrowband codec. Instead of coding wideband speech with a wideband codec at a limited bitrate, we decimate the input wideband speech so that aliasing occurs, and then encode it with a narrowband codec, allocating the entire allowed bitrate to 0-4 kHz. After decoding the bitstream, we apply a speech separation technique to obtain the narrowband and aliasing signals, which are then used to reconstruct the wideband speech by expansion, lowpass/highpass filtering, and summation. In a MUSHRA listening test, the proposed method could achieve subjective quality comparable to that of speech coded by wideband codecs at higher bitrates.
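The signal-processing steps around the codec can be illustrated with a small sketch. It is not the authors' implementation: the narrowband codec is omitted, the learned separator is replaced by oracle components (the true low-band and high-band signals, each decimated separately), and the lowpass/highpass filters are assumed to be ideal DFT-domain brick-wall filters. All function names and the two-tone test signal are hypothetical, chosen only to show how a 6 kHz component folds to 2 kHz and how expansion, filtering, and summation can undo the folding once the two components are known.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) DFT, adequate for short illustrative signals."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Inverse DFT, returning the real part of each sample."""
    n = len(X)
    return [(sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]

def decimate_no_filter(x):
    """Keep every second sample with no anti-aliasing filter."""
    return x[::2]

def expand(x):
    """Upsample by 2 by inserting a zero after every sample."""
    out = [0.0] * (2 * len(x))
    out[::2] = x
    return out

def brickwall(x, keep_low):
    """Ideal filter (an assumption): keep bins below fs/4 (lowpass) or at and
    above fs/4 (highpass). The gain of 2 compensates for zero insertion."""
    n = len(x)
    X = dft(x)
    for k in range(n):
        is_low = min(k, n - k) < n // 4  # bin distance from DC
        X[k] = 2 * X[k] if is_low == keep_low else 0
    return idft(X)

FS = 16000  # wideband sampling rate in Hz
N = 160     # 10 ms of signal, so tones fall on exact DFT bins

# Hypothetical test signal: a 1 kHz tone (low band) plus a 6 kHz tone (high band).
low = [math.cos(2 * math.pi * 1000 * t / FS) for t in range(N)]
high = [0.5 * math.cos(2 * math.pi * 6000 * t / FS) for t in range(N)]
wideband = [a + b for a, b in zip(low, high)]

# Decimation without filtering: the 6 kHz tone folds to 8 - 6 = 2 kHz.
aliased = decimate_no_filter(wideband)
half = [abs(c) for c in dft(aliased)][:len(aliased) // 2 + 1]
top_bins = sorted(sorted(range(len(half)), key=lambda k: half[k], reverse=True)[:2])
peaks_hz = [k * (FS // 2) // len(aliased) for k in top_bins]

# Oracle "separation": stand-in for the learned separator's two outputs.
narrowband = decimate_no_filter(low)
alias = decimate_no_filter(high)

# Reconstruction by expansion, lowpass/highpass filtering, and summation.
rebuilt = [a + b for a, b in zip(brickwall(expand(narrowband), True),
                                 brickwall(expand(alias), False))]
max_err = max(abs(a - b) for a, b in zip(rebuilt, wideband))

print(peaks_hz)  # [1000, 2000]
print(max_err < 1e-9)
```

With oracle components and ideal filters the reconstruction is exact up to rounding error; in the method described above, quality would instead depend on how well the separator recovers the two components from the decoded signal.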


Updated: 2022-09-16
