Psychoacoustic Calibration of Loss Functions for Efficient End-to-End Neural Audio Coding
IEEE Signal Processing Letters (IF 3.2), Pub Date: 2020-01-01, DOI: 10.1109/lsp.2020.3039765
Kai Zhen, Mi Suk Lee, Jongmo Sung, Seungkwon Beack, Minje Kim

Conventional audio coding technologies commonly exploit human perception of sound, or psychoacoustics, to reduce the bitrate while preserving the perceptual quality of the decoded audio signals. For neural audio codecs, however, the purely objective nature of the loss function usually leads to suboptimal sound quality, as well as high run-time complexity due to large model sizes. In this work, we present a psychoacoustic calibration scheme that redefines the loss functions of neural audio coding systems so that they decode signals perceptually closer to the reference while requiring much lower model complexity. The proposed loss function incorporates the global masking threshold, tolerating reconstruction error that corresponds to inaudible artifacts. Experimental results show that the proposed model outperforms a baseline neural codec that is twice as large and consumes 23.4% more bits per second. With the proposed method, a lightweight neural codec with only 0.9 million parameters achieves near-transparent audio coding, comparable with the commercial MPEG-1 Audio Layer III codec at 112 kbps.
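The abstract does not spell out how the global masking threshold enters the loss. As an illustration of the general idea only, the following is a minimal sketch, not the authors' formulation: it assumes one frame of real spectral coefficients (for example MDCT bins) for the reference and the reconstruction, plus a precomputed per-bin masking threshold in the same power units (for example from an MPEG-1 style psychoacoustic model). The function names `masked_spectral_loss` and `plain_mse` and the hinge on excess error power are hypothetical choices made for this sketch.

```python
from typing import Sequence


def masked_spectral_loss(
    ref_coeffs: Sequence[float],
    rec_coeffs: Sequence[float],
    mask_threshold: Sequence[float],
) -> float:
    """Hypothetical masking-aware loss (illustrative, not the paper's exact loss).

    ref_coeffs / rec_coeffs: real spectral coefficients of one frame
    (assumed, e.g., MDCT bins). mask_threshold: assumed precomputed global
    masking threshold per bin, in the same power units as the squared error.
    """
    if not (len(ref_coeffs) == len(rec_coeffs) == len(mask_threshold)):
        raise ValueError("all inputs must have the same number of bins")
    total = 0.0
    for x, y, t in zip(ref_coeffs, rec_coeffs, mask_threshold):
        err_power = (x - y) ** 2
        # Error power below the threshold is treated as inaudible and costs
        # nothing; only the excess above the threshold is penalized.
        total += max(0.0, err_power - t)
    return total / len(ref_coeffs)


def plain_mse(ref_coeffs: Sequence[float], rec_coeffs: Sequence[float]) -> float:
    """Ordinary per-bin mean squared error, for comparison."""
    return sum((x - y) ** 2 for x, y in zip(ref_coeffs, rec_coeffs)) / len(ref_coeffs)


if __name__ == "__main__":
    ref = [1.0, 0.5, 0.2, 0.0]
    rec = [0.9, 0.5, 0.0, 0.3]
    thr = [0.05, 0.01, 0.05, 0.01]
    print(masked_spectral_loss(ref, rec, thr))  # only the last bin's excess counts
    print(plain_mse(ref, rec))
```

In this toy frame, plain MSE charges for every bin (about 0.035), while the masking-aware variant charges only for the last bin, whose error power exceeds its threshold (about 0.02). A loss of this shape lets a small model spend its capacity on audible errors rather than on matching the waveform exactly, which is the intuition the abstract gives for the reduced model size.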

Updated: 2020-01-01