Psychoacoustic Calibration of Loss Functions for Efficient End-to-End Neural Audio Coding
arXiv - CS - Sound Pub Date : 2020-12-31 , DOI: arxiv-2101.00054
Kai Zhen, Mi Suk Lee, Jongmo Sung, Seungkwon Beack, Minje Kim

Conventional audio coding technologies commonly leverage human perception of sound, or psychoacoustics, to reduce the bitrate while preserving the perceptual quality of the decoded audio signals. For neural audio codecs, however, the objective nature of the loss function usually leads to suboptimal sound quality as well as high run-time complexity due to the large model size. In this work, we present a psychoacoustic calibration scheme that redefines the loss functions of neural audio coding systems so that they can decode signals that are perceptually more similar to the reference, yet with much lower model complexity. The proposed loss function incorporates the global masking threshold, tolerating reconstruction error that corresponds to inaudible artifacts. Experimental results show that the proposed model outperforms a baseline neural codec that is twice as large and consumes 23.4% more bits per second. With the proposed method, a lightweight neural codec with only 0.9 million parameters performs near-transparent audio coding, comparable to the commercial MPEG-1 Audio Layer III codec at 112 kbps.
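
As a rough illustration of how a masking threshold can enter a loss function, the sketch below weights a per-bin spectral error by how far the signal power sits above the masking threshold, so that errors in masked (inaudible) bins contribute almost nothing. The function names `priority_weights` and `weighted_spectral_loss`, the log-ratio form of the weight, and all numeric values are assumptions made for this example; the abstract does not spell out the exact formulation used in the paper, and a real system would compute the threshold with a psychoacoustic model rather than hard-code it.

```python
import math

def priority_weights(signal_psd_db, mask_db):
    """Per-bin weights (hypothetical form): large where the signal is well
    above the masking threshold, close to zero where it is masked."""
    weights = []
    for p, m in zip(signal_psd_db, mask_db):
        # Signal-to-mask ratio converted from dB to the linear power domain.
        smr = 10.0 ** (0.1 * (p - m))
        # The +1 keeps the weight non-negative for fully masked bins.
        weights.append(math.log10(smr + 1.0))
    return weights

def weighted_spectral_loss(ref_mag, dec_mag, weights):
    """Weighted mean squared error between reference and decoded magnitudes."""
    total = sum(w * (x - y) ** 2 for w, x, y in zip(weights, ref_mag, dec_mag))
    return total / len(ref_mag)

# Toy three-bin example; every value here is made up for illustration.
signal_psd_db = [60.0, 40.0, 10.0]  # signal power per bin, in dB
mask_db = [30.0, 40.0, 40.0]        # assumed masking threshold per bin, in dB
ref_mag = [1.0, 0.5, 0.1]           # reference spectrum magnitudes
dec_mag = [0.9, 0.4, 0.0]           # decoded magnitudes, same error in each bin

w = priority_weights(signal_psd_db, mask_db)
loss = weighted_spectral_loss(ref_mag, dec_mag, w)
print(w, loss)
```

Although the absolute error is the same in all three bins, the first bin (signal 30 dB above the threshold) dominates the loss, while the third bin (signal 30 dB below it) is nearly ignored. That is the sense in which such a loss permits inaudible reconstruction error.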

Updated: 2021-01-05