Head-related Transfer Function Reconstruction with Anthropometric Parameters and the Direction of the Sound Source
Acoustics Australia ( IF 1.7 ) Pub Date : 2020-11-05 , DOI: 10.1007/s40857-020-00209-y
Dongdong Lu , Xiangyang Zeng , Xiaochao Guo , Haitao Wang

An accurate head-related transfer function (HRTF) can improve the subjective auditory localization performance of a particular subject. This paper proposes a deep neural network model that reconstructs the HRTF from anthropometric parameters and the direction of the sound source. The model consists of three subnetworks: a one-dimensional convolutional neural network (1D-CNN) that processes the anthropometric parameters as input features, a second network that takes the sound source position as input and serves as a marker, and a third network that takes the merged outputs of the first two as its input and estimates the HRTF. The method is evaluated both objectively and subjectively. For the objective evaluation, the root mean square error (RMSE) between the estimated HRTF and the measured HRTF is calculated; the results show that the proposed method performs better than a database matching method and a deep-neural-network-based (DNN-based) method. For the subjective evaluation, a sound localization test shows that sound sources are localized with higher accuracy with the proposed method than with the KEMAR dummy head HRTF or the DNN-based method. Both the objective and the subjective results show that the personalized HRTFs obtained using the proposed method perform well in HRTF reconstruction.
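The data flow described in the abstract (two input subnetworks whose outputs are merged and fed to a third, plus an RMSE score) can be illustrated with a minimal sketch. This is not the authors' implementation: the layer sizes, the single convolution kernel, the ReLU activations, the random untrained weights, and all names (`estimate_hrtf`, `N_ANTHRO`, `N_FREQ`, and so on) are hypothetical, since the abstract gives no such details. It uses only the Python standard library.

```python
import math
import random


def conv1d(x, kernel):
    """Valid 1D cross-correlation of a list with a kernel."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k))
            for i in range(len(x) - k + 1)]


def relu(values):
    return [max(0.0, v) for v in values]


def dense(x, weights, bias):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def random_dense(n_in, n_out, rng):
    """Random, untrained parameters, used only to make the sketch runnable."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def rmse(estimated, measured):
    """Root mean square error between two equal-length sequences."""
    squared = sum((e - m) ** 2 for e, m in zip(estimated, measured))
    return math.sqrt(squared / len(measured))


# Hypothetical sizes; the abstract does not state any of them.
N_ANTHRO = 10   # number of anthropometric parameters
KERNEL = 3      # width of the 1D convolution kernel
N_DIR = 2       # source direction, e.g. (azimuth, elevation)
N_HIDDEN = 8    # width of the direction subnetwork
N_FREQ = 16     # number of HRTF frequency bins to estimate

rng = random.Random(0)
conv_kernel = [rng.uniform(-0.1, 0.1) for _ in range(KERNEL)]
dir_w, dir_b = random_dense(N_DIR, N_HIDDEN, rng)
merged_size = (N_ANTHRO - KERNEL + 1) + N_HIDDEN
out_w, out_b = random_dense(merged_size, N_FREQ, rng)


def estimate_hrtf(anthro, direction):
    # Subnetwork 1: 1D convolution over the anthropometric parameters.
    anthro_features = relu(conv1d(anthro, conv_kernel))
    # Subnetwork 2: the source direction acts as a marker.
    direction_features = relu(dense(direction, dir_w, dir_b))
    # Merge both outputs and pass them to subnetwork 3.
    merged = anthro_features + direction_features
    return dense(merged, out_w, out_b)


estimated = estimate_hrtf([0.5] * N_ANTHRO, [30.0, 0.0])
measured = [0.0] * N_FREQ  # placeholder values, not real measurements
print(len(estimated), rmse(estimated, measured))
```

Concatenation is assumed here as the merge step; the abstract only says the two outputs are "merged together", so other merge operations are equally compatible with the description.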




Updated: 2020-11-06