Head-related Transfer Function Reconstruction with Anthropometric Parameters and the Direction of the Sound Source

Lu, Dongdong; Zeng, Xiangyang; Guo, Xiaochao; Wang, Haitao

doi:10.1007/s40857-020-00209-y

Original Paper
Published: 05 November 2020

Volume 49, pages 125–132 (2021)
Acoustics Australia

Dongdong Lu¹, Xiangyang Zeng¹, Xiaochao Guo² & Haitao Wang¹

Abstract

An accurate head-related transfer function (HRTF) can improve the subjective auditory localization performance of a particular subject. This paper proposes a deep neural network model for reconstructing the HRTF from anthropometric parameters and the direction of the sound source. The proposed model consists of three subnetworks. The first is a one-dimensional convolutional neural network (1D-CNN) that processes the anthropometric parameters as input features. The second takes the sound source position as input and serves as a marker. The outputs of these two networks are merged and passed to the third network, which estimates the HRTF. The method is evaluated both objectively and subjectively. For the objective evaluation, the root mean square error (RMSE) between the estimated HRTF and the measured HRTF is calculated; the results show that the proposed method performs better than a database matching method and a deep-neural-network (DNN)-based method. For the subjective evaluation, a sound localization test shows that sound sources are localized with higher accuracy using the proposed method than using the KEMAR dummy head HRTF or the DNN-based method. Both the objective and the subjective results show that the personalized HRTFs obtained using the proposed method perform well in HRTF reconstruction.
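The data flow described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the layer sizes, the single convolution kernel, the ReLU activations, the random (untrained) weights, and the names `estimate_hrtf`, `conv1d`, `dense`, and `rmse` are all assumptions made for illustration. The sketch only shows how a 1D convolution over the anthropometric parameters and a dense layer over the source direction could be merged into a third stage, and how an RMSE between an estimated and a measured spectrum could be computed.

```python
import math
import random

def relu(v):
    return [max(0.0, x) for x in v]

def dense(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def conv1d(x, kernel):
    """Valid-mode 1D convolution (cross-correlation) with a single kernel."""
    k = len(kernel)
    return [sum(kernel[j] * x[i + j] for j in range(k))
            for i in range(len(x) - k + 1)]

def init(rows, cols, rng):
    """Small random weight matrix; stands in for trained weights."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def estimate_hrtf(anthro, direction, params):
    # Subnetwork 1: 1D convolution over the anthropometric feature vector.
    a = relu(conv1d(anthro, params["kernel"]))
    # Subnetwork 2: dense layer over the source direction (the "marker").
    d = relu(dense(direction, params["w_dir"], params["b_dir"]))
    # Subnetwork 3: concatenated features mapped to HRTF values per bin.
    return dense(a + d, params["w_out"], params["b_out"])

def rmse(estimated, measured):
    """Root mean square error between two equal-length spectra."""
    if len(estimated) != len(measured):
        raise ValueError("spectra must have the same length")
    squared = sum((e - m) ** 2 for e, m in zip(estimated, measured))
    return math.sqrt(squared / len(estimated))

# Hypothetical sizes: 10 anthropometric values, (azimuth, elevation), 8 bins.
N_ANTHRO, KERNEL, N_DIR_HIDDEN, N_BINS = 10, 3, 4, 8
rng = random.Random(0)
params = {
    "kernel": [rng.uniform(-0.1, 0.1) for _ in range(KERNEL)],
    "w_dir": init(N_DIR_HIDDEN, 2, rng),
    "b_dir": [0.0] * N_DIR_HIDDEN,
    "w_out": init(N_BINS, (N_ANTHRO - KERNEL + 1) + N_DIR_HIDDEN, rng),
    "b_out": [0.0] * N_BINS,
}

anthro = [rng.random() for _ in range(N_ANTHRO)]        # placeholder features
direction = [math.radians(45.0), math.radians(0.0)]     # azimuth, elevation
estimated = estimate_hrtf(anthro, direction, params)
```

With trained weights and a measured spectrum of the same length, `rmse(estimated, measured)` would give the objective error for one subject and direction. Whether the paper computes the error on linear or log-magnitude spectra is not stated in the abstract, so the sketch leaves the scale to the caller.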

Acknowledgements

This work was supported by the National Natural Science Foundation of China (11774291) and the Natural Science Foundation of Shaanxi Province of China (2018JM6020).

Author information

Authors and Affiliations

Northwest Polytechnical University, No. 127 Youyixi Road, Beilin District, Xi’an, 710072, Shaanxi, People’s Republic of China
Dongdong Lu, Xiangyang Zeng & Haitao Wang
Air Force Medical Center of FMMU, Beijing, 100142, People’s Republic of China
Xiaochao Guo

Corresponding author

Correspondence to Xiangyang Zeng.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Lu, D., Zeng, X., Guo, X. et al. Head-related Transfer Function Reconstruction with Anthropometric Parameters and the Direction of the Sound Source. Acoust Aust 49, 125–132 (2021). https://doi.org/10.1007/s40857-020-00209-y

Received: 09 June 2020
Accepted: 20 October 2020
Published: 05 November 2020
Issue Date: March 2021
DOI: https://doi.org/10.1007/s40857-020-00209-y
