
Head-related Transfer Function Reconstruction with Anthropometric Parameters and the Direction of the Sound Source


Original Paper · Acoustics Australia

Abstract

An accurate head-related transfer function (HRTF) can improve the subjective auditory localization performance of a particular subject. This paper proposes a deep neural network model that reconstructs the HRTF from anthropometric parameters and the direction of the sound source. The model consists of three subnetworks: a one-dimensional convolutional neural network (1D-CNN) that processes the anthropometric parameters, a second network that encodes the sound source direction, and a third network that takes the merged outputs of the first two as input and estimates the HRTF. Both an objective and a subjective evaluation are carried out. For the objective evaluation, the root mean square error (RMSE) between the estimated and measured HRTFs is calculated; the proposed method outperforms a database-matching method and a deep-neural-network (DNN)-based method. For the subjective evaluation, a sound localization test shows that listeners localize sound sources more accurately with the proposed personalized HRTFs than with the KEMAR dummy head HRTF or with the DNN-based method. Both the objective and subjective results confirm that the personalized HRTFs obtained with the proposed method perform well in HRTF reconstruction.
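To make the described architecture concrete, below is a minimal PyTorch sketch of the three-subnetwork design and of the RMSE metric used for the objective evaluation. This is an illustration only: the layer widths, the number of anthropometric parameters (n_anthro), the two-angle direction encoding, and the HRTF output length (n_freq) are all assumptions, not values taken from the paper.

```python
import torch
import torch.nn as nn


class HRTFNet(nn.Module):
    """Hypothetical sketch of the three-subnetwork model: a 1D-CNN for
    anthropometric parameters, a dense encoder for the source direction,
    and a merge network that regresses the HRTF."""

    def __init__(self, n_anthro: int = 17, n_freq: int = 128):
        super().__init__()
        # Subnetwork 1: 1D-CNN over the anthropometric parameter vector.
        self.anthro_net = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * n_anthro, 64), nn.ReLU(),
        )
        # Subnetwork 2: encodes the sound source direction
        # (here assumed to be azimuth and elevation).
        self.direction_net = nn.Sequential(
            nn.Linear(2, 16), nn.ReLU(),
            nn.Linear(16, 16), nn.ReLU(),
        )
        # Subnetwork 3: merged embeddings -> estimated HRTF.
        self.merge_net = nn.Sequential(
            nn.Linear(64 + 16, 128), nn.ReLU(),
            nn.Dropout(0.2),  # regularization; the rate is an assumption
            nn.Linear(128, n_freq),
        )

    def forward(self, anthro: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
        a = self.anthro_net(anthro.unsqueeze(1))  # (batch, n_anthro) -> embedding
        d = self.direction_net(direction)         # (batch, 2) -> embedding
        return self.merge_net(torch.cat([a, d], dim=1))


def rmse(estimated: torch.Tensor, measured: torch.Tensor) -> torch.Tensor:
    """Objective metric: RMSE between estimated and measured HRTFs."""
    return torch.sqrt(torch.mean((estimated - measured) ** 2))


# Example: a batch of 4 (subject, direction) pairs -> 4 estimated HRTFs.
model = HRTFNet()
hrtf = model(torch.randn(4, 17), torch.randn(4, 2))  # shape (4, 128)
```

The sketch only illustrates the data flow (two parallel encoders whose outputs are concatenated and fed to a regression head); the paper's trained configuration and baselines are not reproduced here.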





Acknowledgements

This work was supported by the National Natural Science Foundation of China (11774291) and the Natural Science Foundation of Shaanxi Province of China (2018JM6020).

Author information

Correspondence to Xiangyang Zeng.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Lu, D., Zeng, X., Guo, X. et al. Head-related Transfer Function Reconstruction with Anthropometric Parameters and the Direction of the Sound Source. Acoust Aust 49, 125–132 (2021). https://doi.org/10.1007/s40857-020-00209-y

