Abstract
An accurate head-related transfer function (HRTF) can improve a subject's auditory localization performance. This paper proposes a deep neural network model that reconstructs the HRTF from anthropometric parameters and the direction of the sound source. The proposed model consists of three subnetworks. The first is a one-dimensional convolutional neural network (1D-CNN) that processes the anthropometric parameters as input features. The second takes the sound source position as input to serve as a marker. The outputs of these two subnetworks are merged and fed to the third subnetwork, which estimates the HRTF. Both an objective and a subjective evaluation are used to assess the proposed method. For the objective evaluation, the root mean square error (RMSE) between the estimated HRTF and the measured HRTF is calculated; the results show that the proposed method performs better than a database matching method and a deep-neural-network-based (DNN-based) method. For the subjective evaluation, a sound localization test shows that HRTFs obtained with the proposed method yield higher localization accuracy than the KEMAR dummy head HRTF or the DNN-based method. Both the objective and the subjective results show that the proposed method performs well in personalized HRTF reconstruction.
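The three-subnetwork layout and the RMSE criterion described in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the layer sizes, the single convolution kernel, the ReLU activations, the random untrained weights, and the names `build_model`, `forward`, and `rmse` are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import math
import random


def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def dense(x, weights, bias):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def conv1d(x, kernel):
    """Valid 1-D convolution (cross-correlation form) with a single kernel."""
    k = len(kernel)
    return [sum(kernel[j] * x[i + j] for j in range(k))
            for i in range(len(x) - k + 1)]


def init_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def build_model(n_anthro=10, n_pos=2, n_hidden=8, n_freq=16,
                kernel_size=3, seed=0):
    """Create untrained parameters. All sizes are hypothetical."""
    rng = random.Random(seed)
    conv_len = n_anthro - kernel_size + 1
    return {
        "kernel": [rng.uniform(-0.1, 0.1) for _ in range(kernel_size)],
        "pos_w": init_matrix(n_hidden, n_pos, rng),
        "pos_b": [0.0] * n_hidden,
        "out_w": init_matrix(n_freq, conv_len + n_hidden, rng),
        "out_b": [0.0] * n_freq,
    }


def forward(model, anthro, position):
    """Estimate an HRTF vector from anthropometry and source position."""
    # Subnetwork 1: 1D-CNN over the anthropometric parameters.
    anthro_features = relu(conv1d(anthro, model["kernel"]))
    # Subnetwork 2: encodes the sound source position as a marker.
    pos_features = relu(dense(position, model["pos_w"], model["pos_b"]))
    # Merge both outputs by concatenation (assumed merge operation).
    merged = anthro_features + pos_features
    # Subnetwork 3: maps the merged features to the HRTF estimate.
    return dense(merged, model["out_w"], model["out_b"])


def rmse(estimated, measured):
    """Root mean square error between two equal-length sequences."""
    if len(estimated) != len(measured):
        raise ValueError("sequences must have the same length")
    squared = sum((e - m) ** 2 for e, m in zip(estimated, measured))
    return math.sqrt(squared / len(estimated))


model = build_model()
estimate = forward(model, anthro=[0.5] * 10, position=[0.0, 30.0])
print(len(estimate), rmse(estimate, [0.0] * len(estimate)))
```

With trained weights, `rmse` would be evaluated between each estimated HRTF and the corresponding measured HRTF; whether the error is computed on linear or logarithmic magnitudes is not stated in the abstract and is left open here.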
Acknowledgements
This work was supported by the National Natural Science Foundation of China (11774291) and the Natural Science Foundation of Shaanxi Province of China (2018JM6020).
Cite this article
Lu, D., Zeng, X., Guo, X. et al. Head-related Transfer Function Reconstruction with Anthropometric Parameters and the Direction of the Sound Source. Acoust Aust 49, 125–132 (2021). https://doi.org/10.1007/s40857-020-00209-y
DOI: https://doi.org/10.1007/s40857-020-00209-y