
A real-time 3D video analyzer for enhanced 3D audio–visual systems

  • Regular Paper
  • Published in: Multimedia Systems

Abstract

With the recent advent of three-dimensional (3D) sound home theater systems (HTS), more and more TV viewers are experiencing rich, immersive auditory presence at home. In this paper, visual processing approaches are proposed to make 3D audio–visual (AV) systems more realistic to viewers. In the proposed system, a visual engine processes stereo video streams to extract a disparity map for each pair of left and right video frames. The engine then determines the video depth level that represents each frame of the disparity map. An audio engine applies 3D sound depth effects according to the estimated video depth, making the viewers’ audio–visual experience more synchronized. Two video processing algorithms are devised to extract the video depth of each frame: one is based on object segmentation, which turns out to be too complex to implement on the field-programmable gate array (FPGA) employed for real-time processing; the other uses a much simpler histogram-based approach to determine the depth of each video frame and is therefore more suitable for FPGA implementation. Subjective listening test results support the effectiveness of the proposed approaches.
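To make the pipeline in the abstract concrete, the following is a minimal sketch of a histogram-based per-frame depth estimate: compute a dense disparity map for a left/right frame pair, histogram the disparity values, and collapse the histogram into a single depth level that an audio engine could map to a 3D sound depth effect. It assumes OpenCV block matching for the disparity step and a modal-histogram-bin rule for the depth decision; the function name, the number of histogram bins, and the quantization into discrete depth levels are illustrative assumptions, not details taken from the paper or its FPGA implementation.

```python
# Hypothetical sketch of a histogram-based frame depth estimator.
# Assumes OpenCV block matching; names and parameters are illustrative.
import cv2
import numpy as np


def frame_depth_level(left_bgr, right_bgr, num_levels=8):
    """Return a coarse depth level (0 = far, num_levels-1 = near) for one stereo pair."""
    left = cv2.cvtColor(left_bgr, cv2.COLOR_BGR2GRAY)
    right = cv2.cvtColor(right_bgr, cv2.COLOR_BGR2GRAY)

    # Block-matching stereo correspondence; compute() returns fixed-point disparity (x16).
    stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
    disparity = stereo.compute(left, right).astype(np.float32) / 16.0

    valid = disparity[disparity > 0]  # discard unmatched pixels
    if valid.size == 0:
        return 0

    # Histogram the disparities, take the dominant (modal) bin as the frame's
    # representative disparity, then quantize it into a small number of depth levels.
    hist, edges = np.histogram(valid, bins=32, range=(0, 64))
    mode_bin = int(np.argmax(hist))
    dominant = 0.5 * (edges[mode_bin] + edges[mode_bin + 1])
    return int(np.clip(dominant / 64.0 * num_levels, 0, num_levels - 1))
```

In such a scheme, the resulting integer level per frame would be handed to the audio engine, which scales its 3D sound depth effect accordingly; a per-frame histogram reduction like this avoids the object segmentation step the abstract describes as too complex for the FPGA.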



Acknowledgements

Joong-Ho Won’s research was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. 2019R1A2C1007126).

Author information

Corresponding author

Correspondence to Joong-Ho Won.

Additional information

Communicated by P. Pala.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article


Cite this article

Jeong, S., Kim, HS., Kim, K. et al. A real-time 3D video analyzer for enhanced 3D audio–visual systems. Multimedia Systems 26, 125–137 (2020). https://doi.org/10.1007/s00530-019-00631-x

