
A real-time 3D video analyzer for enhanced 3D audio–visual systems

  • Regular Paper
  • Published in: Multimedia Systems

Abstract

With the recent advent of three-dimensional (3D) sound home theater systems (HTS), more and more TV viewers are experiencing rich, immersive auditory presence at home. In this paper, visual processing approaches are proposed to make 3D audio–visual (AV) systems more realistic to viewers. In the proposed system, a visual engine processes stereo video streams to extract a disparity map for each pair of left and right video frames. The engine then determines the video depth level that represents each frame of the disparity map. An audio engine applies 3D sound depth effects according to the estimated video depth, making the viewers’ audio–visual experience more synchronized. Two video processing algorithms are devised to extract the video depth of each frame: one is based on object segmentation, which turns out to be too complex to implement on the field-programmable gate array (FPGA) employed for real-time processing; the other uses a much simpler histogram-based approach to determine the depth of each video frame and is therefore more suitable for FPGA implementation. Subjective listening test results support the effectiveness of the proposed approaches.
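To make the pipeline in the abstract concrete, the following is a minimal sketch of a histogram-based per-frame depth estimate: compute a dense disparity map for a left/right frame pair, histogram the disparity values, and collapse the histogram into a single depth level that an audio engine could map to a 3D sound depth effect. It assumes OpenCV block matching for the disparity step and a modal-histogram-bin rule for the depth decision; the function name, the number of histogram bins, and the quantization into discrete depth levels are illustrative assumptions, not details taken from the paper or its FPGA implementation.

```python
# Hypothetical sketch of a histogram-based frame depth estimator.
# Assumes OpenCV block matching; names and parameters are illustrative.
import cv2
import numpy as np


def frame_depth_level(left_bgr, right_bgr, num_levels=8):
    """Return a coarse depth level (0 = far, num_levels-1 = near) for one stereo pair."""
    left = cv2.cvtColor(left_bgr, cv2.COLOR_BGR2GRAY)
    right = cv2.cvtColor(right_bgr, cv2.COLOR_BGR2GRAY)

    # Block-matching stereo correspondence; compute() returns fixed-point disparity (x16).
    stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
    disparity = stereo.compute(left, right).astype(np.float32) / 16.0

    valid = disparity[disparity > 0]  # discard unmatched pixels
    if valid.size == 0:
        return 0

    # Histogram the disparities, take the dominant (modal) bin as the frame's
    # representative disparity, then quantize it into a small number of depth levels.
    hist, edges = np.histogram(valid, bins=32, range=(0, 64))
    mode_bin = int(np.argmax(hist))
    dominant = 0.5 * (edges[mode_bin] + edges[mode_bin + 1])
    return int(np.clip(dominant / 64.0 * num_levels, 0, num_levels - 1))
```

In such a scheme, the resulting integer level per frame would be handed to the audio engine, which scales its 3D sound depth effect accordingly; a per-frame histogram reduction like this avoids the object segmentation step the abstract describes as too complex for the FPGA.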



Acknowledgements

Joong-Ho Won’s research was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. 2019R1A2C1007126).

Author information

Corresponding author

Correspondence to Joong-Ho Won.

Additional information

Communicated by P. Pala.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article


Cite this article

Jeong, S., Kim, HS., Kim, K. et al. A real-time 3D video analyzer for enhanced 3D audio–visual systems. Multimedia Systems 26, 125–137 (2020). https://doi.org/10.1007/s00530-019-00631-x

