A real-time 3D video analyzer for enhanced 3D audio–visual systems
Multimedia Systems (IF 3.9), Pub Date: 2019-08-07, DOI: 10.1007/s00530-019-00631-x
Sangoh Jeong, Hyun-Soo Kim, KyuWoon Kim, Byeong-Moon Jeon, Joong-Ho Won

With the recent advent of three-dimensional (3D) sound home theater systems (HTS), more and more TV viewers are experiencing rich, immersive auditory presence at home. In this paper, visual processing approaches are provided to make 3D audio–visual (AV) systems more realistic to the viewers. In the proposed system, a visual engine processes stereo video streams to extract a disparity map for each pair of left and right video frames. Then, the engine determines the video depth level representing each frame of the disparity map. An audio engine then gives 3D sound depth effects according to the estimated video depth, thereby making viewers’ audio–visual experiences more synchronized. Two video processing algorithms are devised to extract the video depth from each frame: one is based on object segmentation, which turns out to be too complex to be implemented in the field-programmable gate array (FPGA) employed for real-time processing; the other algorithm uses a much simpler histogram-based approach to determine the depth of each video frame, and hence, it is more suitable for FPGA implementations. Subjective listening test results support the effectiveness of the proposed approaches.
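The histogram-based depth estimation described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function name, the bin count, the disparity range, and the choice of the dominant histogram bin as the frame's depth level are all assumptions, since the abstract does not specify them.

```python
import numpy as np

def frame_depth_level(disparity, n_levels=8, d_max=64.0):
    """Hypothetical sketch of a histogram-based per-frame depth level.

    disparity: 2D array of per-pixel disparities (larger = closer).
    Returns an integer level in [0, n_levels - 1]; here we simply take
    the dominant histogram bin as the frame's representative depth.
    """
    # Build a disparity histogram over the whole frame.
    hist, _edges = np.histogram(disparity, bins=n_levels, range=(0.0, d_max))
    # The most populated bin is taken as the frame's depth level
    # (an assumed criterion; the paper may weight bins differently).
    return int(np.argmax(hist))

# Toy disparity map: a "near" object (disparity ~40) over a far background (~5).
disp = np.full((120, 160), 5.0)
disp[40:80, 60:100] = 40.0
level = frame_depth_level(disp)  # background dominates, so the level is low (far)
```

A per-frame scalar like this is cheap to compute (one histogram pass per frame), which is consistent with the abstract's point that the histogram approach, unlike object segmentation, fits within an FPGA's real-time budget.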
