Deep multimodal fusion for semantic image segmentation: A survey
Image and Vision Computing (IF 4.7), Pub Date: 2020-10-07, DOI: 10.1016/j.imavis.2020.104042
Yifei Zhang, Désiré Sidibé, Olivier Morel, Fabrice Mériaudeau

Recent advances in deep learning have shown excellent performance in various scene understanding tasks. However, in complex environments or under challenging conditions, it is necessary to employ multiple modalities that provide complementary information on the same scene. A variety of studies have demonstrated that deep multimodal fusion for semantic image segmentation achieves significant performance improvement. These fusion approaches leverage the complementary strengths of multiple information sources and automatically generate a joint prediction. This paper describes the essential background concepts of deep multimodal fusion and the relevant applications in computer vision. In particular, we provide a systematic survey of multimodal fusion methodologies, multimodal segmentation datasets, and quantitative evaluations on the benchmark datasets. Existing fusion methods are summarized according to a common taxonomy: early fusion, late fusion, and hybrid fusion. Based on their performance, we analyze the strengths and weaknesses of different fusion strategies. Current challenges and design choices are discussed, aiming to provide the reader with a comprehensive view of deep multimodal image segmentation.
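To make the early/late fusion distinction in the taxonomy concrete, the following minimal PyTorch sketch contrasts the two extremes for RGB-D semantic segmentation. It is not taken from the surveyed methods: the tiny convolutional encoder, the additive merging of score maps, and all layer sizes are illustrative assumptions standing in for real segmentation backbones.

```python
# Minimal sketch (illustrative, not from the paper): early fusion stacks the
# modalities at the input of a single network; late fusion runs one encoder
# per modality and merges the per-class score maps at the end.
import torch
import torch.nn as nn


def tiny_encoder(in_channels: int, features: int = 32) -> nn.Sequential:
    """Stand-in for any segmentation backbone (e.g. a ResNet-style encoder)."""
    return nn.Sequential(
        nn.Conv2d(in_channels, features, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(features, features, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
    )


class EarlyFusionSeg(nn.Module):
    """Early fusion: concatenate RGB and depth along the channel axis."""

    def __init__(self, num_classes: int):
        super().__init__()
        self.encoder = tiny_encoder(in_channels=3 + 1)  # RGB (3) + depth (1)
        self.classifier = nn.Conv2d(32, num_classes, kernel_size=1)

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        x = torch.cat([rgb, depth], dim=1)           # fuse the raw inputs
        return self.classifier(self.encoder(x))      # per-pixel class scores


class LateFusionSeg(nn.Module):
    """Late fusion: one encoder per modality, fuse the class score maps."""

    def __init__(self, num_classes: int):
        super().__init__()
        self.rgb_branch = tiny_encoder(in_channels=3)
        self.depth_branch = tiny_encoder(in_channels=1)
        self.rgb_head = nn.Conv2d(32, num_classes, kernel_size=1)
        self.depth_head = nn.Conv2d(32, num_classes, kernel_size=1)

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        rgb_logits = self.rgb_head(self.rgb_branch(rgb))
        depth_logits = self.depth_head(self.depth_branch(depth))
        return rgb_logits + depth_logits             # simple additive fusion


if __name__ == "__main__":
    rgb = torch.randn(1, 3, 64, 64)
    depth = torch.randn(1, 1, 64, 64)
    for model in (EarlyFusionSeg(num_classes=19), LateFusionSeg(num_classes=19)):
        print(model(rgb, depth).shape)  # torch.Size([1, 19, 64, 64])
```

Hybrid fusion, the third category in the taxonomy, sits between these two extremes by exchanging intermediate features between the modality branches rather than fusing only at the input or only at the output.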




Updated: 2020-10-07