Multi-modal neural networks with multi-scale RGB-T fusion for semantic segmentation
A novel deep-learning-based method for semantic segmentation of RGB and thermal (RGB-T) images is introduced. The proposed method employs a neural-network design for multi-modal fusion based on multi-resolution patch processing: separate encoder streams extract RGB and thermal features, which a dedicated decoder module then fuses. Experimental results on synthetic and real-world data demonstrate the effectiveness of the proposed method compared with state-of-the-art approaches.
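The architecture described above (two encoder streams, multi-scale features, and a fusing decoder) can be sketched in miniature. The snippet below is a toy NumPy illustration, not the paper's actual network: the "encoder" is simple average pooling at several scales, the "decoder" concatenates RGB and thermal features per scale and projects them with random weights, and all names and shapes are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(image, scales=(1, 2, 4)):
    """Toy encoder stream: average-pool the input at several scales
    (a hypothetical stand-in for a learned multi-resolution encoder)."""
    h, w, c = image.shape
    feats = []
    for s in scales:
        pooled = image.reshape(h // s, s, w // s, s, c).mean(axis=(1, 3))
        feats.append(pooled)
    return feats

def fuse_and_decode(rgb_feats, th_feats, num_classes=3):
    """Toy decoder: concatenate RGB and thermal features per scale
    (channel-wise fusion), project with random 1x1-conv-like weights,
    upsample to full resolution, and sum across scales."""
    h, w = rgb_feats[0].shape[:2]
    logits = np.zeros((h, w, num_classes))
    for fr, ft in zip(rgb_feats, th_feats):
        fused = np.concatenate([fr, ft], axis=-1)  # fuse the two modalities
        proj = fused @ rng.normal(size=(fused.shape[-1], num_classes))
        # nearest-neighbour upsampling back to the full image resolution
        up = proj.repeat(h // proj.shape[0], axis=0).repeat(w // proj.shape[1], axis=1)
        logits += up
    return logits.argmax(axis=-1)  # per-pixel class map

rgb = rng.random((16, 16, 3))      # synthetic RGB image
thermal = rng.random((16, 16, 1))  # synthetic single-channel thermal image
seg = fuse_and_decode(encode(rgb), encode(thermal))
print(seg.shape)  # (16, 16): one class label per pixel
```

In a real implementation the pooling and random projections would be replaced by learned convolutional encoders and a trained decoder, but the data flow (per-modality encoding, per-scale fusion, multi-scale aggregation) mirrors the design described in the abstract.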