UVid-Net: Enhanced Semantic Segmentation of UAV Aerial Videos by Embedding Temporal Information
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (IF 5.5). Pub Date: 2021-03-31. DOI: 10.1109/jstars.2021.3069909
S. Girisha, Ujjwal Verma, M. M. Manohara Pai, Radhika M. Pai

Semantic segmentation of aerial videos has been extensively used for decision making in monitoring environmental changes, urban planning, and disaster management. The reliability of these decision support systems depends on the accuracy of the video semantic segmentation algorithms. Existing CNN-based video semantic segmentation methods extend image semantic segmentation methods by adding a module such as an LSTM or optical flow to compute the temporal dynamics of the video, which incurs a computational overhead. The proposed work instead modifies the CNN architecture itself to incorporate temporal information and improve the efficiency of video semantic segmentation. In this work, an enhanced encoder–decoder based CNN architecture (UVid-Net) is proposed for unmanned aerial vehicle (UAV) video semantic segmentation. The encoder of the proposed architecture embeds temporal information for temporally consistent labeling. The decoder is enhanced by a feature-refiner module, which aids in accurate localization of the class labels. The proposed UVid-Net architecture is quantitatively evaluated on the extended ManipalUAVid dataset, where it achieves a mean Intersection over Union of 0.79, significantly higher than that of other state-of-the-art algorithms. Furthermore, a UVid-Net model pretrained on urban street scenes and fine-tuned only in its final layer on UAV aerial videos also produces promising results.
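
The abstract describes an encoder–decoder CNN whose encoder ingests temporal information directly, rather than relying on a separate LSTM or optical-flow module. The PyTorch sketch below illustrates that general idea only; the two-branch layout, the concatenation-based fusion, the layer widths, and the names TwoFrameSegNet and conv_block are illustrative assumptions and not the paper's actual UVid-Net design.

```python
# Minimal sketch (not the authors' code): an encoder-decoder segmenter whose
# encoder takes two consecutive UAV frames so temporal context is embedded
# without an extra LSTM or optical-flow module. All sizes are illustrative.
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch):
    """Two 3x3 convolutions with batch norm and ReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class TwoFrameSegNet(nn.Module):
    """Encoder-decoder segmenter with a two-frame (temporal) encoder."""

    def __init__(self, num_classes=4):
        super().__init__()
        # Two parallel encoder branches: previous frame and current frame,
        # fused here by simple channel concatenation (an assumption).
        self.enc_prev = conv_block(3, 32)
        self.enc_curr = conv_block(3, 32)
        self.pool = nn.MaxPool2d(2)
        self.enc_fused = conv_block(64, 128)
        # Decoder: upsample, then refine the upsampled features together with
        # the full-resolution fused features before per-pixel classification.
        self.up = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.refine = conv_block(64 + 64, 64)
        self.classifier = nn.Conv2d(64, num_classes, 1)

    def forward(self, frame_prev, frame_curr):
        f_prev = self.enc_prev(frame_prev)
        f_curr = self.enc_curr(frame_curr)
        fused = torch.cat([f_prev, f_curr], dim=1)             # temporal fusion
        deep = self.enc_fused(self.pool(fused))                # downsampled features
        up = self.up(deep)                                     # back to input resolution
        refined = self.refine(torch.cat([up, fused], dim=1))   # feature-refiner-style step
        return self.classifier(refined)                        # per-pixel class scores


if __name__ == "__main__":
    net = TwoFrameSegNet(num_classes=4)
    prev = torch.randn(1, 3, 128, 128)
    curr = torch.randn(1, 3, 128, 128)
    print(net(prev, curr).shape)  # torch.Size([1, 4, 128, 128])
```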

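The reported metric, mean Intersection over Union (mIoU) of 0.79, is the per-class IoU averaged over the dataset's classes. A minimal NumPy sketch of the standard computation follows; the mean_iou helper and the toy label maps are illustrative and not tied to the ManipalUAVid evaluation protocol.

```python
# Illustrative mean-IoU computation from predicted and ground-truth label maps.
import numpy as np


def mean_iou(pred, target, num_classes):
    """Average per-class IoU over classes present in prediction or ground truth."""
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            continue  # class absent from both maps; skip it
        intersection = np.logical_and(pred_c, target_c).sum()
        ious.append(intersection / union)
    return float(np.mean(ious))


if __name__ == "__main__":
    gt = np.array([[0, 0, 1], [1, 2, 2]])
    pr = np.array([[0, 1, 1], [1, 2, 0]])
    print(round(mean_iou(pr, gt, num_classes=3), 3))  # 0.5
```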
Updated: 2021-04-30