UVid-Net: Enhanced Semantic Segmentation of UAV Aerial Videos by Embedding Temporal Information
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (IF 5.5). Pub Date: 2021-03-31. DOI: 10.1109/jstars.2021.3069909
S. Girisha, Ujjwal Verma, M. M. Manohara Pai, Radhika M. Pai

Semantic segmentation of aerial videos has been extensively used for decision making in monitoring environmental changes, urban planning, and disaster management. The reliability of these decision support systems depends on the accuracy of the video semantic segmentation algorithms. Existing CNN-based video semantic segmentation methods extend image semantic segmentation methods by adding a module such as an LSTM or optical flow to compute the temporal dynamics of the video, which incurs a computational overhead. The proposed work instead modifies the CNN architecture itself to incorporate temporal information and improve the efficiency of video semantic segmentation. In this work, an enhanced encoder–decoder based CNN architecture (UVid-Net) is proposed for unmanned aerial vehicle (UAV) video semantic segmentation. The encoder of the proposed architecture embeds temporal information for temporally consistent labeling. The decoder is enhanced by a feature-refiner module, which aids in accurate localization of the class labels. The proposed UVid-Net architecture is quantitatively evaluated on the extended ManipalUAVid dataset, where it achieves a mean Intersection over Union of 0.79, significantly higher than that of other state-of-the-art algorithms. Furthermore, a UVid-Net model pretrained on urban street scenes and fine-tuned only in its final layer on UAV aerial videos also produces promising results.
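
The abstract describes an encoder–decoder CNN whose encoder ingests temporal information directly, rather than relying on a separate LSTM or optical-flow module. The PyTorch sketch below illustrates that general idea only; the two-branch layout, the concatenation-based fusion, the layer widths, and the names TwoFrameSegNet and conv_block are illustrative assumptions and not the paper's actual UVid-Net design.

```python
# Minimal sketch (not the authors' code): an encoder-decoder segmenter whose
# encoder takes two consecutive UAV frames so temporal context is embedded
# without an extra LSTM or optical-flow module. All sizes are illustrative.
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch):
    """Two 3x3 convolutions with batch norm and ReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class TwoFrameSegNet(nn.Module):
    """Encoder-decoder segmenter with a two-frame (temporal) encoder."""

    def __init__(self, num_classes=4):
        super().__init__()
        # Two parallel encoder branches: previous frame and current frame,
        # fused here by simple channel concatenation (an assumption).
        self.enc_prev = conv_block(3, 32)
        self.enc_curr = conv_block(3, 32)
        self.pool = nn.MaxPool2d(2)
        self.enc_fused = conv_block(64, 128)
        # Decoder: upsample, then refine the upsampled features together with
        # the full-resolution fused features before per-pixel classification.
        self.up = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.refine = conv_block(64 + 64, 64)
        self.classifier = nn.Conv2d(64, num_classes, 1)

    def forward(self, frame_prev, frame_curr):
        f_prev = self.enc_prev(frame_prev)
        f_curr = self.enc_curr(frame_curr)
        fused = torch.cat([f_prev, f_curr], dim=1)             # temporal fusion
        deep = self.enc_fused(self.pool(fused))                # downsampled features
        up = self.up(deep)                                     # back to input resolution
        refined = self.refine(torch.cat([up, fused], dim=1))   # feature-refiner-style step
        return self.classifier(refined)                        # per-pixel class scores


if __name__ == "__main__":
    net = TwoFrameSegNet(num_classes=4)
    prev = torch.randn(1, 3, 128, 128)
    curr = torch.randn(1, 3, 128, 128)
    print(net(prev, curr).shape)  # torch.Size([1, 4, 128, 128])
```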

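The reported metric, mean Intersection over Union (mIoU) of 0.79, is the per-class IoU averaged over the dataset's classes. A minimal NumPy sketch of the standard computation follows; the mean_iou helper and the toy label maps are illustrative and not tied to the ManipalUAVid evaluation protocol.

```python
# Illustrative mean-IoU computation from predicted and ground-truth label maps.
import numpy as np


def mean_iou(pred, target, num_classes):
    """Average per-class IoU over classes present in prediction or ground truth."""
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            continue  # class absent from both maps; skip it
        intersection = np.logical_and(pred_c, target_c).sum()
        ious.append(intersection / union)
    return float(np.mean(ious))


if __name__ == "__main__":
    gt = np.array([[0, 0, 1], [1, 2, 2]])
    pr = np.array([[0, 1, 1], [1, 2, 0]])
    print(round(mean_iou(pr, gt, num_classes=3), 3))  # 0.5
```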
Updated: 2021-04-30