Semi-supervised deep learning and low-cost cameras for the semantic segmentation of natural images in viticulture, Precision Agriculture



Precision Agriculture ( IF 6.2 ) Pub Date : 2022-06-21 , DOI: 10.1007/s11119-022-09929-9
A. Casado-García , J. Heras , A. Milella , R. Marani

Automatic yield monitoring and in-field robotic harvesting with low-cost cameras require object detection and segmentation solutions that can cope with the poor quality of natural images and the lack of precisely labeled datasets of consistent size. This work proposes the application of deep learning to the semantic segmentation of natural images acquired by a low-cost RGB-D camera in a commercial vineyard. Several deep architectures were trained and compared on 85 labeled images. Three semi-supervised learning methods (PseudoLabeling, Distillation and Model Distillation) were proposed to take advantage of 320 non-annotated images. In these experiments, the DeepLabV3+ architecture with a ResNext50 backbone, trained on the set of labeled images, achieved the best overall accuracy of 84.78%. In contrast, the Manet architecture combined with the EfficientnetB3 backbone reached the highest accuracy for the bunch class (85.69%). Applying the semi-supervised learning methods increased segmentation accuracy by between 5.62% and 6.01% on average. Further discussion shows the effect of fine-grained manual image annotation on the accuracy of the proposed methods and compares their time requirements.
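The abstract names PseudoLabeling as one of the semi-supervised methods but does not say how it was implemented. As a rough illustration of the general idea only, the sketch below shows one common form of pseudo-labeling for segmentation: a model trained on the labeled images predicts per-pixel class probabilities for an unlabeled image, and only confident pixels are kept as training targets. The function name `pseudo_label`, the confidence threshold, the `IGNORE_INDEX` value and the toy probabilities are all assumptions made for this example, not details taken from the paper.

```python
from typing import List, Sequence

# Hypothetical marker for pixels that should be excluded from the training loss.
IGNORE_INDEX = 255


def pseudo_label(probs: Sequence[Sequence[Sequence[float]]],
                 threshold: float = 0.9) -> List[List[int]]:
    """Turn per-pixel class probabilities (H x W x C) into a pseudo-label mask.

    A pixel receives its most probable class when that probability reaches
    `threshold`; otherwise it is marked with IGNORE_INDEX.
    """
    mask = []
    for row in probs:
        mask_row = []
        for pixel in row:
            # Index of the most probable class for this pixel.
            best = max(range(len(pixel)), key=lambda c: pixel[c])
            mask_row.append(best if pixel[best] >= threshold else IGNORE_INDEX)
        mask.append(mask_row)
    return mask


# Toy 2x2 "image" with two classes (0 = background, 1 = bunch); values invented.
example_probs = [
    [[0.95, 0.05], [0.40, 0.60]],
    [[0.08, 0.92], [0.50, 0.50]],
]
example_mask = pseudo_label(example_probs, threshold=0.9)
```

In a setup like this, the resulting masks would be added to the labeled set and the model retrained, with the ignored pixels skipped by the loss. Whether the authors filtered by confidence at all is not stated in the abstract.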


Updated: 2022-06-22
