An RGB‐D object detection model with high‐generalization ability applied to tea harvesting robot for outdoor cross‐variety tea shoots detection
Journal of Field Robotics ( IF 8.3 ) Pub Date : 2024-03-22 , DOI: 10.1002/rob.22318
Yanxu Wu, Jianneng Chen, Leiying He, Jiangsheng Gui, Jiangming Jia

Detecting tea shoots is the first and most crucial step in achieving intelligent tea harvesting. However, when faced with thousands of tea varieties, establishing a high-quality and comprehensive database comes at significant cost. It has therefore become an urgent challenge to improve a model's generalization ability and to train it with minimal samples, so as to develop a model capable of optimal detection performance across varied environments and tea varieties. This paper introduces a model named You Only See Tea (YOST), which utilizes depth maps to enhance the model's generalization ability. It is applied to detect tea shoots in complex environments and to perform cross-variety tea shoot detection. Our approach differs from common data augmentation strategies that enhance generalization by diversifying the data set. Instead, we enhance the model's learning capability by strategically amplifying its attention towards core target features while simultaneously reducing attention towards noncore features. The proposed YOST model is built upon the You Only Look Once version 7 (YOLOv7) model and uses two shared-weight backbone networks to process RGB and depth images. The feature layers of the two modalities at the same scale are then fused in our designed Ultra-attention Fusion and Activation Module. With this approach, the model can proficiently detect targets by capturing core features, even when encountering complex environments or unfamiliar tea leaf varieties. The experimental results indicate that YOST converged faster and more consistently than YOLOv7 during training. Additionally, YOST achieved a 6.58% improvement in AP50 for detecting tea shoots in complex environments.
Moreover, on a cross-variety tea shoot detection task involving multiple unfamiliar varieties, YOST showed impressive generalization ability, achieving a maximum AP50 improvement of 33.31% over YOLOv7. These findings establish its superior performance. Our research departs from the heavy reliance of high-generalization models on large numbers of training samples, making it easier to train small-scale, high-generalization models and significantly alleviating the burden of data collection and model training.
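The dual-stream design described above, where one set of backbone weights processes both the RGB and the depth input and the resulting same-scale feature maps are fused by an attention mechanism, can be illustrated with a minimal NumPy sketch. This is a toy stand-in under stated assumptions: the single-matrix "backbone" and the sigmoid channel-gating scheme below are illustrative inventions, not the paper's actual YOLOv7 backbone or Ultra-attention Fusion and Activation Module.

```python
import numpy as np

rng = np.random.default_rng(0)

def shared_backbone(x, w):
    # Toy stand-in for a backbone: one linear projection plus ReLU.
    # The same weights `w` are applied to both the RGB and the depth
    # stream, mirroring the shared-weight two-backbone idea.
    return np.maximum(x @ w, 0.0)

def gated_fusion(f_rgb, f_depth):
    # Hypothetical attention-style fusion (NOT the paper's module):
    # per-channel statistics of the concatenated features produce
    # sigmoid gates that reweight each modality before summation.
    concat = np.concatenate([f_rgb, f_depth], axis=-1)
    scores = concat.mean(axis=0)            # one score per channel
    gates = 1.0 / (1.0 + np.exp(-scores))   # sigmoid, in (0, 1)
    g_rgb, g_depth = np.split(gates, 2)
    return f_rgb * g_rgb + f_depth * g_depth

w = rng.normal(size=(16, 8))     # backbone weights, shared across modalities
rgb = rng.normal(size=(32, 16))  # 32 spatial positions, 16 input channels
depth = rng.normal(size=(32, 16))

fused = gated_fusion(shared_backbone(rgb, w), shared_backbone(depth, w))
print(fused.shape)  # (32, 8): same scale as each single-modality feature map
```

The key property the sketch preserves is that the fused output keeps the shape of a single-modality feature map, so it can drop into the detection head exactly where one backbone's features would otherwise go.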
