Original papers
Towards practical 2D grapevine bud detection with fully convolutional networks

https://doi.org/10.1016/j.compag.2020.105947

Highlights

  • Grapevine bud detection with fully convolutional deep networks pre-trained with MobileNet.

  • Evaluation over main detection aspects: segmentation, correspondence identification, and localization.

  • Substantial improvements over the best known competitor on all three aspects.

  • Validation of practical performance as a measurement component for bud number, bud area, and internode length.

Abstract

In viticulture, visual inspection of the plant is a necessary task for measuring relevant variables. In many cases, these visual inspections are susceptible to automation through computer vision methods. Bud detection is one such visual task, central to the measurement of important variables and tasks such as bud sunlight exposure, autonomous pruning, bud counting, type-of-bud classification, bud geometric characterization, internode length, bud area, and bud development stage, among others. This paper presents a computer method for grapevine bud detection based on a Fully Convolutional Network with MobileNet architecture (FCN-MN). To validate its performance, this architecture was compared on the detection task against a strong method for bud detection, Scanning Windows (SW), based on a patch classifier, showing improvements over three aspects of detection: segmentation, correspondence identification, and localization. The best version of FCN-MN showed a detection F1-measure of 88.6% (for true positives defined as detected components whose intersection-over-union with the true bud is above 0.5), and false positives that are small and near the true bud. Splits (false positives overlapping the true bud) showed a mean segmentation precision of 89.3% (standard deviation 21.7), while false alarms (false positives not overlapping the true bud) showed a mean pixel area of only 8% of the area of a true bud, and a distance (between mass centers) of 1.1 true bud diameters. The paper concludes by discussing how these results for FCN-MN would produce sufficiently accurate measurements of bud variables such as bud number, bud area, and internode length, suggesting good performance in a practical setup.

Introduction

For decades, viticulturists have been producing models of the plant processes most relevant to determining fruit quality and yield, soil profiling, or vine health, and have been gathering a wealth of information to feed into these models. Better and more efficient measuring procedures have resulted in more information, with a corresponding impact on the quality of model outcomes. Such information corresponds to a long list of variables for assessing the state of different parts of the plant, such as the one found in the manuals published by The Australian Wine Research Institute (2020a, 2020b). Most of these variables of interest, however, are still being measured with manual instruments and visual inspection. This results in high labor costs that limit measurement campaigns to small data samples which, even with the use of statistical inference or spatial interpolation techniques (Whelan et al., 1996), restrict the quality of the decisions that agronomists can make based on them.

Precision viticulture in general (Bramley, 2009), and computer vision algorithms in particular, have been growing over the last couple of decades mostly due to their potential for mitigating these limitations (Seng et al., 2018, Matese and Di Gennaro, 2015). These algorithms come with the promise of an unprecedented boost in the production of vineyard information, as well as many expectations not only about possible improvements in the quality of the measurements, but also about their potential to produce better models by feeding all this information into big data algorithms.

The present work contributes to this general endeavor with FCN-MN (Long et al., 2015, Shelhamer et al., 2017), an algorithm for measuring variables related to one specific plant part: the bud, an organ of major importance as it is the growing point of the fruits, containing all the plant’s productive potential (May, 2000). The present contribution of autonomous bud detection not only enables the autonomous measurement of bud-related variables currently measured by agronomists (see Table 1 for a non-exhaustive list), but also has the potential to enable the measurement of novel, yet important, variables that at present cannot be measured manually. One example is the total sunlight captured by buds, which depends on the manually unfeasible task of determining the exact location of buds in 3D space. Although the present work focuses on 2D detection, it could easily be upgraded to 3D by, for instance, integrating 2D detection into the workflow proposed by Díaz et al. (2018).

Table 1 shows a non-exhaustive list of the main bud-related variables currently measured by vineyard managers (Sánchez and Dokoozlian, 2005, Noyce et al., 2016, Collins et al., 2020), together with an assessment of the extent to which detection contributes to their measurement. The right-most column (other required operations) indicates the information, beyond detection, necessary to complete the measurement, while the middle columns labeled (i), (ii), and (iii) indicate the specific aspects of detection required for each variable: (i) whether it requires good segmentation, i.e., the discrimination of which pixels in the scene correspond to buds and which to non-buds; (ii) good correspondence identification, i.e., the discrimination of bud pixels as belonging to different buds; or (iii) good localization, i.e., locating the bud within the scene. For instance, for the bud number variable to coincide with the detection count, different components detected for the same bud must be bundled together as a single detection. For type-of-bud classification, in addition to correctly identifying components with buds, the segmentation of the part of the image corresponding to the bud must minimize the noise produced by background pixels. Lastly, to measure the incidence of sunlight on the bud, localization rather than segmentation is necessary, plus the leaf 3D surface geometry.

A good detector, therefore, should be evaluated on all three aspects: segmentation, correspondence identification, and localization. This is straightforward for our detector, as its implementation first produces a segmentation mask, which is then post-processed to produce the correspondence identification and localization. The specific aspects of this approach are detailed in Section 2. The analysis of detection results presented in Section 3 shows that this approach is superior to state-of-the-art algorithms for grapevine bud detection. Finally, Section 4 discusses the scope and limitations of the results obtained for bud detection, the sufficiency of the achieved performance for measuring a selection of the variables in Table 3, and the most important conclusions, future work, and potential improvements.
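
Since, as just described, FCN-MN first produces a segmentation mask and then derives the other two aspects from it, the tail of the pipeline can be illustrated compactly. Below is a minimal sketch (not the authors' code) of such post-processing, assuming a binary mask: connected-component labeling stands in for correspondence identification, and centers of mass for localization.

```python
# Minimal sketch: from a binary segmentation mask to per-bud components
# (correspondence identification) and point locations (localization).
import numpy as np
from scipy import ndimage

def postprocess_mask(mask: np.ndarray):
    """mask: 2D array where 1 = bud pixel, 0 = background."""
    # Correspondence identification: 8-connected components, so each
    # connected blob of bud pixels is treated as one candidate bud.
    labels, n_components = ndimage.label(mask, structure=np.ones((3, 3)))
    # Localization: center of mass of each labeled component.
    centers = ndimage.center_of_mass(mask, labels, range(1, n_components + 1))
    return labels, centers

if __name__ == "__main__":
    toy = np.zeros((8, 8), dtype=np.uint8)
    toy[1:3, 1:3] = 1   # first candidate bud
    toy[5:7, 4:7] = 1   # second candidate bud
    _, centers = postprocess_mask(toy)
    print(centers)      # [(1.5, 1.5), (5.5, 5.0)]
```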

A wide variety of research using computer vision and machine learning algorithms to acquire information about vineyards (Seng et al., 2018) can be found in the literature, such as berry and bunch detection (Nuske et al., 2011), fruit size and weight estimation (Tardaguila et al., 2012), leaf area indices and yield estimation (Diago et al., 2012), plant phenotyping (Herzog et al., 2014a, Herzog et al., 2014b), autonomous selective spraying (Berenstein et al., 2010), and more (Tardáguila et al., 2012, Whalley and Shanmuganathan, 2013). Among the computer algorithms that have stood out in recent years, artificial neural networks have attracted great interest in the industry as a means to carry out various visual recognition tasks (Hirano et al., 2006, Kahng et al., 2017, Tilgner et al., 2019). In particular, Convolutional Neural Networks (CNNs) have become the dominant machine learning approach to visual object recognition (Ning et al., 2017). Two recent studies have successfully applied visual recognition techniques based on deep learning networks to identify viticultural variables for estimating production in vineyards. One of them, Grimm et al. (2019), uses an FCN to segment grapevine plant organs such as young shoots, pedicels, flowers, or grapes. The other, Rudolph et al. (2018), uses images of grapevines under field conditions that are segmented with a CNN to detect inflorescences as regions of interest; over these regions, the circle Hough Transform algorithm is applied to detect flowers.

Several works aim at detecting and locating buds in different types of crops by means of autonomous visual recognition systems. For instance, Tarry et al. (2014) presents an integrated system for chrysanthemum bud detection that can be used to automate labour-intensive tasks in floriculture greenhouses. More recently, Zhao et al. (2018) presented a computer vision system to identify the internodes and buds of stalk crops. To the best of our knowledge and research efforts, there are at least four works that specifically address the problem of bud detection in the grapevine using autonomous visual recognition systems. The research works of Xu et al. (2014), Herzog et al. (2014b), and Pérez et al. (2017) apply different computer vision and machine learning techniques to perform detection in 2D images. In addition, Díaz et al. (2018) introduces a workflow to localize buds in 3D space. The most relevant details of each are presented below.

The study by Xu et al. (2014) presents a bud detection algorithm using RGB images captured indoors under controlled lighting and background conditions, specifically to establish the groundwork for an autonomous winter pruning system. The authors apply a threshold filter to discriminate the plant skeleton from the background, resulting in a binary image. Assuming that the shape of buds resembles corners, they apply the Harris corner detector over the binary image to detect them. This process obtains a recall of 0.702, i.e., 70.2% of the buds were detected.
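
As a rough illustration (not Xu et al.'s implementation) of the pipeline just described, thresholding followed by Harris corner detection can be sketched with OpenCV; the input path, the Otsu binarization, and the Harris and candidate-selection parameters below are illustrative assumptions.

```python
# Sketch of a threshold-then-Harris bud candidate detector.
import cv2
import numpy as np

img = cv2.imread("vine.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input image
# Binarize: separate the plant skeleton from the controlled background.
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
# Harris response over the binary image; buds are assumed to be corner-like.
response = cv2.cornerHarris(np.float32(binary), blockSize=2, ksize=3, k=0.04)
# Keep the strongest responses as bud candidates (relative threshold is a guess).
candidates = np.argwhere(response > 0.01 * response.max())
print(f"{len(candidates)} corner-like bud candidate pixels")
```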

The work of Herzog et al. (2014b) presents three methods for detecting buds at very advanced stages of development, when the buds have already burst and the first leaves are emerging. All methods are semi-automatic and require human intervention to validate the quality of the results. The best result is obtained using an RGB image with an artificial black background and corresponds to a recall of 94%. The authors argue that this recall is enough to solve the problem of phenotyping vines. They also argue that these good results can be explained by the particular green color and the morphology of the already sprouting buds, approximately 2 cm in size.

Pérez et al. (2017) outlines an approach for the classification of bud images in winter, using an SVM as classifier and Bag of Features to compute visual descriptors. They report a recall of over 90% and an accuracy of 86% when classifying images containing at least 60% of a bud and a ratio of 20–80% of bud versus non-bud pixels. They argue that this classifier can be used in sliding-window algorithms for 2D localization due to its robustness to variation in window size and position. It is precisely this idea that has been reproduced in the present work to implement the baseline competitor to our approach.
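
Since this sliding-window idea is reproduced here as the baseline competitor, a minimal, hypothetical harness may help fix intuitions. In the sketch below, `classify_patch` stands in for a trained patch classifier such as an SVM over Bag-of-Features descriptors, and the window size and stride are assumptions.

```python
# Sketch of a sliding-window detector built on top of a patch classifier.
import numpy as np

def sliding_window_detect(image, classify_patch, win=64, stride=32):
    """Return top-left corners of windows classified as containing a bud."""
    h, w = image.shape[:2]
    hits = []
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            patch = image[y:y + win, x:x + win]
            if classify_patch(patch) == 1:  # 1 = "bud" label from the classifier
                hits.append((x, y))
    return hits
```

Overlapping hits from nearby windows would still need to be merged (e.g., by the connected-component bundling discussed earlier) before counting buds.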

Finally, Díaz et al. (2018) introduces a workflow for the localization of buds in 3D space, consisting of five steps. The first reconstructs a 3D point cloud of the grapevine structure from several RGB images. The second applies a 2D detection method using the sliding-window and patch-classification technique of Pérez et al. (2017). The third uses a voting scheme to classify each point in the cloud as bud or non-bud. The fourth applies the DBSCAN clustering algorithm to group cloud points that correspond to the same bud. Finally, the fifth performs the localization by computing the center-of-mass coordinates of each 3D point cluster. They report a recall of 45%, a precision of 100%, and a localization error of approximately 1.5 cm, or 3 bud diameters.
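
Steps four and five of that workflow admit a compact sketch under stated assumptions; the `eps` and `min_samples` values below are illustrative, not those used by Díaz et al. (2018).

```python
# Sketch of DBSCAN clustering plus center-of-mass localization of 3D bud points.
import numpy as np
from sklearn.cluster import DBSCAN

def localize_buds_3d(bud_points: np.ndarray, eps=0.01, min_samples=10):
    """bud_points: (N, 3) array of cloud points voted as 'bud'."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(bud_points)
    # One center of mass per cluster; label -1 is DBSCAN's noise bucket.
    centers = [bud_points[labels == k].mean(axis=0)
               for k in set(labels) if k != -1]
    return np.array(centers)
```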

Although these research studies represent a great advance on the problem of detecting and localizing buds, they still exhibit at least one of the following limitations: (i) use of an artificial background outdoors; (ii) controlled lighting indoors; (iii) need for user interaction; (iv) bud detection only at very advanced stages of development; (v) low bud detection/classification recall; and (vi) although some of these works perform some kind of segmentation as part of the approach, none of them aims to segment the bud or reports metrics on the quality of the segmentation performed. These limitations represent a major barrier to the effective development of tools for measuring bud-related variables.

Section snippets

Fully Convolutional Network with MobileNet (FCN-MN)

As outlined in the introduction, the approach proposes the use of computer vision algorithms to: (i) segment buds by classifying which pixels in the scene correspond to buds and which correspond to background (non-buds), (ii) identify bud correspondences by discriminating those pixels that belong to different buds in the observed scene, and (iii) localize each bud in the scene.

For the segmentation operation, i.e., pixel classification, the fully convolutional network introduced in Long et al. (2015)…
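
For intuition, a minimal FCN-32s-style variant with a MobileNet encoder can be sketched in Keras. This is an assumption-laden illustration (single upsampling stream, no skip connections, sigmoid pixel scores), not the paper's exact architecture.

```python
# Sketch: fully convolutional segmentation net with a pre-trained MobileNet encoder.
import tensorflow as tf

def build_fcn_mobilenet(input_shape=(224, 224, 3)):
    # Encoder: MobileNet without its classification head, ImageNet weights.
    encoder = tf.keras.applications.MobileNet(
        input_shape=input_shape, include_top=False, weights="imagenet")
    x = encoder.output                                          # coarse 7x7 features
    x = tf.keras.layers.Conv2D(1, 1, activation="sigmoid")(x)   # per-cell bud score
    # Upsample the coarse score map back to input resolution (32x stride).
    x = tf.keras.layers.UpSampling2D(32, interpolation="bilinear")(x)
    return tf.keras.Model(encoder.input, x)

model = build_fcn_mobilenet()
model.compile(optimizer="adam", loss="binary_crossentropy")
```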

Experimental results

In this section we present a systematic evaluation of the quality of our proposed FCN-MN procedure for bud detection over all three aspects of detection required for the measurement of the relevant bud-related variables listed in Table 1: segmentation, correspondence identification, and localization. First, in the following subsection, we present metrics that quantify the quality of these aspects, followed by Section 3 that presents the results for the metric values obtained for different…
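
For reference, the matching criterion quoted in the abstract (a detected component counts as a true positive when its intersection-over-union with the true bud exceeds 0.5) and the detection F1-measure can be sketched as follows; the function names are ours.

```python
# Sketch of the IoU matching criterion and the detection F1-measure.
import numpy as np

def iou(pred: np.ndarray, true: np.ndarray) -> float:
    """Intersection-over-union of two boolean masks of equal shape."""
    inter = np.logical_and(pred, true).sum()
    union = np.logical_or(pred, true).sum()
    return inter / union if union else 0.0

def detection_f1(n_tp: int, n_fp: int, n_fn: int) -> float:
    """F1 over detected components, given IoU>0.5 matches as true positives."""
    precision = n_tp / (n_tp + n_fp) if (n_tp + n_fp) else 0.0
    recall = n_tp / (n_tp + n_fn) if (n_tp + n_fn) else 0.0
    return (2 * precision * recall / (precision + recall)
            if (precision + recall) else 0.0)
```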

Discussion and conclusions

This section discusses the results obtained by the proposed approach in the context of the problem of grapevine bud detection and its impact as a tool for measuring viticultural variables of interest. The discussion is complemented with some highlights of the most important conclusions together with some potential lines of future work.

This work introduces FCN-MN, a Fully Convolutional Network with MobileNet architecture for the detection of grapevine buds in 2D images captured under natural field conditions…

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was funded by the Argentinean Universidad Tecnológica Nacional (UTN), the National Council of Scientific and Technical Research (CONICET), and the National Fund for Scientific and Technological Promotion (FONCyT).

References

  • Han, D. Comparison of commonly used image interpolation methods.
  • Hartley, R., et al., 2003. Multiple view geometry in computer vision.
  • Herzog, K., Kicherer, A., Töpfer, R., 2014a. Objective phenotyping the time of bud burst by analyzing grapevine field...
  • Herzog, K., et al., 2014b. Initial steps for high-throughput phenotyping in vineyards. Aust. New Zealand Grapegrower Winemaker.
  • Hirano, Y., et al., 2006. Industry and object recognition: Applications, applied research and challenges.
  • Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H., 2017. Mobilenets:...
  • Jaccard, P., 1912. The distribution of the flora in the alpine zone. 1. New Phytologist.
  • Kahng, M., et al., 2017. ActiVis: Visual exploration of industry-scale deep neural network models. IEEE Trans. Visualization Comput. Graphics.
  • Kaymak, Ç., et al. A brief survey and an application of semantic image segmentation for autonomous driving.
  • Kornblith, S., et al. Do better ImageNet models transfer better?
  • Lampert, C.H., et al. Beyond sliding windows: Object localization by efficient subwindow search.
  • Long, J., et al., 2015. Fully convolutional networks for semantic segmentation.
  • Matese, A., et al., 2015. Technology in precision viticulture: A state of the art review. Int. J. Wine Res.