Towards practical 2D grapevine bud detection with fully convolutional networks
Introduction
For decades, viticulturists have been producing models of the plant processes most relevant to fruit quality and yield, soil profiling, or vine health, and have been gathering a wealth of information to feed into these models. Better and more efficient measuring procedures have resulted in more information, with a corresponding impact on the quality of model outcomes. Such information corresponds to a long list of variables for assessing the state of different parts of the plant, such as the one found in the manuals published by The Australian Wine Research Institute (2020a, 2020b). Most of these variables of interest, however, are still measured with manual instruments and visual inspection. This results in high labor costs that limit measurement campaigns to small data samples which, even with the use of statistical inference or spatial interpolation techniques (Whelan et al., 1996), restrict the quality of the decisions that agronomists can draw from them.
Precision viticulture in general (Bramley, 2009), and computer vision algorithms in particular, have been growing over the last couple of decades, mostly due to their potential for mitigating these limitations (Seng et al., 2018, Matese and Di Gennaro, 2015). These algorithms come with the promise of an unprecedented boost in the production of vineyard information, as well as many expectations not only about possible improvements in the quality of the measurements, but also about their potential to produce better models by feeding all this information to big data algorithms.
The present work contributes to this general endeavor with FCN-MN (Long et al., 2015, Shelhamer et al., 2017), an algorithm for measuring variables related to one specific plant part: the bud, an organ of major importance as it is the growing point of the fruits, containing all the plant’s productive potential (May, 2000). The present contribution of autonomous bud detection not only enables the autonomous measurement of bud-related variables currently measured by agronomists (see Table 1 for a non-exhaustive list), but also has the potential to enable the measurement of novel, yet important, variables that at present cannot be measured manually. One example is the total sunlight captured by buds, which depends on the manually unfeasible task of determining the exact location of buds in 3D space. Although the present work focuses on 2D detection, it could easily be upgraded to 3D by, for instance, integrating the 2D detection into the workflow proposed by Díaz et al. (2018).
Table 1 shows a non-exhaustive list of the main bud-related variables currently measured by vineyard managers (Sánchez and Dokoozlian, 2005, Noyce et al., 2016, Collins et al., 2020), together with an assessment of the extent to which detection contributes to their measurement. The right-most column (other required operations) indicates the information beyond detection, necessary to complete the measurement, while the middle columns labeled (i), (ii), and (iii) indicate the specific aspects of detection required for that variable: (i) whether it requires a good segmentation, i.e., the discrimination of which pixels in the scene correspond to buds and which correspond to non-bud; (ii) a good correspondence identification, i.e., discrimination of bud pixels as belonging to different buds; or (iii) a good localization, i.e., the localization of the bud within the scene. For instance, regarding the bud number variable, for it to coincide with the detection count, different components detected for the same bud must be bundled together as a single detection. For the type-of-bud classification, in addition to correctly identifying components with buds, the segmentation of the part of the image corresponding to the bud must minimize the noise produced by background pixels. Lastly, to measure the incidence of sunlight on the bud, localization rather than segmentation is necessary, plus the leaf 3D surface geometry.
A good detector, therefore, should be evaluated on all three aspects: segmentation, correspondence identification, and localization. This is straightforward for our detector, as its implementation first produces a segmentation mask, which is then post-processed to produce correspondence identification and localization. The specifics of this approach are detailed in Section 2. The analysis of detection results presented in Section 3 shows that this approach is superior to state-of-the-art algorithms for grapevine bud detection. Finally, Section 4 discusses the scope and limitations of the results obtained for bud detection, the sufficiency of the achieved performance for measuring a selection of the variables in Table 3, and the most important conclusions, future work, and potential improvements.
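To make the mask-to-detection step concrete, the following sketch (a simplified stand-in, not the paper’s implementation; the connected-components rule, the `detect_from_mask` helper, and the SciPy calls are our assumptions) treats each connected component of bud pixels as one detected bud and takes its pixel centroid as the localization:

```python
import numpy as np
from scipy import ndimage

def detect_from_mask(mask):
    """Post-process a binary segmentation mask into detections.

    Each 4-connected component of bud pixels is treated as one
    detected bud (correspondence identification); its pixel
    centroid serves as the localization.
    """
    labels, n = ndimage.label(mask)  # one integer label per component
    centroids = ndimage.center_of_mass(mask, labels, range(1, n + 1))
    return n, centroids

# Toy mask with two separate "buds"
mask = np.zeros((8, 8), dtype=int)
mask[1:3, 1:3] = 1  # bud A
mask[5:7, 5:8] = 1  # bud B
n, cents = detect_from_mask(mask)
print(n)         # → 2
print(cents[0])  # centroid of bud A (row, col)
```

Under this convention, two disconnected mask components produced for the same bud would count as two detections, which is why correspondence identification must be evaluated separately from segmentation.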
A wide variety of research using computer vision and machine learning algorithms to acquire information about vineyards (Seng et al., 2018) can be found in the literature, such as berry and bunch detection (Nuske et al., 2011), fruit size and weight estimation (Tardaguila et al., 2012), leaf area indices and yield estimation (Diago et al., 2012), plant phenotyping (Herzog et al., 2014a, Herzog et al., 2014b), autonomous selective spraying (Berenstein et al., 2010), and more (Tardáguila et al., 2012, Whalley and Shanmuganathan, 2013). Among the outstanding computer algorithms in recent years, artificial neural networks have aroused great interest in the industry as a means to carry out various visual recognition tasks (Hirano et al., 2006, Kahng et al., 2017, Tilgner et al., 2019). In particular, Convolutional Neural Networks (CNN) have become the dominant machine learning approach to visual object recognition (Ning et al., 2017). Two recent studies have successfully applied visual recognition techniques based on deep learning networks to identify viticultural variables to estimate production in vineyards. One of them, Grimm et al. (2019), uses an FCN to carry out segmentation of grapevine plant organs such as young shoots, pedicels, flowers or grapes. The other, Rudolph et al. (2018), uses images of grapevines under field conditions that are segmented using a CNN to detect inflorescences as regions of interest, and over these regions, the circle Hough Transform algorithm is applied to detect flowers.
Several works aim at detecting and locating buds in different types of crops by means of autonomous visual recognition systems. For instance, Tarry et al. (2014) presents an integrated system for chrysanthemum bud detection that can be used to automate labour-intensive tasks in floriculture greenhouses. More recently, Zhao et al. (2018) presented a computer vision system to identify the internodes and buds of stalk crops. To the best of our knowledge and research efforts, there are at least four works that specifically address the problem of bud detection in the grapevine by using autonomous visual recognition systems. The research works by Xu et al. (2014), Herzog et al. (2014b), and Pérez et al. (2017) apply different techniques to perform 2D image detection involving different computer vision and machine learning algorithms. In addition, Díaz et al. (2018) introduces a workflow to localize buds in 3D space. The most relevant details of each are presented below.
The study by Xu et al. (2014) presents a bud detection algorithm using RGB images captured indoors under controlled lighting and background conditions, specifically to establish the groundwork for an autonomous winter pruning system. The authors apply a threshold filter to discriminate the background from the plant skeleton, resulting in a binary image. They assume that the shape of buds resembles corners and apply the Harris corner detector over the binary image to detect them. This process obtains a recall of , i.e., of the buds were detected.
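The pipeline just described (threshold to a binary skeleton, then Harris corner responses as bud candidates) can be sketched as follows; the `harris_response` implementation, parameter values, and toy image are illustrative assumptions, not Xu et al.’s code:

```python
import numpy as np
from scipy import ndimage

def harris_response(img, k=0.04, sigma=1.0):
    """Harris corner response over a binary plant-skeleton image.

    The idea sketched here: buds protrude from the cane like
    corners, so peaks of the Harris response mark candidate bud
    locations. k and sigma are conventional illustrative values.
    """
    iy, ix = np.gradient(img.astype(float))      # image gradients
    ixx = ndimage.gaussian_filter(ix * ix, sigma)
    iyy = ndimage.gaussian_filter(iy * iy, sigma)
    ixy = ndimage.gaussian_filter(ix * iy, sigma)
    det = ixx * iyy - ixy ** 2                   # structure-tensor determinant
    trace = ixx + iyy
    return det - k * trace ** 2

# Binary "skeleton": a vertical cane with a small lateral stub
img = np.zeros((32, 32))
img[4:28, 15:17] = 1.0   # cane
img[14:16, 17:21] = 1.0  # bud-like stub
r = harris_response(img)
peak = np.unravel_index(np.argmax(r), r.shape)
print(peak)  # location of the strongest corner response
```

In practice, candidate corners would be taken as local maxima of `r` above a threshold rather than the single global peak.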
The work of Herzog et al. (2014b) presents three methods for the detection of buds at very advanced stages of development, when the buds have already burst and the first leaves are emerging. All methods are semi-automatic and require human intervention to validate the quality of the results. The best result is obtained using an RGB image with an artificial black background and corresponds to a recall of . The authors argue that this recall is enough to solve the problem of phenotyping vines. They also argue that these good results can be explained by the particular green color and the morphology of the already sprouting buds, approximately 2 cm in size.
Pérez et al. (2017) outlines an approach for the classification of bud images in winter, using an SVM as the classifier and Bag of Features to compute visual descriptors. They report a recall of over and an accuracy of when sorting images containing at least of a bud and a ratio of 20–80% of bud vs. non-bud pixels. They argue that this classifier can be used in 2D localization algorithms of the sliding-window type due to its robustness to variations in window size and position. It is precisely this idea that has been reproduced in the present work to implement the baseline competitor to our approach.
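The sliding-window scheme reused for our baseline can be sketched as below; the window size, stride, and the toy intensity-based classifier are illustrative assumptions (Pérez et al. use an SVM over Bag-of-Features descriptors as the patch classifier):

```python
import numpy as np

def sliding_window_detect(image, classify, win=64, stride=32):
    """Scan an image with overlapping windows and collect those
    that a patch-level classifier flags as containing a bud.

    `classify` is any function mapping a patch to True/False, so
    the scanning logic stands alone from the classifier choice.
    """
    h, w = image.shape[:2]
    hits = []
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            patch = image[y:y + win, x:x + win]
            if classify(patch):
                hits.append((x, y, win, win))  # (x, y, width, height)
    return hits

# Toy example: the "bud" is a bright region and the stand-in
# classifier is a simple mean-intensity test.
img = np.zeros((128, 128))
img[40:70, 40:70] = 1.0
boxes = sliding_window_detect(img, lambda p: p.mean() > 0.2)
print(len(boxes))  # → 1
```

Overlapping windows firing on the same bud are one reason a subsequent correspondence-identification step is needed to merge detections.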
Finally, Díaz et al. (2018) introduces a workflow for the localization of buds in 3D space. The workflow consists of five steps. The first one reconstructs a 3D point cloud corresponding to the grapevine structure from several RGB images. The second step applies a 2D detection method using the sliding window and patch classification technique of Pérez et al. (2017). The next step uses a voting scheme to classify each point in the cloud as a bud or non-bud. The fourth step applies the DBSCAN clustering algorithm to group points in the cloud that correspond to a bud. Finally, in the fifth step, the localization is performed, obtaining the center of mass coordinates of each 3D point cluster. They report a recall of and a precision of and a localization error of approximately 1.5 cm, or 3 bud diameters.
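Steps four and five of this workflow (clustering the bud-voted 3D points, then taking each cluster’s center of mass) can be sketched as follows; as a simplification we merge points closer than `eps`, which coincides with DBSCAN at `min_samples=1`, and the `cluster_bud_points` helper and parameter values are our assumptions:

```python
import numpy as np
from scipy.spatial import cKDTree

def cluster_bud_points(points, eps=0.02):
    """Group 3D points classified as "bud" into per-bud clusters
    and return each cluster's center of mass (meters).

    Simplified stand-in for the clustering + localization steps:
    a KD-tree finds all point pairs closer than eps, and a
    union-find merges them into clusters.
    """
    tree = cKDTree(points)
    parent = list(range(len(points)))
    def find(i):  # union-find with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i, j in tree.query_pairs(eps):
        parent[find(i)] = find(j)
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(points[i])
    return [np.mean(g, axis=0) for g in groups.values()]

# Two tight clumps of points ~10 cm apart, mimicking two buds
pts = np.array([[0.0, 0, 0], [0.005, 0, 0],
                [0.10, 0, 0], [0.105, 0, 0]])
centers = cluster_bud_points(pts, eps=0.02)
print(len(centers))  # → 2
```

The returned centers play the role of the per-bud 3D localizations reported in the final step of the workflow.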
Although these research studies represent a great advance in relation to the problem of detecting and localizing buds, they still show at least one of the following limitations: (i) use of artificial background outdoors; (ii) controlled lighting indoors; (iii) need for user interaction; (iv) bud detection in very advanced stages of development; (v) low bud detection/classification recall, and (vi) although some of these works perform some kind of segmentation process as part of the approach, none of them aim to segment the bud or report metrics of the quality of the segmentation performed. These limitations represent a major barrier to the effective development of tools for measuring bud-related variables.
Fully Convolutional Network with MobileNet (FCN-MN)
As outlined in the introduction, the approach proposes the use of computer vision algorithms to: (i) segment buds by classifying which pixels in the scene correspond to buds and which correspond to background (non-buds), (ii) identify bud correspondences by discriminating those pixels that belong to different buds in the observed scene, and (iii) localize each bud in the scene.
For the segmentation operation, i.e., pixel classification, the fully convolutional network introduced in Long et al.
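A defining ingredient of the FCN approach of Long et al. (2015) is that a coarse score map, produced by convolutional layers only, is upsampled back to input resolution for dense pixel classification, with the upsampling initialized to bilinear interpolation. A minimal numpy sketch of fixed bilinear upsampling (the `upsample_bilinear` helper and the sizes are illustrative, not the FCN-MN configuration):

```python
import numpy as np

def upsample_bilinear(scores, factor):
    """Bilinearly upsample a 2D score map by an integer factor,
    using half-pixel-aligned sample coordinates."""
    h, w = scores.shape
    ys = np.clip((np.arange(h * factor) + 0.5) / factor - 0.5, 0, h - 1)
    xs = np.clip((np.arange(w * factor) + 0.5) / factor - 0.5, 0, w - 1)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    top = scores[np.ix_(y0, x0)] * (1 - wx) + scores[np.ix_(y0, x1)] * wx
    bot = scores[np.ix_(y1, x0)] * (1 - wx) + scores[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy

# A 4x4 coarse "bud score" map upsampled 4x to 16x16 pixels
coarse = np.full((4, 4), 0.7)
fine = upsample_bilinear(coarse, 4)
print(fine.shape)  # → (16, 16)
```

Thresholding the upsampled score map then yields the binary segmentation mask that the later detection stages post-process.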
Experimental results
In this section we present a systematic evaluation of the quality of our proposed FCN-MN procedure for bud detection over all three aspects of detection required for the measurement of the relevant bud-related variables listed in Table 1: segmentation, correspondence identification, and localization. First, in the following subsection, we present metrics that quantify the quality of these aspects, followed by Section 3 that presents the results for the metric values obtained for different
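As one concrete example of such metrics, pixel-level precision, recall, and intersection-over-union are standard ways to quantify segmentation quality; the `mask_metrics` helper below is our naming and a generic sketch, not necessarily the exact metric definitions used in the paper:

```python
import numpy as np

def mask_metrics(pred, true):
    """Pixel-level precision, recall and IoU between a predicted
    and a ground-truth binary segmentation mask."""
    tp = np.logical_and(pred, true).sum()   # bud pixels correctly found
    fp = np.logical_and(pred, ~true).sum()  # background marked as bud
    fn = np.logical_and(~pred, true).sum()  # bud pixels missed
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    iou = tp / (tp + fp + fn)
    return precision, recall, iou

# Toy masks: predicted bud region offset from the true one
pred = np.zeros((6, 6), bool); pred[1:4, 1:4] = True
true = np.zeros((6, 6), bool); true[2:5, 2:5] = True
prec, rec, iou = mask_metrics(pred, true)
print(round(prec, 3), round(rec, 3), round(iou, 3))  # → 0.444 0.444 0.286
```

Correspondence identification and localization require additional, detection-level metrics, since these pixel-level scores are blind to how bud pixels are grouped into individual buds.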
Discussion and conclusions
This section discusses the results obtained by the proposed approach in the context of the problem of grapevine bud detection and its impact as a tool for measuring viticultural variables of interest. The discussion is complemented with some highlights of the most important conclusions together with some potential lines of future work.
This work introduces FCN-MN, a Fully Convolutional Network with MobileNet architecture for the detection of grapevine buds in 2D images captured in natural field
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work was funded by the Argentinean Universidad Tecnológica Nacional (UTN), the National Council of Scientific and Technical Research (CONICET), and the National Fund for Scientific and Technological Promotion (FONCyT).
References (46)

- et al. Grapevine buds detection and localization in 3D space based on structure from motion and 2D image classification. Comput. Ind. (2018)
- et al. A survey on deep learning techniques for image and video semantic segmentation. Appl. Soft Comput. (2018)
- et al. An adaptable approach to automated visual detection of plant organs with applications in grapevine breeding. Biosyst. Eng. (2019)
- et al. A survey on deep learning in medical image analysis. Med. Image Anal. (2017)
- et al. Image classification for detection of winter grapevine buds in natural conditions using scale-invariant features transform, bag of features and support vector machines. Comput. Electron. Agric. (2017)
- et al. Grape detection, segmentation, and tracking using deep neural networks and three-dimensional association. Comput. Electron. Agric. (2020)
- et al. Grape clusters and foliage detection algorithms for autonomous selective vineyard sprayer. Intel. Serv. Robot. (2010)
- Lessons from nearly 20 years of precision agriculture research, development, and adoption as a guide to its appropriate application. Crop Pasture Sci. (2009)
- et al. Effects of canopy management practices on grapevine bud fruitfulness. OENO One (2020)
- et al. Grapevine yield and leaf area estimation using supervised classification methodology on RGB images taken under field conditions. Sensors (2012)