Abstract
This paper studies image processing algorithms to accelerate and facilitate the evaluation of the harvest condition in tomato farms. To this end, two deep learning models are trained and combined with counting methods to produce a harvest monitoring system for embedded applications using an Intel® Movidius™ and an affordable RGB camera. The first model detects the location of cherry tomato clusters, while the second estimates the fruit's maturity. The results are compared to a baseline implementation based on segmentation. Next, a multiple counting method based on regions of interest is applied to the detected clusters in videos to count the tomatoes at different maturity stages. To produce a more robust count, a tracking system is implemented that uses temporal information to find the unique tomato clusters in the videos. In the evaluation stage, the obtained location results indicate an intersection over union (\( IoU \)) of about \(89\%\) when using MobileNetV1 as a feature extractor and choosing appropriate location anchors. The maturity estimation results indicate better performance for the trained algorithm than for the baseline, with a root mean square error of \(7.7\%\). The best results were obtained when combining the fully learned solution with the tracking system, correctly counting the majority of the tomato clusters at multiple maturity stages.
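The two evaluation metrics reported above can be sketched in a few lines (a minimal illustration of how \( IoU \) and RMSE are conventionally computed; the function names and the corner-coordinate box format are our own choices, not the paper's implementation):

```python
import math


def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def rmse(predicted, actual):
    """Root mean square error between predicted and reference values."""
    n = len(predicted)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)
```

With maturity expressed as a fraction in \([0, 1]\), an RMSE of \(0.077\) corresponds to the reported \(7.7\%\).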
Notes
Root AI, “Join the future of farming,” 2019. Available: https://root-ai.com/ [November 5, 2020].
Tzutalin. LabelImg. Git code (2015). Available: https://github.com/tzutalin/labelImg [August 3, 2018].
Intel® Movidius™ Compute Stick. Available: https://software.intel.com/en-us/neural-compute-stick [January 1, 2019].
NVIDIA® Jetson AGX Xavier™. Available: https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-agx-xavier/ [November 5, 2020].
Acknowledgements
This work was partially funded by a Master's scholarship supported by the National Council for Scientific and Technological Development (CNPq) at the Pontifical Catholic University of Rio de Janeiro, Brazil. This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES), Finance Code 001. This work was also partially supported by the UTFORSK Partnership Programme of the Norwegian Centre for International Cooperation in Education (SIU), project number UTF-2016-long-term/10097. We thank the Applied Computational Intelligence Laboratory at the Pontifical Catholic University of Rio de Janeiro, Brazil, for providing the NVIDIA® GPUs used for training and the Intel® Neural Compute Stick device used for inference.
Ethics declarations
Conflicts of interest
The authors declare that they have no conflicts of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional results
We also verified some tracking results for occlusion cases in the videos and their impact on the maturation rate when using the DDT configuration. The results are shown in Figs. 10, 11 and 12.
It can be noticed in Fig. 10 that the object detection algorithm predicted the bounding boxes even under occlusion, but the maturation rank was temporarily reduced while the occlusion lasted. Our tracking algorithm, on the other hand, aggregates the rank information over 20 frames and calculates a smoothed maturity function, which mitigates this rank discrepancy.
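The 20-frame smoothing described above can be sketched as a per-track moving average (an assumption on our part: the exact aggregation is not specified here, so `MaturitySmoother` and the windowed mean are illustrative only):

```python
from collections import defaultdict, deque

WINDOW = 20  # number of frames over which per-frame ranks are aggregated


class MaturitySmoother:
    """Keeps the last WINDOW per-frame maturity ranks of each tracked
    cluster and reports their mean, damping occlusion-induced dips."""

    def __init__(self, window=WINDOW):
        # deque(maxlen=window) silently discards the oldest rank
        self.history = defaultdict(lambda: deque(maxlen=window))

    def update(self, track_id, rank):
        """Record this frame's rank and return the smoothed value."""
        ranks = self.history[track_id]
        ranks.append(rank)
        return sum(ranks) / len(ranks)
```

A single occluded frame with a depressed rank then shifts the smoothed estimate only slightly instead of replacing it outright.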
The same effect occurs in Fig. 11: the rank estimate was higher in the first image because of the cluster behind it, but this discrepancy is also mitigated by the tracking algorithm.
The results in Fig. 12 show a moment when the object detection algorithm loses the detection due to occlusion and then detects the cluster again. The tracking system correctly identified the two detections as the same cluster. Notably, five other clusters were being tracked in the uncropped image at the same time.
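The re-identification behaviour in Fig. 12 can be sketched as IoU-based track association with a tolerance for missed frames (a simplified greedy scheme; the actual tracker may differ, and `associate`, `box_iou`, and the thresholds are our own illustrative choices):

```python
def box_iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0


def associate(tracks, detections, iou_min=0.3, max_missed=10):
    """Greedily match this frame's detections to live tracks by IoU.

    `tracks` maps track id -> {"box": (x1, y1, x2, y2), "missed": int}.
    Unmatched detections open new tracks; tracks unseen for more than
    `max_missed` consecutive frames are retired. Returns a dict mapping
    detection index -> track id."""
    assignments = {}
    matched = set()
    for det_idx, det in enumerate(detections):
        best_id, best_iou = None, iou_min
        for tid, tr in tracks.items():
            if tid not in matched:
                overlap = box_iou(tr["box"], det)
                if overlap > best_iou:
                    best_id, best_iou = tid, overlap
        if best_id is None:                       # no overlap: new cluster
            best_id = max(tracks, default=-1) + 1
        tracks[best_id] = {"box": det, "missed": 0}
        matched.add(best_id)
        assignments[det_idx] = best_id
    for tid in list(tracks):                      # age out unseen tracks
        if tid not in matched:
            tracks[tid]["missed"] += 1
            if tracks[tid]["missed"] > max_missed:
                del tracks[tid]
    return assignments
```

Because a track survives up to `max_missed` occluded frames, a cluster that disappears briefly and reappears near its last position is assigned its original id rather than being counted twice.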
Cite this article
Tenorio, G.L., Caarls, W. Automatic visual estimation of tomato cluster maturity in plant rows. Machine Vision and Applications 32, 78 (2021). https://doi.org/10.1007/s00138-021-01202-9