
Automatic visual estimation of tomato cluster maturity in plant rows

  • Original Paper
  • Published in: Machine Vision and Applications

Abstract

This paper studies image processing algorithms to accelerate and facilitate the evaluation of harvest conditions on tomato farms. To achieve this, two deep learning models are trained and combined with counting methods to produce a harvest monitoring system for embedded applications, using an Intel® Movidius™ and an affordable RGB camera. The first model detects the location of cherry tomato clusters, while the second estimates each cluster's maturity. The results are compared to a baseline implementation based on segmentation. Next, a multiple counting method based on regions of interest is applied to the detected clusters in videos to count the tomatoes at different maturity stages. To produce a more robust count, a tracking system is implemented that uses temporal information to find the unique tomato clusters in videos. In the evaluation stage, the obtained location results indicate an intersection over union (\( IoU \)) of about \(89\%\) when using MobileNetV1 as a feature extractor and choosing the appropriate location anchors. The maturity estimation results indicate better performance for the trained algorithm compared to the baseline, with a root mean square error of \(7.7\%\). The best results were obtained when combining the fully learned solution with the tracking system, which correctly counted the majority of the tomato clusters at multiple maturity stages.
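For reference, the intersection over union metric used in the evaluation can be computed as in the minimal sketch below. This is not code from the paper; the box format (x1, y1, x2, y2) and the example coordinates are assumptions for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Coordinates of the intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Example: a predicted cluster box against a hypothetical ground-truth box
print(iou((10, 10, 110, 110), (20, 20, 120, 120)))  # ~0.68
```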



Notes

  1. Root AI, “Join the future of farming,” 2019. Available: https://root-ai.com/ [November 5, 2020].

  2. Tzutalin. LabelImg. Git code (2015). Available: https://github.com/tzutalin/labelImg [August 3, 2018].

  3. Intel® Movidius™ Neural Compute Stick. Available: https://software.intel.com/en-us/neural-compute-stick [January 1, 2019].

  4. NVIDIA® Jetson AGX Xavier™. Available: https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-agx-xavier/ [November 5, 2020].


Acknowledgements

This work was partially funded by a Masters Scholarship supported by the National Council for Scientific and Technological Development (CNPq) at the Pontifical Catholic University of Rio de Janeiro, Brazil. This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES), Finance Code 001. This work was partially supported by the UTFORSK Partnership Programme from The Norwegian Centre for International Cooperation in Education (SIU), project number UTF-2016-long-term/10097. We thank the Applied Computational Intelligence Laboratory at the Pontifical Catholic University of Rio de Janeiro, Brazil for providing the NVIDIA® GPUs used for training and the Intel® Neural Compute Stick device used for inference.

Author information


Corresponding author

Correspondence to Gabriel Lins Tenorio.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional results

We also verified some tracking results for occlusion cases in the videos and their impact on the maturation rank when using the DDT configuration. The results are shown in Figs. 10, 11 and 12.

Fig. 10: DDT results showing correct cluster detection in the presence of occlusion

Fig. 11: DDT results showing cluster rank discrepancy mitigation through the tracking algorithm

Fig. 12: DDT results showing loss and re-acquisition of a cluster by the tracking algorithm

In Fig. 10, the object detection algorithm predicted the bounding boxes even under occlusion, but the maturation rank was affected while the occlusion occurred, temporarily decreasing the rank value. Our tracking algorithm mitigates this discrepancy by using the rank information across 20 frames to compute a smoothed maturity function.
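As a rough illustration of this smoothing step (a sketch under assumptions, not the authors' implementation; only the 20-frame window comes from the text, and the moving average is one plausible choice of smoothing function):

```python
from collections import deque

class TrackedCluster:
    """Keeps a sliding window of per-frame maturation ranks for one tracked cluster."""

    WINDOW = 20  # number of frames used for smoothing, as stated in the text

    def __init__(self):
        self.ranks = deque(maxlen=self.WINDOW)

    def update(self, rank):
        self.ranks.append(rank)

    def smoothed_rank(self):
        # Simple moving average; the paper's actual smoothing function may differ.
        return sum(self.ranks) / len(self.ranks) if self.ranks else None

# Usage: an occlusion briefly lowers the raw rank, but the smoothed value is stable.
cluster = TrackedCluster()
for raw in [0.8, 0.82, 0.3, 0.81, 0.79]:  # 0.3 is a hypothetical occluded frame
    cluster.update(raw)
print(cluster.smoothed_rank())  # ~0.70, much less affected than the raw 0.3
```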

The same effect appears in Fig. 11: the rank estimate was higher in the first image because of a cluster behind the target, but the discrepancy is again mitigated by the tracking algorithm.

Figure 12 shows a moment when the object detection algorithm loses a detection due to occlusion and then detects the cluster again. The tracking system correctly identified the two detections as the same cluster. Notably, five other clusters were being tracked in the uncropped image at the same time.
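A minimal sketch of how a tracker can re-associate a reappearing detection with an existing track is shown below. The greedy IoU matching, the thresholds, and the track dictionary layout are assumptions for illustration, not the paper's exact method; the key idea is that an unmatched track survives a bounded number of frames, so a briefly occluded cluster is recovered as the same track rather than counted twice.

```python
def box_iou(a, b):
    # Intersection over union of (x1, y1, x2, y2) boxes
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

def associate(tracks, detections, iou_threshold=0.3, max_lost=15):
    """Greedy IoU association; unmatched tracks survive up to max_lost frames,
    so a cluster hidden by a short occlusion is re-acquired as the same track."""
    unmatched = list(range(len(detections)))
    for track in tracks.values():
        best, best_iou = None, iou_threshold
        for d in unmatched:
            score = box_iou(track["box"], detections[d])
            if score > best_iou:
                best, best_iou = d, score
        if best is not None:
            track["box"], track["lost"] = detections[best], 0
            unmatched.remove(best)
        else:
            track["lost"] += 1  # not seen this frame; keep the track alive for now
    # Discard tracks that stayed unmatched for too long
    for t_id in [t for t, tr in tracks.items() if tr["lost"] > max_lost]:
        del tracks[t_id]
    return unmatched  # indices of detections that should start new tracks
```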

About this article

Cite this article

Tenorio, G.L., Caarls, W. Automatic visual estimation of tomato cluster maturity in plant rows. Machine Vision and Applications 32, 78 (2021). https://doi.org/10.1007/s00138-021-01202-9
