
Ontological Approach to Image Captioning Evaluation

  • Special Issue
  • Published in Pattern Recognition and Image Analysis

Abstract

The paper considers an ontology of the existing metrics widely used to evaluate the image captioning task. It is shown that the ontological approach provides a more natural and resilient way to assess image captioning quality than the variations of machine translation metrics. Another important problem discussed in the paper is information support for researchers in the field of image captioning.
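The machine translation metrics the abstract refers to (BLEU, ROUGE, METEOR and their descendants) are all built on n-gram overlap between a candidate caption and a set of human reference captions. As a minimal illustrative sketch of this class of metrics (the function names are ours, not taken from the paper), a BLEU-style modified n-gram precision can be computed as follows:

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count the n-grams occurring in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def modified_precision(candidate, references, n):
    """BLEU-style modified n-gram precision: each candidate n-gram count
    is clipped by its maximum count in any single reference caption."""
    cand = ngram_counts(candidate, n)
    max_ref = Counter()
    for ref in references:
        for gram, cnt in ngram_counts(ref, n).items():
            max_ref[gram] = max(max_ref[gram], cnt)
    clipped = sum(min(cnt, max_ref[gram]) for gram, cnt in cand.items())
    total = sum(cand.values())
    return clipped / total if total else 0.0

candidate = "a dog runs on the grass".split()
references = ["a dog is running on the grass".split(),
              "a brown dog runs across a lawn".split()]
print(modified_precision(candidate, references, 1))  # 1.0
print(modified_precision(candidate, references, 2))  # 0.8
```

For these toy captions the unigram precision is perfect while the bigram precision already drops, even though the candidate is semantically adequate; such sensitivity to surface word order, rather than to the entities and relations actually depicted, is the kind of weakness the ontological approach is positioned against.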


Fig. 1.
Fig. 2.



Author information

Corresponding authors

Correspondence to D. Shunkevich or N. Iskra.

Ethics declarations

The authors declare that they have no conflict of interest.

Additional information

Daniil Shunkevich was born in 1990. In 2012 he graduated with an honors diploma from the Belarusian State University of Informatics and Radioelectronics (BSUIR), majoring in Artificial Intelligence, and received his PhD in 2018. He is Head of the Department of Intelligent Information Technologies at BSUIR and has over 75 published works on semantic technologies and the development of problem solvers.

Natalia Iskra was born in 1985 and graduated from BSUIR in 2007. She is Deputy Head of the Electronic Computing Machines Department at BSUIR and has over 30 publications on neural networks and image processing.


About this article


Cite this article

Shunkevich, D., Iskra, N. Ontological Approach to Image Captioning Evaluation. Pattern Recognit. Image Anal. 30, 288–294 (2020). https://doi.org/10.1134/S1054661820030256

