A Character Flow Framework for Multi-Oriented Scene Text Detection

  • Regular Paper
Journal of Computer Science and Technology

Abstract

Scene text detection plays a significant role in applications such as object recognition, document management, and visual navigation. Instance segmentation based methods are widely used in existing research because they handle multi-oriented text well. However, the training labels contain a large number of non-text pixels, which leads to text mis-segmentation. In this paper, we propose a novel multi-oriented scene text detection framework with two main modules: character instance segmentation (one instance corresponds to one character) and character flow construction (one character flow corresponds to one word). We use a feature pyramid network (FPN) to predict character and non-character instances with arbitrary directions. A joint network of FPN and bidirectional long short-term memory (BLSTM) is developed to explore the context information among isolated characters, which are finally grouped into character flows. Extensive experiments on the ICDAR2013, ICDAR2015, MSRA-TD500 and MLT datasets demonstrate the effectiveness of our approach, with F-measures of 92.62%, 88.02%, 83.69% and 77.81%, respectively.
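
The abstract reports F-measures without restating how they are computed. Assuming the standard definition (the harmonic mean of precision and recall), a minimal sketch follows. The function name `f_measure` and the example inputs are illustrative assumptions; they are not taken from the paper, and the paper's exact evaluation protocol is not given in this abstract.

```python
def f_measure(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall.

    Both inputs are fractions in [0, 1]. When both are zero the
    harmonic mean is undefined, so 0.0 is returned by convention.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical precision and recall, for illustration only.
# These are NOT results reported in the paper.
example_score = f_measure(0.90, 0.80)
print(f"F-measure: {example_score:.4f}")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a detector cannot reach a high F-measure by maximizing precision alone or recall alone.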


Author information

Corresponding author

Correspondence to Shu Liu.

Cite this article

Yang, WJ., Zou, BJ., Li, KW. et al. A Character Flow Framework for Multi-Oriented Scene Text Detection. J. Comput. Sci. Technol. 36, 465–477 (2021). https://doi.org/10.1007/s11390-021-1362-4

  • DOI: https://doi.org/10.1007/s11390-021-1362-4
