A Character Flow Framework for Multi-Oriented Scene Text Detection

Yang, Wen-Jun; Zou, Bei-Ji; Li, Kai-Wen; Liu, Shu

doi:10.1007/s11390-021-1362-4

Regular Paper
Published: 31 May 2021

Volume 36, pages 465–477, (2021)
Journal of Computer Science and Technology

Wen-Jun Yang^1,2, Bei-Ji Zou^1,2, Kai-Wen Li^1,2 & Shu Liu^1,2

Abstract

Scene text detection plays a significant role in applications such as object recognition, document management, and visual navigation. Instance-segmentation-based methods have been the most widely used in existing research because they handle multi-oriented text well. However, the training labels contain a large number of non-text pixels, which leads to text mis-segmentation. In this paper, we propose a novel multi-oriented scene text detection framework with two main modules: character instance segmentation (one instance corresponds to one character) and character flow construction (one character flow corresponds to one word). We use a feature pyramid network (FPN) to predict character and non-character instances with arbitrary orientations. A joint network of FPN and bidirectional long short-term memory (BLSTM) is developed to explore the context information among isolated characters, which are finally grouped into character flows. Extensive experiments on the ICDAR2013, ICDAR2015, MSRA-TD500, and MLT datasets demonstrate the effectiveness of our approach, with F-measures of 92.62%, 88.02%, 83.69%, and 77.81%, respectively.
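
The F-measure reported above is conventionally the harmonic mean of detection precision and recall. The following is a minimal sketch of that standard formula, not the evaluation code used in the paper; the function name `f_measure` and the counts in the usage line are hypothetical and chosen only for illustration.

```python
def f_measure(true_positives: int, num_detections: int, num_ground_truth: int) -> float:
    """Harmonic mean of precision and recall for a text detector.

    precision = correctly detected words / all detected words
    recall    = correctly detected words / all ground-truth words
    """
    # Guard against empty detection or ground-truth sets.
    if num_detections == 0 or num_ground_truth == 0:
        return 0.0
    precision = true_positives / num_detections
    recall = true_positives / num_ground_truth
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts (not taken from the paper): 90 correct detections
# out of 100 predicted boxes, against 120 ground-truth words.
score = f_measure(true_positives=90, num_detections=100, num_ground_truth=120)
print(f"{score:.4f}")  # prints 0.8182
```

With these illustrative counts, precision is 0.90 and recall is 0.75. The harmonic mean sits closer to the lower of the two, which is why a single F-measure penalizes detectors that trade one for the other.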

Author information

Authors and Affiliations

School of Computer Science and Engineering, Central South University, Changsha, 410083, China
Wen-Jun Yang, Bei-Ji Zou, Kai-Wen Li & Shu Liu
Hunan Engineering Research Center of Machine Vision and Intelligent Medicine, Changsha, 410083, China
Wen-Jun Yang, Bei-Ji Zou, Kai-Wen Li & Shu Liu

Corresponding author

Correspondence to Shu Liu.

About this article

Cite this article

Yang, WJ., Zou, BJ., Li, KW. et al. A Character Flow Framework for Multi-Oriented Scene Text Detection. J. Comput. Sci. Technol. 36, 465–477 (2021). https://doi.org/10.1007/s11390-021-1362-4

Received: 08 February 2021
Accepted: 28 April 2021
Published: 31 May 2021
Issue Date: June 2021
DOI: https://doi.org/10.1007/s11390-021-1362-4
