Abstract
An effective scene-to-text conversion and pronunciation system is realized. A combination of the Discrete Wavelet Transform (DWT), Contrast Limited Adaptive Histogram Equalization (CLAHE), a Wiener filter and an adaptive weighted average is used for image enhancement. Subsequently, Maximally Stable Extremal Regions (MSERs) are used to detect the text regions. Geometrical and contour-based approaches then filter out the non-text MSERs, and the connected component concept is used to group the remaining text candidates. In the next step, Optical Character Recognition (OCR) recognizes the text. The Microsoft text-to-speech synthesizer pronounces the extracted text. The applicability of the system is tested on the standard robust reading competition dataset. The designed method secures 93% precision in text segmentation and 89.9% precision in end-to-end recognition.
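The geometrical filtering and connected-component grouping steps can be illustrated with a minimal sketch. It is not the authors' implementation: the `Region` type, the aspect-ratio and extent thresholds, and the neighbour rule are all assumptions chosen for illustration, and plain bounding boxes stand in for real MSER output.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Region:
    """Hypothetical stand-in for one detected MSER: bounding box plus pixel count."""
    x: int
    y: int
    w: int
    h: int
    area: int  # number of pixels that belong to the region


def looks_like_text(r: Region, min_aspect: float = 0.1, max_aspect: float = 3.0,
                    min_extent: float = 0.2, max_extent: float = 0.9) -> bool:
    """Geometric filter; all thresholds are illustrative assumptions."""
    aspect = r.w / r.h             # width-to-height ratio of the bounding box
    extent = r.area / (r.w * r.h)  # fraction of the bounding box that is filled
    return (min_aspect <= aspect <= max_aspect
            and min_extent <= extent <= max_extent)


def are_neighbours(a: Region, b: Region, gap_factor: float = 1.0,
                   height_ratio: float = 2.0) -> bool:
    """Assumed rule: boxes are close horizontally, overlap vertically, similar height."""
    gap = max(a.x, b.x) - min(a.x + a.w, b.x + b.w)  # negative if boxes overlap
    v_overlap = min(a.y + a.h, b.y + b.h) - max(a.y, b.y)
    similar_height = max(a.h, b.h) <= height_ratio * min(a.h, b.h)
    return gap <= gap_factor * min(a.h, b.h) and v_overlap > 0 and similar_height


def group_regions(regions: List[Region]) -> List[List[Region]]:
    """Connected components over the neighbour graph, using union-find."""
    parent = list(range(len(regions)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(regions)):
        for j in range(i + 1, len(regions)):
            if are_neighbours(regions[i], regions[j]):
                parent[find(i)] = find(j)

    groups: Dict[int, List[Region]] = {}
    for i, r in enumerate(regions):
        groups.setdefault(find(i), []).append(r)
    # Order each group left to right, as characters would be read.
    return [sorted(g, key=lambda r: r.x) for g in groups.values()]


# Made-up candidates: three adjacent character-like boxes, one long thin
# line (rejected by the aspect-ratio test) and one isolated box.
candidates = [
    Region(10, 10, 20, 30, 300),
    Region(35, 12, 18, 28, 250),
    Region(60, 11, 22, 30, 330),
    Region(0, 100, 200, 5, 950),
    Region(300, 200, 20, 30, 300),
]
text_like = [r for r in candidates if looks_like_text(r)]
groups = group_regions(text_like)
```

In a full pipeline each group would be cropped from the enhanced image and passed to OCR. The pairwise neighbour test is quadratic in the number of regions, which is acceptable for the few hundred candidates a typical scene image yields.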
Funding
This project is funded by Effat University, Jeddah, Saudi Arabia, under grant number UC#9/29 April.2020/7.1-22(2)1. The authors are thankful to the anonymous reviewers for their useful feedback.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Cite this article
Mian Qaisar, S., Hammad, N. & Khan, R. A Combination of DWT CLAHE and Wiener Filter for Effective Scene to Text Conversion and Pronunciation. J. Electr. Eng. Technol. 15, 1829–1836 (2020). https://doi.org/10.1007/s42835-020-00461-2
DOI: https://doi.org/10.1007/s42835-020-00461-2