
A Combination of DWT CLAHE and Wiener Filter for Effective Scene to Text Conversion and Pronunciation

  • Original Article

Journal of Electrical Engineering & Technology

Abstract

An effective scene-to-text conversion and pronunciation system is realized. An intelligent combination of the Discrete Wavelet Transform (DWT), Contrast Limited Adaptive Histogram Equalization (CLAHE), Wiener filtering, and an adaptive weighted average is utilized for image enhancement. Subsequently, Maximally Stable Extremal Regions (MSERs) are used to detect the text regions, and geometric and contour-based approaches filter out the non-text MSERs. Connected-component analysis then groups the text candidates. In the next step, Optical Character Recognition (OCR) recognizes the text, and the Microsoft text-to-speech synthesizer pronounces the extracted text. The applicability of the system is tested on the standard robust reading competition dataset. The designed method secures 93% precision in text segmentation and 89.9% precision in end-to-end recognition.
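The enhancement stage described above can be sketched in Python using only NumPy and SciPy. This is a minimal illustration, not the paper's implementation: it uses a hand-rolled one-level Haar DWT, global histogram equalization as a simplified stand-in for CLAHE, `scipy.signal.wiener` for the Wiener filter, and a fixed fusion weight `w` in place of the paper's adaptive weighted average.

```python
import numpy as np
from scipy.signal import wiener

def haar_dwt2(img):
    # One-level 2-D Haar DWT: returns approximation (LL) and detail bands.
    a = (img[0::2, :] + img[1::2, :]) / 2.0   # row averages
    d = (img[0::2, :] - img[1::2, :]) / 2.0   # row differences
    LL = (a[:, 0::2] + a[:, 1::2]) / 2.0
    LH = (a[:, 0::2] - a[:, 1::2]) / 2.0
    HL = (d[:, 0::2] + d[:, 1::2]) / 2.0
    HH = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return LL, LH, HL, HH

def haar_idwt2(LL, LH, HL, HH):
    # Inverse of haar_dwt2 (x0 = a + d, x1 = a - d at each level).
    a = np.zeros((LL.shape[0], LL.shape[1] * 2))
    a[:, 0::2] = LL + LH
    a[:, 1::2] = LL - LH
    d = np.zeros_like(a)
    d[:, 0::2] = HL + HH
    d[:, 1::2] = HL - HH
    img = np.zeros((a.shape[0] * 2, a.shape[1]))
    img[0::2, :] = a + d
    img[1::2, :] = a - d
    return img

def hist_equalize(img):
    # Global histogram equalization: a simplified stand-in for CLAHE,
    # which would additionally tile the image and clip the histogram.
    flat = np.clip(img, 0, 255).astype(np.uint8).ravel()
    hist = np.bincount(flat, minlength=256)
    cdf = hist.cumsum().astype(float)
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min()) * 255.0
    return cdf[flat].reshape(img.shape)

def enhance(img, w=0.6):
    # Decompose, contrast-stretch and denoise the approximation band,
    # fuse with the original band, then reconstruct.
    LL, LH, HL, HH = haar_dwt2(img.astype(float))
    eq = hist_equalize(LL)
    den = wiener(eq, mysize=3)            # Wiener-filtered, equalized LL band
    LL_new = w * den + (1 - w) * LL       # fixed weight; the paper adapts it
    out = haar_idwt2(LL_new, LH, HL, HH)
    return np.clip(out, 0, 255)
```

Enhancing the wavelet approximation band rather than the full-resolution image keeps the high-frequency detail bands (and thus character edges) untouched, which is the usual motivation for combining CLAHE with a DWT.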



Funding

This project was funded by Effat University, Jeddah, Saudi Arabia, under grant number UC#9/29 April.2020/7.1-22(2)1. The authors are thankful to the anonymous reviewers for their useful feedback.

Author information


Corresponding author

Correspondence to Saeed Mian Qaisar.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Mian Qaisar, S., Hammad, N. & Khan, R. A Combination of DWT CLAHE and Wiener Filter for Effective Scene to Text Conversion and Pronunciation. J. Electr. Eng. Technol. 15, 1829–1836 (2020). https://doi.org/10.1007/s42835-020-00461-2

