Semi-supervised deep learning approach to break common CAPTCHAs

Bostik, Ondrej; Horak, Karel; Kratochvila, Lukas; Zemcik, Tomas; Bilik, Simon

doi:10.1007/s00521-021-05957-0

Original Article
Published: 12 April 2021

Volume 33, pages 13333–13343, (2021)

Neural Computing and Applications

Abstract

Manual data annotation is time-consuming. This paper presents a novel strategy for automatically training a CAPTCHA-breaking system without manual dataset creation. We demonstrate the feasibility of an attack against a text-based CAPTCHA scheme using network infrastructure similar to that used for Denial of Service attacks. The main goal of our research is to expose a possible vulnerability in CAPTCHA systems when a brute-force attack is combined with transfer learning. The classification step uses a simple convolutional neural network with 15 layers. The training stage uses an automatically prepared dataset, created without any human intervention, and transfer learning to fine-tune the deep neural network classifier. The resulting system achieved 80% classification accuracy after 6 fine-tuning steps on a 5-digit text-based CAPTCHA scheme. These results suggest that even a simple attack with a large number of attacking computers can be an effective alternative to current CAPTCHA-breaking systems.
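
The abstract does not spell out how the dataset is prepared without human intervention. One plausible reading, offered here only as an illustrative sketch, is that guesses accepted by the target server serve as labels. The toy simulation below makes no network requests and uses no neural network: `noisy_guess`, `harvest_labels`, and `expected_acceptance` are hypothetical names, the classifier is replaced by a random guesser with a fixed per-digit accuracy, and per-digit errors are assumed to be independent.

```python
import random

DIGITS = "0123456789"
CAPTCHA_LEN = 5  # five-digit scheme, as stated in the abstract


def noisy_guess(truth: str, char_accuracy: float, rng: random.Random) -> str:
    """Stand-in for the classifier (assumption): each digit is read
    correctly with probability char_accuracy, otherwise replaced by a
    different digit chosen at random."""
    out = []
    for ch in truth:
        if rng.random() < char_accuracy:
            out.append(ch)
        else:
            out.append(rng.choice([d for d in DIGITS if d != ch]))
    return "".join(out)


def harvest_labels(n_attempts: int, char_accuracy: float, seed: int = 0):
    """Simulate repeated attempts and keep only the accepted guesses.

    Each returned pair is (challenge, label). The challenge string
    stands in for the CAPTCHA image; the label is the accepted guess.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_attempts):
        truth = "".join(rng.choice(DIGITS) for _ in range(CAPTCHA_LEN))
        guess = noisy_guess(truth, char_accuracy, rng)
        if guess == truth:  # simulated server verdict: accept or reject
            accepted.append((truth, guess))
    return accepted


def expected_acceptance(char_accuracy: float) -> float:
    """Probability that all digits are right, assuming independent errors."""
    return char_accuracy ** CAPTCHA_LEN


if __name__ == "__main__":
    labels = harvest_labels(n_attempts=10_000, char_accuracy=0.5)
    print(len(labels), "accepted out of 10000 attempts")
    print("expected acceptance rate:", expected_acceptance(0.5))
```

Under these assumptions, a weak guesser still collects some correct labels if it makes enough attempts, which is where a large number of attacking machines would matter. In a real pipeline the accepted pairs would feed the next fine-tuning step, raising the per-digit accuracy and therefore the acceptance rate of the following round.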

Acknowledgements

The completion of this paper was made possible by the grant No. FEKT-S-20-6205 - “Research in Automation, Cybernetics and Artificial Intelligence within Industry 4.0” financially supported by the Internal science fund of Brno University of Technology.

Author information

Authors and Affiliations

Faculty of Electrical Engineering and Communication, Brno University of Technology, Technicka 12, Brno, 61200, Czech Republic
Ondrej Bostik, Karel Horak, Lukas Kratochvila, Tomas Zemcik & Simon Bilik

Corresponding author

Correspondence to Ondrej Bostik.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Bostik, O., Horak, K., Kratochvila, L. et al. Semi-supervised deep learning approach to break common CAPTCHAs. Neural Comput & Applic 33, 13333–13343 (2021). https://doi.org/10.1007/s00521-021-05957-0

Received: 20 August 2020
Accepted: 25 March 2021
Published: 12 April 2021
Issue Date: October 2021
DOI: https://doi.org/10.1007/s00521-021-05957-0
