A Systematic Survey on CAPTCHA Recognition: Types, Creation and Breaking Techniques, Archives of Computational Methods in Engineering

Archives of Computational Methods in Engineering (IF 9.7), Pub Date: 2021-06-14, DOI: 10.1007/s11831-021-09608-4
Mohinder Kumar, M. K. Jindal, Munish Kumar

CAPTCHA stands for Completely Automated Public Turing Test to Tell Computers and Humans Apart, and it is used for internet security. Several CAPTCHA schemes are available today, such as text-based, audio-based, video/animation-based and puzzle-based schemes. This paper brings all of these types together in one place for analysis. The main aim of the article is to present a literature review covering CAPTCHA, its types, and its creation and breaking techniques. It is a systematic and complete analysis of all available CAPTCHA types. The paper discusses 16 text-based CAPTCHA generation methods, with usability and security ranging from 3 to 100% and 65 to 100%, respectively. The security and usability measures are not calculated or substantiated for some known English schemes. Of the 16 reviewed CAPTCHAs, 12 are based on English, 1 on Arabic, 1 on Chinese, 1 on Devanagari and 1 on the Gurumukhi script. The designs are made segmentation-proof with overlapping random shapes, overlapping characters, clasping, different colors and different shades. To make them recognition-proof, many techniques are used, such as image masking, local and global warping, broken characters, random rotation, arcs, jaws, etc. Approximately 50 schemes, mostly based on the English language, have been successfully broken, with success rates ranging from 2 to 100%. The techniques used to break these schemes include shape context matching, distortion estimation, the 2D Log-Gabor filter, and horizontal and vertical projection (to segment the letters). For recognition, CNN, KNN, DNN and MCDNN are used. Almost 15 image-based CAPTCHAs are discussed, designed with usability and security ranging from 90 to 100% and 17 to 100%, respectively. Of these, 5 schemes have been successfully broken, with success rates between 7 and 100%. K-NN and SVM are the algorithms most often used to recognize the images.
Five audio-based CAPTCHA designs are discussed, with usability ranging from 68.5 to 100% and security of 100%. The breaking rate of these audio schemes is 45–75%, achieved with SVM and K-NN algorithms. The paper also discusses 4 popular video-based designs that provide usability and security ranging from 75 to 100 and 98 to 100, respectively. These schemes have also been compromised, with a breaking rate of 16–10%, using SIFT, NN and simple OCR techniques. The paper can serve as a benchmark and starting point for any specific research that dives into one of these types.
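
As an illustration of the projection-based segmentation mentioned in the abstract, the following is a minimal sketch and not the method of any surveyed paper. It assumes the CAPTCHA has already been binarized into rows of 0/1 pixels, and the names `column_projection`, `segment_columns` and `min_pixels` are hypothetical. It counts foreground pixels per column (the vertical projection) and cuts the image wherever that count drops below a threshold.

```python
from typing import List, Tuple


def column_projection(image: List[List[int]]) -> List[int]:
    """Count foreground pixels (value 1) in each column of a binary image."""
    if not image:
        return []
    width = len(image[0])
    return [sum(row[x] for row in image) for x in range(width)]


def segment_columns(image: List[List[int]], min_pixels: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges, end exclusive, where the
    projection stays at or above min_pixels (an assumed threshold)."""
    profile = column_projection(image)
    segments: List[Tuple[int, int]] = []
    start = None
    for x, count in enumerate(profile):
        if count >= min_pixels and start is None:
            start = x                      # a character candidate begins
        elif count < min_pixels and start is not None:
            segments.append((start, x))    # an empty column ends it
            start = None
    if start is not None:                  # candidate touching the right edge
        segments.append((start, len(profile)))
    return segments


# Toy 3x10 binary image with three separated blobs.
toy = [
    [1, 1, 0, 0, 1, 0, 0, 1, 1, 1],
    [1, 0, 0, 0, 1, 0, 0, 1, 0, 1],
    [1, 1, 0, 0, 1, 0, 0, 1, 1, 1],
]
print(segment_columns(toy))
```

A cut is only found where an empty column separates two characters, which is why the overlapping characters and shapes listed above defeat this kind of segmentation.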

Updated: 2021-06-14
