The Best Protection is Attack: Fooling Scene Text Recognition With Minimal Pixels
IEEE Transactions on Information Forensics and Security (IF 6.3), Pub Date: 2023-02-16, DOI: 10.1109/tifs.2023.3245984
Yikun Xu, Pengwen Dai, Zekun Li, Hongjun Wang, Xiaochun Cao

Scene text recognition (STR) has witnessed tremendous progress in the era of deep learning, but it also raises concerns about privacy infringement, as scene texts usually contain valuable or sensitive information. Previous work on the privacy protection of scene texts mainly focuses on masking out the texts from the image/video. In this work, we draw on the idea of adversarial examples and use minimal pixel perturbations to protect the privacy of text information. Although there are well-established attack methods for non-sequential vision tasks (e.g., classification), attacks on sequential tasks (e.g., scene text recognition) have not yet received sufficient attention. Moreover, existing works mainly focus on the white-box setting, which requires complete knowledge of the target model (e.g., architecture, parameters, or gradients). These requirements limit the scope of applications for white-box adversarial attacks. We therefore propose a novel black-box attack approach for STR models that requires knowledge only of the model's output. In addition, instead of disturbing most pixels as existing STR attack methods do, our approach manipulates only a few pixels, making the perturbation far less conspicuous. To determine the locations and values of the manipulated pixels, we also provide an efficient Adaptive-Discrete Differential Evolution (AD²E) that narrows the continuous search space down to a discrete one, greatly reducing the number of queries to the target model. Experiments on several real-world benchmarks show the effectiveness of the proposed approach. In particular, when attacking the commercial STR engine Baidu-OCR, our method achieves substantially higher attack success rates than existing approaches. Our work is an important step towards using black-box adversarial attacks with minimal pixels to keep the privacy of text information from being easily compromised by STR models.
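The abstract's core mechanism, evolving a handful of (position, colour) candidates against a model that returns only its output string, can be illustrated with standard differential evolution. The sketch below is an assumption-laden toy, not the paper's AD²E: it uses a fixed three-level RGB palette and a plain DE/rand/1 update in place of the adaptive discretization the authors describe, and `recognize` is a hypothetical stand-in for a black-box STR engine.

```python
import numpy as np

def recognize(image):
    # Hypothetical black box: replace with a real STR model or OCR API.
    # Only its output string is observable (black-box setting).
    return "hello" if image[0, 0, 0] > 100 else "hallo"

def apply_pixels(image, genes, k):
    """Paint k pixels encoded as a flat (x, y, r, g, b) * k gene vector."""
    out = image.copy()
    for x, y, r, g, b in genes.reshape(k, 5).astype(int):
        out[y, x] = (r, g, b)
    return out

def de_few_pixel_attack(image, true_text, k=3, pop_size=40, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    palette = np.array([0.0, 128.0, 255.0])        # assumed coarse RGB grid
    lo = np.tile([0, 0, 0, 0, 0], k).astype(float)
    hi = np.tile([w - 1, h - 1, 255, 255, 255], k).astype(float)

    def discretize(g):
        g = np.clip(np.round(g), lo, hi)           # integer pixel coordinates
        rgb = g.reshape(k, 5)[:, 2:]
        g.reshape(k, 5)[:, 2:] = palette[np.abs(rgb[..., None] - palette).argmin(-1)]
        return g

    def score(g):
        # Proxy objective: character-level disagreement with the true text,
        # computed from the model output alone.
        pred = recognize(apply_pixels(image, g, k))
        return (sum(a != b for a, b in zip(pred, true_text))
                + abs(len(pred) - len(true_text)))

    pop = np.array([discretize(rng.uniform(lo, hi)) for _ in range(pop_size)])
    fit = np.array([score(g) for g in pop])
    if fit.max() > 0:                              # already misrecognized
        return pop[fit.argmax()]
    for _ in range(iters):
        for i in range(pop_size):
            a, b, c = pop[rng.choice(pop_size, 3, replace=False)]
            trial = discretize(a + 0.5 * (b - c))  # DE/rand/1 mutation + snap
            s = score(trial)
            if s > fit[i]:                         # greedy selection
                pop[i], fit[i] = trial, s
            if s > 0:                              # output changed: fooled
                return trial
    return None

if __name__ == "__main__":
    img = np.full((32, 100, 3), 120, dtype=np.uint8)  # dummy text crop
    genes = de_few_pixel_attack(img, "hello")
    print("fooled" if genes is not None else "no adversarial pixels found")
```

Restricting mutations to a discrete grid, as above, is what keeps the query budget small: each candidate costs exactly one model query, and the search never wastes queries on sub-pixel or near-duplicate colour values.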

Updated: 2024-08-26