Efficient Decision-based Black-box Patch Attacks on Video Recognition, arXiv - CS - Computer Vision and Pattern Recognition

arXiv - CS - Computer Vision and Pattern Recognition. Pub Date: 2023-03-21, DOI: arxiv-2303.11917
Kaixun Jiang, Zhaoyu Chen, Tony Huang, Jiafeng Wang, Dingkang Yang, Bo Li, Yan Wang, Wenqiang Zhang

Although Deep Neural Networks (DNNs) have demonstrated excellent performance, they are vulnerable to adversarial patches, which introduce perceptible, localized perturbations to the input. Adversarial patch generation for images has received much attention, whereas adversarial patches on videos remain largely unexplored. Likewise, decision-based attacks, in which the attacker can only obtain the predicted hard labels by querying the threat model, have rarely been studied on video models, even though they are practical in real-world video recognition scenarios. The absence of such studies leaves a large gap in the robustness assessment of video models. To bridge this gap, this work is the first to explore decision-based patch attacks on video models. Our analysis shows that the huge parameter space of videos and the minimal information returned by decision-based models both greatly increase the attack difficulty and the query burden. To achieve a query-efficient attack, we propose a spatial-temporal differential evolution (STDE) framework. First, STDE uses target videos as patch textures and adds patches only on keyframes, which are selected adaptively according to temporal difference. Second, STDE takes minimization of the patch area as its optimization objective and adopts spatial-temporal mutation and crossover to search for the global optimum without getting trapped in local optima. Experiments show that STDE achieves state-of-the-art performance in terms of threat, efficiency, and imperceptibility. Hence, STDE has the potential to be a powerful tool for evaluating the robustness of video recognition models.
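To make the pipeline described in the abstract more concrete, here is a minimal, hypothetical sketch in pure Python of the general idea: pick keyframes by temporal difference, paste target-video texture into a rectangle on those keyframes only, and run a differential evolution search that minimizes patch area while the hard label returned by the model must change. The abstract does not specify STDE's keyframe rule, patch encoding, or mutation and crossover operators, so `select_keyframes`, `decode`, `de_attack`, the penalty in `fitness`, and the toy classifier `toy_model` are all assumptions. A standard DE/rand/1/bin loop stands in for STDE's spatial-temporal mutation and crossover; this is not the authors' implementation.

```python
import random

# Hypothetical toy setup: a video is a list of frames; each frame is a list of rows
# of grayscale pixel values in [0, 1]. The target video has the same shape.

def temporal_difference(f1, f2):
    """Mean absolute pixel difference between two frames of equal size."""
    h, w = len(f1), len(f1[0])
    return sum(abs(f1[i][j] - f2[i][j]) for i in range(h) for j in range(w)) / (h * w)

def select_keyframes(video, k):
    """Assumed rule: keep the k frames that differ most from their predecessor."""
    scored = [(temporal_difference(video[t], video[t - 1]), t) for t in range(1, len(video))]
    scored.sort(reverse=True)
    return sorted(t for _, t in scored[:k])

def apply_patch(video, target, keyframes, rect):
    """Copy target-video pixels into rect = (x, y, w, h) on the keyframes only."""
    x, y, w, h = rect
    out = [[row[:] for row in frame] for frame in video]
    for t in keyframes:
        for i in range(y, y + h):
            for j in range(x, x + w):
                out[t][i][j] = target[t][i][j]
    return out

def decode(vec, H, W):
    """Map a candidate in [0, 1]^4 to an integer rectangle that fits inside an H x W frame."""
    x = min(int(vec[0] * W), W - 1)
    y = min(int(vec[1] * H), H - 1)
    w = max(1, min(int(vec[2] * W), W - x))
    h = max(1, min(int(vec[3] * H), H - y))
    return x, y, w, h

def de_attack(video, target, query, true_label, k=2, pop=12, iters=30, F=0.5, CR=0.9, seed=0):
    """Minimize patch area subject to the hard label changing (untargeted attack)."""
    rng = random.Random(seed)
    H, W = len(video[0]), len(video[0][0])
    keyframes = select_keyframes(video, k)

    def fitness(vec):
        rect = decode(vec, H, W)
        area = rect[2] * rect[3] / (H * W)  # patch area as a fraction of the frame
        if query(apply_patch(video, target, keyframes, rect)) != true_label:
            return area  # adversarial: a smaller patch is better
        return 2.0 - area  # assumed penalty: always worse than adversarial, bigger is "closer"

    population = [[rng.random() for _ in range(4)] for _ in range(pop)]
    scores = [fitness(v) for v in population]
    queries = pop  # every fitness evaluation costs one hard-label query
    for _ in range(iters):
        for i in range(pop):
            # DE/rand/1 mutation from three distinct other individuals, clipped to [0, 1]
            a, b, c = rng.sample([j for j in range(pop) if j != i], 3)
            mutant = [min(1.0, max(0.0, population[a][d] + F * (population[b][d] - population[c][d])))
                      for d in range(4)]
            # Binomial crossover; j_rand guarantees at least one gene comes from the mutant
            j_rand = rng.randrange(4)
            trial = [mutant[d] if (rng.random() < CR or d == j_rand) else population[i][d]
                     for d in range(4)]
            score = fitness(trial)
            queries += 1
            if score <= scores[i]:  # greedy one-to-one selection
                population[i], scores[i] = trial, score
    best = min(range(pop), key=lambda i: scores[i])
    return decode(population[best], H, W), scores[best], keyframes, queries

def toy_model(video):
    """Hypothetical hard-label classifier: label 1 if mean brightness exceeds 0.05."""
    pixels = [p for frame in video for row in frame for p in row]
    return int(sum(pixels) / len(pixels) > 0.05)

if __name__ == "__main__":
    T, H, W = 6, 8, 8
    levels = [0.0, 0.0, 0.0, 0.04, 0.04, 0.0]  # brightness changes at frames 3 and 5
    source = [[[v] * W for _ in range(H)] for v in levels]
    target = [[[1.0] * W for _ in range(H)] for _ in range(T)]
    rect, score, keyframes, queries = de_attack(source, target, toy_model, toy_model(source))
    print(keyframes, rect, round(score, 3), queries)
```

In this toy run the two frames where brightness changes are chosen as keyframes, so only 2 of the 6 frames are ever patched, which shrinks the search space in the spirit of the keyframe idea. The `queries` counter makes the query budget explicit, since in a decision-based setting the only feedback per query is a hard label.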

Updated: 2023-03-22