An Interference-Resistant and Low-Consumption Lip Recognition Method
Electronics (IF 2.6) Pub Date: 2022-09-26, DOI: 10.3390/electronics11193066
Junwei Jia, Zhilu Wang, Lianghui Xu, Jiajia Dai, Mingyi Gu, Jing Huang

Lip movements carry essential linguistic information and are an important medium for studying the content of a dialogue. Much existing work focuses on improving the accuracy of lip recognition models, but few studies examine their robustness and generalization under various disturbances. Our experiments show that current state-of-the-art lip recognition models drop significantly in accuracy when perturbed and are particularly sensitive to adversarial examples. This paper substantially alleviates the problem with Mixup training. Under adversarial attacks generated by FGSM (Fast Gradient Sign Method), the proposed model achieves 85.0% and 40.2% accuracy on the English dataset LRW and the Mandarin dataset LRW-1000, respectively, improving the recognition rate by 9.8% and 8.3% over current advanced lip recognition models and demonstrating the positive impact of Mixup training on robustness and generalization. In addition, the performance of lip recognition classifiers depends heavily on the number of trainable parameters, which raises the computational cost. The InvNet-18 network proposed in this paper reduces GPU resource consumption and training time while improving accuracy: compared with the standard ResNet-18 backbone used in mainstream lip recognition models, it consumes less than one third of the GPU resources and has 32% fewer parameters. Detailed analysis and comparison show that the proposed model effectively improves resistance to interference and reduces training resource consumption while remaining comparable in accuracy to current state-of-the-art results.
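
The abstract relies on two standard techniques: Mixup training (Zhang et al.) and FGSM adversarial-example generation (Goodfellow et al.). The sketch below is a minimal PyTorch illustration of both, using a toy classifier as a stand-in for the paper's InvNet-18 lip recognition network; the model, data, and hyperparameters (alpha, eps) are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of Mixup training and an FGSM robustness check.
# The tiny linear classifier stands in for the paper's network;
# all shapes and hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def mixup_batch(x, y, alpha=0.4):
    """Blend a batch with a shuffled copy of itself; lam ~ Beta(alpha, alpha)."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    idx = torch.randperm(x.size(0))
    x_mix = lam * x + (1.0 - lam) * x[idx]
    return x_mix, y, y[idx], lam

def fgsm_attack(model, x, y, eps=0.03):
    """One-step FGSM: perturb x along the sign of the loss gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    return (x_adv + eps * x_adv.grad.sign()).detach()

# Stand-in classifier (the paper uses an InvNet-18 front end instead).
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)

# One Mixup training step on random stand-in data:
# the loss is the same convex combination applied to the inputs.
x = torch.randn(8, 3, 32, 32)
y = torch.randint(0, 10, (8,))
x_mix, y_a, y_b, lam = mixup_batch(x, y)
logits = model(x_mix)
loss = lam * F.cross_entropy(logits, y_a) + (1.0 - lam) * F.cross_entropy(logits, y_b)
opt.zero_grad()
loss.backward()
opt.step()

# Robustness check: accuracy on FGSM-perturbed inputs.
x_adv = fgsm_attack(model, x, y)
with torch.no_grad():
    acc = (model(x_adv).argmax(dim=1) == y).float().mean()
print(f"accuracy under FGSM: {acc.item():.2%}")
```

Mixup trains on convex combinations of input pairs and their label pairs, which is the mechanism behind the robustness gains reported above; FGSM perturbs each input one step along the sign of the loss gradient and is the attack against which the 85.0% and 40.2% figures are measured.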
