Exploring Deep Learning for Joint Audio-Visual Lip Biometrics, arXiv - CS - Multimedia

arXiv - CS - Multimedia, Pub Date: 2021-04-17, DOI: arxiv-2104.08510
Meng Liu, Longbiao Wang, Kong Aik Lee, Hanyi Zhang, Chang Zeng, Jianwu Dang

Audio-visual (AV) lip biometrics is a promising authentication technique that leverages the benefits of both the audio and visual modalities in speech communication. Previous works have demonstrated the usefulness of AV lip biometrics. However, the lack of a sizeable AV database hinders the exploration of deep-learning-based audio-visual lip biometrics. To address this problem, we compile a moderate-size database using existing public databases. Meanwhile, we establish the DeepLip AV lip biometrics system, realized with a convolutional neural network (CNN) based video module, a time-delay neural network (TDNN) based audio module, and a multimodal fusion module. Our experiments show that DeepLip outperforms traditional speaker recognition models in context modeling and achieves over 50% relative improvements compared with our best single-modality baseline, with equal error rates of 0.75% and 1.11% on the test datasets, respectively.
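
The abstract reports results as equal error rates (EER) and relative improvements over a baseline. As a rough illustration of how these two quantities are commonly computed, here is a minimal Python sketch. It is not the authors' evaluation code: the function names, the threshold sweep, the toy scores, and the 2.0% baseline value are all assumptions made for illustration only.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Assumes higher scores mean "more likely the claimed identity".
    """
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    best_gap, eer = None, None
    for t in thresholds:
        # False rejection rate: genuine trials scored below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # False acceptance rate: impostor trials scored at or above it.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        # Keep the operating point where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_improvement(baseline_eer, system_eer):
    """Fraction by which the system reduces the baseline EER."""
    return (baseline_eer - system_eer) / baseline_eer


# Toy scores, invented for illustration (not from the paper).
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]
toy_eer = equal_error_rate(genuine, impostor)

# Hypothetical 2.0% baseline EER; only the 0.75% figure comes from the abstract.
toy_gain = relative_improvement(2.0, 0.75)
```

With a hypothetical baseline above 1.5%, an EER of 0.75% corresponds to a relative improvement of more than 50%, which is the sense in which the abstract's claim can be read.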

Updated: 2021-04-20
