Anchor voiceprint recognition in live streaming via RawNet-SA and gated recurrent unit
EURASIP Journal on Audio, Speech, and Music Processing (IF 2.4), Pub Date: 2021-12-20, DOI: 10.1186/s13636-021-00234-3
Jiacheng Yao 1,2, Jing Zhang 1,2, Jiafeng Li 1,2, Li Zhuo 1,2

With the rapid growth of online live streaming platforms, some anchors seek profit and popularity by mixing inappropriate content into their live programs. After being blacklisted, some of these anchors even forge their identities and move to another platform to continue streaming, which seriously harms the online environment. We therefore propose an anchor voiceprint recognition method for live streaming, based on RawNet-SA and a gated recurrent unit (GRU), to identify anchors on live platforms. First, the anchor's speech is extracted from the live stream using voice activity detection (VAD) and speech separation. Then, the self-attention network RawNet-SA generates a feature sequence of the anchor's voiceprint from the speech waveform. Finally, a GRU aggregates this feature sequence into a deep voiceprint feature vector for anchor recognition. Experiments are conducted on the VoxCeleb, CN-Celeb, and MUSAN datasets, and the competitive results demonstrate that our method can effectively recognize anchor voiceprints in video streaming.
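The final aggregation step can be illustrated with a small sketch. The abstract does not give layer sizes, weights, or a scoring rule, so everything below is an assumption made for illustration: the class `GRUAggregator`, the dimensions, the random untrained weights, and the use of cosine similarity to compare two vectors are all hypothetical and are not the authors' implementation. The sketch only shows how a GRU can turn a variable-length feature sequence into one fixed-length vector.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GRUAggregator:
    """Hypothetical single-layer GRU: maps a (T, D) sequence to one (H,) vector."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(hidden_dim)
        # Index 0: update gate z, 1: reset gate r, 2: candidate state n.
        # Weights are random and untrained (assumption for illustration only).
        self.W = rng.uniform(-scale, scale, (3, hidden_dim, input_dim))
        self.U = rng.uniform(-scale, scale, (3, hidden_dim, hidden_dim))
        self.b = np.zeros((3, hidden_dim))
        self.hidden_dim = hidden_dim

    def __call__(self, seq):
        h = np.zeros(self.hidden_dim)
        for x in seq:  # one frame-level feature vector per step
            z = sigmoid(self.W[0] @ x + self.U[0] @ h + self.b[0])
            r = sigmoid(self.W[1] @ x + self.U[1] @ h + self.b[1])
            n = np.tanh(self.W[2] @ x + self.U[2] @ (r * h) + self.b[2])
            h = (1.0 - z) * n + z * h  # blend candidate with previous state
        return h  # last hidden state serves as the utterance-level vector


def cosine_similarity(a, b):
    # Assumed scoring rule; the abstract does not say how vectors are compared.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Stand-ins for frame-level feature sequences of two different lengths.
rng = np.random.default_rng(1)
gru = GRUAggregator(input_dim=8, hidden_dim=4)
emb_short = gru(rng.standard_normal((10, 8)))
emb_long = gru(rng.standard_normal((25, 8)))
score = cosine_similarity(emb_short, emb_long)
```

Both sequences, although they differ in length, map to vectors of the same size, which is what makes a direct comparison between two utterances possible.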

Updated: 2021-12-20