Speaker Identification for Business-Card-Type Sensors
IEEE Open Journal of the Computer Society (IF 5.7), Pub Date: 2021-04-26, DOI: 10.1109/ojcs.2021.3075469
Shunpei Yamaguchi, Ritsuko Oshima, Jun Oshima, Ryota Shiina, Takuya Fujihashi, Shunsuke Saruwatari, Takashi Watanabe

Human collaboration has a great impact on the performance of multi-person activities. Analyzing who speaks and when makes it possible to extract detailed data on human collaboration. Some studies have extracted such data by identifying the speaker with business-card-type sensors. However, it is difficult to realize speaker identification for business-card-type sensors at low cost and with high accuracy because of spikes in the measured sound pressure data, ambient noise picked up by the non-speakers' sensors, and synchronization errors between sensors. This study proposes a novel sound pressure sensor and a speaker identification algorithm to realize speaker identification for business-card-type sensors. The sensor extracts the user's speech at low cost and with high accuracy by employing a peak hold circuit for spike mitigation and a time synchronization module for precise time synchronization. The algorithm identifies a speaker with high accuracy by removing ambient noise. The evaluations show that the algorithm accurately identifies the speaker in a multi-person activity under varying numbers of users, environmental noise, and reverberation conditions, for both long and short utterances. In addition, the peak hold circuit enables accurate extraction of speech, and the synchronization error between the sensors is always within $\pm 30\,\mu\text{s}$, which is negligible.
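
The abstract does not spell out the identification algorithm, so the following is only a minimal sketch of the general idea of identifying a speaker from time-synchronized sound pressure readings after removing ambient noise. Everything in it is an assumption made for illustration and is not taken from the paper: the function name `identify_speakers`, the use of a per-sensor median as the ambient noise floor, the `margin` threshold, and the readings being in dB.

```python
from statistics import median


def identify_speakers(frames, margin=6.0):
    """Pick at most one speaker per synchronized time slot.

    frames: list of dicts, one per time slot, mapping a (hypothetical)
            sensor id to the sound pressure level measured in that slot.
    margin: assumed minimum excess over the noise floor, in the same unit
            as the readings, for a slot to count as speech.
    Returns a list with the speaking sensor's id, or None, for each slot.
    """
    sensor_ids = sorted(frames[0])

    # Assumption: each wearer is silent for more than half of the recording,
    # so the per-sensor median approximates that sensor's ambient noise floor.
    floor = {s: median(f[s] for f in frames) for s in sensor_ids}

    speakers = []
    for f in frames:
        # Remove the ambient component, then compare sensors within the slot.
        excess = {s: f[s] - floor[s] for s in sensor_ids}
        best = max(sensor_ids, key=lambda s: excess[s])
        # The wearer's own sensor should see the largest excess while speaking;
        # if no sensor clears the margin, treat the slot as silence.
        speakers.append(best if excess[best] >= margin else None)
    return speakers


if __name__ == "__main__":
    demo = [
        {"a": 40, "b": 41},
        {"a": 70, "b": 50},
        {"a": 41, "b": 40},
        {"a": 48, "b": 68},
        {"a": 40, "b": 41},
    ]
    print(identify_speakers(demo))
```

Comparing sensors slot by slot only makes sense when their clocks agree, which is why the synchronization bound matters: a 30 µs offset corresponds to roughly 1 cm of sound travel at about 343 m/s.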

Updated: 2021-04-26