Applied Acoustics (IF 3.4), Pub Date: 2021-01-09, DOI: 10.1016/j.apacoust.2020.107864. Anirban Bhowmick, Astik Biswas, Nella AnveshKumar, Rahul Kottath
Language identification (LID) is the task of identifying the language of a given spoken utterance. Language segmentation, which locates the language boundaries in a multi-language utterance, is equally important. Language identification can be a straightforward front-end process for real-time mixed-speech recognition applications. India is a multilingual country, and mixing two languages in a single conversation is very common. In this paper, we experiment with two schemes for language identification in the context of Indian regional languages, an area in which very little work has been done. Both schemes use a singular value-based feature embedding. In the first scheme, singular value decomposition (SVD) is applied to the n-gram utterance matrix; in the second, SVD is applied to the difference supervector matrix space. We observe that in both schemes, 55–65% of the singular value energy is sufficient to capture the language context. We also examine how the two schemes preserve language context. For the n-gram based feature representation, we find that different skipgram models capture different language context. For short test durations, the supervector based feature representation performs better, whereas for longer test signals the n-gram based features perform better. We also extend our work to language-based segmentation. When segmenting a four-language group, scheme 1 performs well with a model trained on ten languages, whereas scheme 2 gives better accuracy with a model trained on the same four languages. In a multilingual setup, language-based identification and segmentation are useful for identifying each language as well as the duration of its presence. A language-specific model can then be used to recognize the speech.
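The energy criterion mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: rows of the matrix are utterances and columns are n-gram counts, "singular value energy" means the sum of squared singular values, and the embedding is the projection onto the leading right singular vectors. The names rank_for_energy and svd_embed are hypothetical, and the toy matrix is random data used only to show the shapes.

```python
import numpy as np


def rank_for_energy(singular_values, energy=0.60):
    """Return the smallest k whose leading singular values hold `energy` of the total.

    Assumption: energy is measured as the sum of squared singular values.
    """
    squared = np.asarray(singular_values, dtype=float) ** 2
    cumulative = np.cumsum(squared) / squared.sum()
    k = int(np.searchsorted(cumulative, energy)) + 1
    return min(k, len(squared))  # guard against rounding when energy is close to 1


def svd_embed(matrix, energy=0.60):
    """Project the rows of `matrix` onto the top-k right singular vectors.

    Assumption: rows are utterances, columns are n-gram (or supervector) features.
    Returns the embedded rows, the projection basis, and the chosen rank k.
    """
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    k = rank_for_energy(s, energy)
    basis = vt[:k].T  # shape: (n_features, k)
    return matrix @ basis, basis, k


if __name__ == "__main__":
    # Toy stand-in for an utterance-by-n-gram count matrix (illustrative only).
    rng = np.random.default_rng(0)
    counts = rng.poisson(lam=2.0, size=(8, 20)).astype(float)
    embedded, basis, k = svd_embed(counts, energy=0.60)
    print("retained rank:", k, "embedding shape:", embedded.shape)
```

A test utterance would be embedded with the same basis (test_vector @ basis) before being passed to whichever classifier is used; the choice of classifier is outside the scope of this sketch.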
Title: Identification/segmentation of Indian regional languages with singular value decomposition based feature embedding