Pinyin-to-Chinese conversion on sentence-level for domain-specific applications using self-attention model
Multimedia Systems ( IF 3.5 ) Pub Date : 2021-07-17 , DOI: 10.1007/s00530-021-00829-y
Shufeng Xiong 1 , Li Ma 2 , Bingkun Wang 2 , Ming Cheng 3
In a pinyin-based Chinese input method engine (IME), performance depends mainly on the Pinyin-to-Chinese (P2C) conversion module. Traditional P2C methods follow a pipeline procedure, which typically suffers from error propagation. In addition, the whole-sentence input capability of pinyin-based Chinese IMEs for domain-specific applications needs to be improved. In this paper, we propose a neural self-attention model for Pinyin Sequence to Chinese Sequence (PS2CS) conversion, which directly infers the entire Chinese sequence from the unsegmented pinyin character sequence fed into it. Our experimental results show that the proposed method outperforms baselines and a commercial IME on a specific medical-domain dataset, and also achieves comparable performance on a domain-general dataset.
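The abstract does not give the model's exact architecture, but the core operation of a self-attention encoder over an unsegmented pinyin character sequence can be sketched as scaled dot-product attention. The sketch below uses NumPy; the projection weights, dimensions, and function names are illustrative, not the paper's implementation:

```python
import numpy as np

def self_attention(x):
    """Single-head scaled dot-product self-attention.

    x: (seq_len, d_model) embeddings of unsegmented pinyin characters.
    Returns context-mixed representations of the same shape.
    """
    d = x.shape[-1]
    # Illustrative random projections; a trained model learns W_q, W_k, W_v.
    rng = np.random.default_rng(0)
    Wq = rng.standard_normal((d, d)) / np.sqrt(d)
    Wk = rng.standard_normal((d, d)) / np.sqrt(d)
    Wv = rng.standard_normal((d, d)) / np.sqrt(d)
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    scores = Q @ K.T / np.sqrt(d)                      # scaled dot products
    scores -= scores.max(axis=-1, keepdims=True)       # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over positions
    return weights @ V                                 # (seq_len, d_model)
```

Because every pinyin character attends to every other position, the model can resolve syllable boundaries and homophones jointly rather than in a pipeline; a full PS2CS model would stack such layers and add an output layer mapping each position to a Chinese character vocabulary.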




Updated: 2021-07-18