Capitalization and punctuation restoration: a survey
Artificial Intelligence Review (IF 10.7), Pub Date: 2021-07-23, DOI: 10.1007/s10462-021-10051-x
Vasile Păiş, Dan Tufiş

Ensuring proper punctuation and letter casing is a key pre-processing step before applying complex natural language processing algorithms. This is especially significant for textual sources where punctuation and casing are missing, such as the raw output of automatic speech recognition systems. Additionally, short text messages and micro-blogging platforms often contain unreliable or incorrect punctuation and casing. This survey offers an overview of both historical and state-of-the-art techniques for restoring punctuation and correcting word casing. Furthermore, current challenges and research directions are highlighted.
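To make the casing half of the task concrete, here is a minimal sketch of one simple baseline, assumed for illustration and not taken from the survey: a unigram truecaser that restores each lowercased token to the surface form it most often takes in a small cased corpus, then capitalizes the sentence-initial token. The toy corpus and the function names `train_truecaser` and `truecase` are hypothetical; punctuation restoration is not covered.

```python
from collections import Counter, defaultdict


def train_truecaser(cased_sentences):
    """Hypothetical helper: map each lowercased word to its most frequent cased form.

    Sentence-initial tokens are skipped, because their capitalization is
    dictated by position rather than by the word itself.
    """
    counts = defaultdict(Counter)
    for sentence in cased_sentences:
        tokens = sentence.split()
        for token in tokens[1:]:
            counts[token.lower()][token] += 1
    # Keep only the most frequent surface form observed for each word.
    return {word: forms.most_common(1)[0][0] for word, forms in counts.items()}


def truecase(text, model):
    """Hypothetical helper: restore casing of a lowercased, whitespace-tokenized sentence."""
    # Unknown words are left unchanged.
    tokens = [model.get(tok, tok) for tok in text.split()]
    if tokens:
        # Sentence-initial token: uppercase its first character.
        tokens[0] = tokens[0][:1].upper() + tokens[0][1:]
    return " ".join(tokens)


# Toy cased corpus (assumed data, for illustration only).
corpus = [
    "The survey was written in Bucharest .",
    "Researchers in Bucharest study Romanian speech .",
    "Speech recognition output is often lowercased .",
]
model = train_truecaser(corpus)
print(truecase("romanian speech is studied in bucharest", model))
# → Romanian speech is studied in Bucharest
```

Because this baseline ignores context, a word whose casing depends on usage (for example "apple" versus "Apple") always receives a single fixed form. Context-aware sequence models are the usual remedy.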



Updated: 2021-07-23