Nonuniform language in technical writing: Detection and correction, Natural Language Engineering

Natural Language Engineering (IF 2.5), Pub Date: 2020-03-06, DOI: 10.1017/s1351324920000133
Weibo Wang, Aminul Islam, Abidalrahman Moh’d, Axel J. Soto, Evangelos E. Milios

Technical writing in professional environments, such as user manual authoring, requires the use of uniform language. Nonuniform language refers to sentences in a technical document that are intended to have the same meaning within a similar context but use different words or writing styles. Addressing this nonuniformity problem requires performing two tasks. The first task, which we named nonuniform language detection (NLD), is detecting such sentences. We propose an NLD method that uses different similarity algorithms at the lexical, syntactic, semantic and pragmatic levels; the extracted features are integrated by a machine learning classifier. The second task, which we named nonuniform language correction (NLC), is deciding which of the detected sentences is more appropriate for the context. To address this problem, we propose an NLC method that combines contraction removal, near-synonym choice and text readability comparison. We tested our methods on smartphone user manuals. Finally, we compared our methods against state-of-the-art paraphrase detection methods (for NLD) and against expert annotators (for both NLD and NLC). The experiments demonstrate that the proposed methods achieve performance matching that of expert annotators.
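The abstract describes the two tasks only at a high level, so the Python sketch below is an illustrative approximation under stated assumptions, not the authors' implementation. For NLD, the paper combines similarity measures at four levels through a trained classifier; this sketch uses only two lexical-level features (token Jaccard overlap and a `difflib` character-sequence ratio) and replaces the classifier with a fixed, hypothetical `threshold`. For NLC, it expands contractions from a tiny hypothetical `CONTRACTIONS` table and then keeps the candidate with the higher Flesch Reading Ease score; near-synonym choice is omitted, and the abstract does not say which readability measure the authors used. All function names here are hypothetical.

```python
import re
from difflib import SequenceMatcher

# Hypothetical, deliberately tiny contraction table (illustration only).
CONTRACTIONS = {
    "don't": "do not", "can't": "cannot", "won't": "will not",
    "it's": "it is", "you'll": "you will", "isn't": "is not",
}
_CONTRACTION_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, CONTRACTIONS)) + r")\b", re.IGNORECASE
)


def tokens(sentence: str) -> list[str]:
    """Lowercase word tokens; apostrophes are kept so contractions stay whole."""
    return re.findall(r"[a-z']+", sentence.lower())


def lexical_features(a: str, b: str) -> tuple[float, float]:
    """Two lexical-level similarity scores: token Jaccard and character ratio."""
    ta, tb = set(tokens(a)), set(tokens(b))
    union = ta | tb
    jaccard = len(ta & tb) / len(union) if union else 1.0
    char_ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return jaccard, char_ratio


def is_nonuniform_pair(a: str, b: str, threshold: float = 0.5) -> bool:
    """NLD stand-in: flag sentences that are similar but not identical.

    Assumption: averaging two lexical scores against a fixed threshold
    replaces the trained multi-level classifier described in the paper.
    """
    if a.strip().lower() == b.strip().lower():
        return False  # identical wording is already uniform
    jaccard, char_ratio = lexical_features(a, b)
    return (jaccard + char_ratio) / 2 >= threshold


def expand_contractions(sentence: str) -> str:
    """Contraction removal: replace each known contraction with its long form."""
    def repl(m: re.Match) -> str:
        long_form = CONTRACTIONS[m.group(0).lower()]
        # Preserve sentence-initial capitalization, e.g. "Don't" -> "Do not".
        if m.group(0)[0].isupper():
            return long_form[0].upper() + long_form[1:]
        return long_form
    return _CONTRACTION_RE.sub(repl, sentence)


def count_syllables(word: str) -> int:
    """Crude heuristic: one syllable per vowel group, at least one per word."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease; higher scores indicate easier text."""
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * len(words) / sentences - 84.6 * syllables / len(words)


def choose_preferred(a: str, b: str) -> str:
    """NLC stand-in: expand contractions, then keep the more readable candidate."""
    a, b = expand_contractions(a), expand_contractions(b)
    return a if flesch_reading_ease(a) >= flesch_reading_ease(b) else b
```

For example, `choose_preferred("Don't remove the SIM card.", "Do not remove the SIM card.")` returns `"Do not remove the SIM card."`. A real system would learn feature weights and the decision boundary from annotated sentence pairs rather than averaging two scores against a hand-picked cutoff.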

Updated: 2020-03-06