Feature-Based Grammar Error Detection System for the English Language
Electronics (IF 2.9), Pub Date: 2020-10-14, DOI: 10.3390/electronics9101686
Nancy Agarwal, Mudasir Ahmad Wani, Patrick Bours

This work designs a grammar error detection system that uses both the structural and the contextual information of a sentence to decide whether an English sentence is grammatically correct. Most existing systems model a grammar detector by converting sentences into sequences of either the words that appear in them or the syntactic tags that carry their grammatical information. In this paper, we show that both sequencing approaches have limitations: the former is overly specific, whereas the latter is overly general, and each in turn degrades the performance of the grammar classifier. The paper therefore proposes a new sequencing approach that carries both kinds of information about a sentence, linguistic as well as syntactic. We call this sequence a Lex-Pos sequence. The main objective of the paper is to demonstrate that the proposed Lex-Pos sequence can capture both the specific nature of the linguistic words (i.e., lexicals) and the generic structural characteristics of a sentence, the latter via Part-Of-Speech (POS) tags, and can therefore lead to a significant improvement in detecting grammar errors. Furthermore, the paper proposes a new vector representation technique, Word Embedding One-Hot Encoding (WEOE), to transform a Lex-Pos sequence into numerical values. The paper also introduces a new error induction technique that artificially generates POS-tag-specific incorrect sentences for training. The classifier is trained using two corpora of incorrect sentences, one with general errors and another with POS-tag-specific errors. A Long Short-Term Memory (LSTM) neural network architecture is employed to build the grammar classifier. The study conducts nine experiments to validate the strength of the Lex-Pos sequences. The Lex-Pos-based models are observed to be superior in two ways: (1) they give more accurate predictions; and (2) they are more stable, showing smaller drops in accuracy from training to testing. To further demonstrate the potential of the proposed Lex-Pos-based model, we compare it with some well-known existing studies.
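
To make the idea of a mixed sequence concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say which words remain lexical or how WEOE lays out its vectors, so everything here is an assumption for illustration: the set `KEEP_LEXICAL`, the function names `to_lex_pos` and `encode`, the toy embedding values, and the choice to concatenate a dense embedding with a one-hot POS vector. The input is assumed to be already POS-tagged with Penn Treebank tags.

```python
# Hypothetical illustration only: the tag choices and the vector layout are
# assumptions made for this sketch, not settings taken from the paper.

# Assumed: words with these (closed-class) tags are kept as lexical tokens.
KEEP_LEXICAL = {"DT", "IN", "PRP", "MD", "TO", "CC"}


def to_lex_pos(tagged_tokens, keep_lexical=KEEP_LEXICAL):
    """Map (word, tag) pairs to a mixed sequence of words and POS tags."""
    return [word.lower() if tag in keep_lexical else tag
            for word, tag in tagged_tokens]


def encode(sequence, embeddings, pos_tags):
    """Encode each item as [embedding part | one-hot part].

    Lexical items fill the embedding part and leave the one-hot part zero;
    POS tags do the opposite. Unknown words get an all-zero embedding.
    """
    dim = len(next(iter(embeddings.values())))
    vectors = []
    for item in sequence:
        one_hot = [1.0 if item == tag else 0.0 for tag in pos_tags]
        if any(one_hot):
            dense = [0.0] * dim
        else:
            dense = list(embeddings.get(item, [0.0] * dim))
        vectors.append(dense + one_hot)
    return vectors


# A pre-tagged sentence containing a subject-verb agreement error.
tagged = [("The", "DT"), ("dogs", "NNS"), ("runs", "VBZ"),
          ("in", "IN"), ("parks", "NNS")]

lex_pos = to_lex_pos(tagged)

# Toy values; a real system would load trained word embeddings instead.
toy_embeddings = {"the": [0.1, 0.2], "in": [0.3, 0.4]}
toy_pos_tags = ["NNS", "VBZ"]
vectors = encode(lex_pos, toy_embeddings, toy_pos_tags)
```

In this sketch the function words stay visible as words while the content words collapse to their tags, so the sequence is less specific than the raw words and less general than a pure tag sequence. The resulting fixed-width vectors are the kind of input a sequence classifier such as an LSTM could consume.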

Updated: 2020-10-14