Multitask deep learning for native language identification, Knowledge-Based Systems



Multitask deep learning for native language identification
Knowledge-Based Systems (IF 8.8), Pub Date: 2020-09-16, DOI: 10.1016/j.knosys.2020.106440
Vuk Habic, Alexander Semenov, Eduardo L. Pasiliao

Identifying a person's native language from text they have written in English (L1 identification) plays an important role in tasks such as authorship profiling and identification. With the current proliferation of misinformation on social media, these methods are especially topical. Most studies in this field have focused on developing supervised classification algorithms that are trained on a single L1 dataset. Although multiple labeled datasets are available for L1 identification, they contain texts authored by speakers of different native languages, and their sets of languages do not completely overlap. Current approaches achieve high accuracy on the available datasets, but only by training a separate classifier for each dataset. Studies show that jointly training multiple classifiers on different datasets allows the classifiers to share information, which can increase the accuracy on each task. In this study, we develop a novel deep neural network (DNN) architecture for L1 classification based on an adversarial multitask learning method that integrates shared knowledge from multiple L1 datasets. We propose several variants of the architecture and rigorously evaluate their performance on multiple datasets. Our results indicate that the proposed multitask architecture achieves higher classification accuracy than previously proposed methods.
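The abstract does not spell out the training objective, so the following is only a rough illustration of one common way adversarial multitask learning is set up, not the architecture proposed in the paper. It assumes a shared text encoder, one task-specific classification head per L1 dataset (so corpora can have different numbers of language classes), and a dataset discriminator placed behind a gradient-reversal layer, which is the widely used gradient-reversal formulation. All names here (`Example`, `adversarial_multitask_losses`, `reverse_gradient`, the weight `lam`) are hypothetical, and the logits are assumed to come from networks that are not shown.

```python
import math
from dataclasses import dataclass


@dataclass
class Example:
    """One training text; all fields are hypothetical placeholders."""
    dataset_id: int           # index k of the L1 corpus the text came from
    label: int                # native-language class within corpus k
    task_logits: list[float]  # output of the task-specific head for corpus k
    disc_logits: list[float]  # output of the dataset discriminator


def cross_entropy(logits: list[float], target: int) -> float:
    """Numerically stable softmax cross-entropy for a single example."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[target]


def adversarial_multitask_losses(batch: list[Example], lam: float = 0.1):
    """Return (task_loss, disc_loss, encoder_loss), each averaged over the batch.

    - task_loss: each text is scored only by the head of its own corpus.
    - disc_loss: the discriminator tries to recover which corpus a text came from.
    - encoder_loss: what the shared encoder effectively minimizes when the
      discriminator sits behind a gradient-reversal layer with weight lam
      (assumed formulation, not taken from the paper).
    """
    n = len(batch)
    task_loss = sum(cross_entropy(ex.task_logits, ex.label) for ex in batch) / n
    disc_loss = sum(cross_entropy(ex.disc_logits, ex.dataset_id) for ex in batch) / n
    encoder_loss = task_loss - lam * disc_loss
    return task_loss, disc_loss, encoder_loss


def reverse_gradient(grad: list[float], lam: float) -> list[float]:
    """Backward pass of a gradient-reversal layer (its forward pass is identity)."""
    return [-lam * g for g in grad]


if __name__ == "__main__":
    # Two hypothetical corpora with different label sets (2 vs. 3 L1 classes).
    batch = [
        Example(dataset_id=0, label=1, task_logits=[0.2, 1.5], disc_logits=[0.3, -0.1]),
        Example(dataset_id=1, label=2, task_logits=[0.1, 0.0, 2.0], disc_logits=[-0.4, 0.9]),
    ]
    print(adversarial_multitask_losses(batch, lam=0.5))
```

In an autodiff framework, the reversal would typically be written as a custom differentiable operation (for example, a `torch.autograd.Function` subclass whose `backward` returns the negated, scaled gradient). A single backward pass then trains the discriminator to identify the corpus while pushing the shared encoder toward corpus-invariant features, which is the intuition behind sharing knowledge across L1 datasets.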


Updated: 2020-09-16
