Bidirectional Language Modeling: A Systematic Literature Review
Scientific Programming Pub Date : 2021-05-03 , DOI: 10.1155/2021/6641832
Muhammad Shah Jahan 1 , Habib Ullah Khan 2 , Shahzad Akbar 3 , Muhammad Umar Farooq 1 , Sarah Gul 4 , Anam Amjad 1

In transfer learning, two major activities, i.e., pretraining and fine-tuning, are carried out to perform downstream tasks. The advent of the transformer architecture and bidirectional language models, e.g., bidirectional encoder representations from transformers (BERT), enables transfer learning. BERT also overcomes the limitations of unidirectional language models by removing the dependency on recurrent neural networks (RNNs), and its attention mechanism reads the input from both directions, giving a better understanding of sentence context. The performance of downstream tasks in transfer learning depends on various factors, such as dataset size, step size, and the number of selected parameters. Many state-of-the-art studies have produced strong results by improving the pretraining phase; however, no comprehensive investigation and analysis of these studies is available yet. Therefore, this article presents a systematic literature review (SLR) of thirty-one (31) influential research studies published during 2018–2020. This paper makes the following contributions: (1) thirty-one (31) BERT-inspired models are extracted; (2) every model is compared with RoBERTa (a replicated BERT model trained on a larger dataset with a larger batch size but a smaller step size). Seven (7) of the thirty-one (31) models in this SLR outperform RoBERTa: three were trained on a larger dataset, while the other four were trained on a smaller dataset. Among these seven models, six share both the feedforward network (FFN) and attention parameters across layers. The remaining twenty-four (24) models are also studied in this SLR under their different parameter settings. Furthermore, it is concluded that a pretrained model with a large dataset, more hidden layers and attention heads, a small step size, and parameter sharing produces better results. This SLR will help researchers pick a suitable model for their requirements.
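The finding that the best-performing models share the feedforward network and attention weights across layers can be illustrated with a short sketch. The PyTorch code below is not from the reviewed paper; it is a minimal, hypothetical example (class and parameter names are invented) showing how reusing one transformer encoder layer across the depth of a network keeps the parameter count close to that of a single layer while still applying several rounds of bidirectional self-attention.

```python
# Minimal sketch (assumption: PyTorch >= 1.9 for batch_first) of cross-layer
# parameter sharing: one attention + FFN block is reused at every depth.
import torch
import torch.nn as nn


class SharedLayerEncoder(nn.Module):
    """Toy encoder that applies a transformer layer num_layers times."""

    def __init__(self, d_model=256, nhead=4, num_layers=12, share_parameters=True):
        super().__init__()
        if share_parameters:
            # A single layer object: attention and FFN weights are shared
            # across all depths (ALBERT-style sharing).
            layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            self.layers = nn.ModuleList([layer] * num_layers)
        else:
            # Independent weights per layer (BERT-style).
            self.layers = nn.ModuleList(
                nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
                for _ in range(num_layers)
            )

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x


def count_parameters(model):
    # model.parameters() yields each shared tensor only once, so the shared
    # encoder reports roughly 1/12 of the unshared parameter count.
    return sum(p.numel() for p in model.parameters())


shared = SharedLayerEncoder(share_parameters=True)
unshared = SharedLayerEncoder(share_parameters=False)
x = torch.randn(2, 16, 256)        # (batch, sequence length, hidden size)
print(shared(x).shape)             # torch.Size([2, 16, 256])
print(count_parameters(shared))    # parameters of a single layer
print(count_parameters(unshared))  # roughly 12x larger
```

The sketch only demonstrates the parameter-sharing idea; the reviewed models additionally vary dataset size, step size, hidden layers, and attention heads, which this toy example does not model.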
