Output-based transfer learning in genetic programming for document classification
Knowledge-Based Systems (IF 7.2), Pub Date: 2020-11-11, DOI: 10.1016/j.knosys.2020.106597
Wenlong Fu, Bing Xue, Xiaoying Gao, Mengjie Zhang

Transfer learning has been studied in document classification as a way to transfer a model trained on a source domain (SD) to a relatively similar target domain (TD). Feature-based transfer learning techniques investigate which features are transferred from SD to TD. This paper investigates an output-based transfer learning system for document classification tasks that uses Genetic Programming (GP), which automatically selects features to construct classifiers. The proposed GP system generates programs directly from a set of sparse features and considers only the change in the outputs of the evolved programs from SD to TD. A linear model then combines existing GP programs from SD, using them as features on TD. In addition, new GP programs are mutated from the programs evolved in SD to improve accuracy. Because the evolved GP programs and their mutations are used directly, the feature extraction and estimation processes on TD are avoided. The experimental results demonstrate that the GP programs from SD can be used effectively to classify documents in the relevant TD. The results also show that effective classifiers are easy to train on TD when the GP programs are used as features. Furthermore, the proposed linear model, which takes multiple GP programs from SD as its inputs, outperforms single GP programs obtained directly from TD.
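The idea of treating the outputs of SD programs as features for a linear model on TD can be illustrated with a minimal sketch. This is not the authors' implementation: the three "programs" below are hand-written stand-ins for evolved GP programs, the vocabulary and toy documents are invented for illustration, and the linear model is fitted with logistic-loss stochastic gradient descent, which is an assumption since the abstract only states that a linear model is used. The mutation step is not shown.

```python
import math

# Hypothetical stand-ins for GP programs evolved on the source domain (SD).
# Each maps a sparse document (term -> weight) to one real-valued output.
def prog_a(doc):
    return doc.get("goal", 0.0) + doc.get("match", 0.0) - doc.get("election", 0.0)

def prog_b(doc):
    return doc.get("team", 0.0) * 2.0 - doc.get("vote", 0.0)

def prog_c(doc):
    return max(doc.get("score", 0.0), doc.get("league", 0.0)) - doc.get("party", 0.0)

SOURCE_PROGRAMS = [prog_a, prog_b, prog_c]

def program_outputs(doc, programs=SOURCE_PROGRAMS):
    """Feature vector for a TD document: the outputs of the SD programs."""
    return [p(doc) for p in programs]

def train_linear(docs, labels, epochs=200, lr=0.1):
    """Fit a linear model on program outputs with logistic-loss SGD (assumed choice)."""
    w, b = [0.0] * len(SOURCE_PROGRAMS), 0.0
    for _ in range(epochs):
        for doc, y in zip(docs, labels):
            x = program_outputs(doc)
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))   # predicted probability of class 1
            g = p - y                        # gradient of the logistic loss w.r.t. z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def predict(doc, w, b):
    """Classify a TD document from the weighted sum of SD program outputs."""
    z = b + sum(wi * xi for wi, xi in zip(w, program_outputs(doc)))
    return 1 if z > 0 else 0

# Invented toy target-domain (TD) data: 1 = sport, 0 = politics.
td_docs = [
    {"team": 1.0, "score": 1.0},
    {"match": 1.0, "league": 1.0},
    {"vote": 1.0, "party": 1.0},
    {"election": 1.0, "vote": 1.0},
]
td_labels = [1, 1, 0, 0]

w, b = train_linear(td_docs, td_labels)
predictions = [predict(d, w, b) for d in td_docs]
```

In this sketch no feature extraction is run on TD itself: the only quantities fitted on TD are the weights over the program outputs, which mirrors the abstract's point that feature extraction and estimation on TD are avoided.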




Updated: 2020-11-12