High-Quality Train Data Generation for Deep Learning-Based Web Page Classification Models, IEEE Access



High-Quality Train Data Generation for Deep Learning-Based Web Page Classification Models
IEEE Access (IF 3.4), Pub Date: 2021-06-04, DOI: 10.1109/access.2021.3086586
Jeong-Jae Kim, Byung-Won On, Ingyu Lee

Current deep learning models for detecting relevant web pages show low accuracy because of the poor quality of their training data. In this paper, we propose a novel algorithm that automatically generates high-quality training data based on the frequency of documents that include the entity of interest. Our experimental results on movie and cellphone data sets show that deep learning models (FNN, CNN, Bi-LSTM, and SeqGAN) trained on data produced by the proposed algorithm reach an average F1-score of up to 0.9992.
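
The abstract reports its results as F1-scores without defining the metric. As a reminder, the standard F1-score is the harmonic mean of precision and recall. The following is a minimal sketch of that standard definition; the function name and the counts used are hypothetical and are not taken from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Standard F1: harmonic mean of precision and recall.

    tp, fp, fn are true-positive, false-positive, and false-negative
    counts for the "relevant web page" class (hypothetical inputs).
    """
    if tp == 0:
        # No correct positive predictions: F1 is conventionally 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only, not results from the paper.
example = f1_score(tp=8, fp=2, fn=2)
```

A score close to 1, such as the 0.9992 reported above, requires both precision and recall to be close to 1.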


Updated: 2021-06-04
