Optimizing Semantic Deep Forest for tweet topic classification, Information Systems


Optimizing Semantic Deep Forest for tweet topic classification
Information Systems (IF 3.0), Pub Date: 2021-05-08, DOI: 10.1016/j.is.2021.101801
Kheir Eddine Daouadi, Rim Zghal Rebaï, Ikram Amous

Topic detection on Twitter has attracted the attention of researchers around the world, and a variety of topic classification approaches have been proposed as a result. However, four major challenges remain in this context: the reliance on handcrafted features, the use of Deep Learning algorithms with very large numbers of parameters, the still-limited performance of existing approaches, and the lack of sufficient labeled datasets. We propose Semantic Deep Forest (SDF), a topic classification approach that combines contextual Word2vec, WordNet and Deep Forest to detect topics from Twitter accurately. In addition, we conducted an extensive parameter sensitivity analysis to fine-tune the parameters of SDF for our tweet topic classification task and achieve the best performance. We conducted experiments on three benchmark datasets with standard evaluation scenarios. Experimental results show that: (1) the proposed contextual Word2vec models can be successfully used for tweet topic classification and outperform existing state-of-the-art embedding models; (2) the proposed SDF improves the accuracy of tweet topic classification and outperforms existing state-of-the-art classification approaches; (3) the proposed SDF does not require a huge amount of labeled data to achieve good performance, a requirement that limits the majority of state-of-the-art approaches.
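The abstract does not spell out how the three components are wired together, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes a tweet is represented by averaging word vectors, that a synonym lookup (standing in for WordNet) maps out-of-vocabulary words to known ones, and it shows the generic Deep Forest cascade idea of passing each level's class scores to the next level alongside the original features. The toy vectors, the synonym table, and the function names `tweet_vector`, `centroid_scores` and `cascade_input` are all hypothetical, and a nearest-centroid scorer stands in for a real forest.

```python
from statistics import fmean

# Toy 3-dimensional word vectors (assumption); a real system would load
# trained Word2vec vectors instead.
TOY_VECTORS = {
    "goal":   [0.9, 0.1, 0.0],
    "match":  [0.8, 0.2, 0.1],
    "vote":   [0.1, 0.9, 0.0],
    "senate": [0.0, 0.8, 0.2],
}

# Toy synonym table standing in for a WordNet lookup (hypothetical entries).
TOY_SYNONYMS = {"game": "match", "ballot": "vote"}


def tweet_vector(tweet):
    """Average the vectors of known words; unknown words go through synonyms."""
    vectors = []
    for token in tweet.lower().split():
        if token not in TOY_VECTORS:
            token = TOY_SYNONYMS.get(token, token)
        if token in TOY_VECTORS:
            vectors.append(TOY_VECTORS[token])
    if not vectors:
        return [0.0, 0.0, 0.0]
    # Column-wise mean over the collected word vectors.
    return [fmean(column) for column in zip(*vectors)]


def centroid_scores(vector, centroids):
    """Stand-in for one forest: inverse-distance class scores that sum to 1.

    Scores are returned in alphabetical order of the class labels.
    """
    raw = {
        label: 1.0 / (1e-9 + sum((a - b) ** 2 for a, b in zip(vector, centroid)))
        for label, centroid in centroids.items()
    }
    total = sum(raw.values())
    return [raw[label] / total for label in sorted(raw)]


def cascade_input(vector, class_scores):
    """Cascade step: the next level sees the features plus the class scores."""
    return vector + class_scores
```

In a fuller version, each cascade level would hold several tree ensembles, and levels would be added until validation accuracy stops improving; the sketch above only shows how the feature vector grows from one level to the next.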


Updated: 2021-05-11
