Twitter alloy steel disambiguation and user relevance via one-class and two-class news titles classifiers,Neural Computing and Applications


Twitter alloy steel disambiguation and user relevance via one-class and two-class news titles classifiers
Neural Computing and Applications (IF 6), Pub Date: 2020-06-05, DOI: 10.1007/s00521-020-04991-8
Paola Zola, Paulo Cortez, Eugenio Brentari

This paper addresses the nontrivial task of Twitter financial disambiguation (TFD), which is relevant for filtering financial domain tweets (e.g., alloy steel or coffee prices) when no unique identifiers (e.g., cashtags) are adopted. To automate TFD, we propose a transfer learning approach that uses freely labeled news titles to train diverse one-class and two-class classification methods. These include different text handling transforms, adaptations of statistical measures and modern machine learning methods, including support vector machines (SVM), deep autoencoders and multilayer perceptrons. As a case study, we analyzed the domain of alloy steel prices, collecting a recent Twitter dataset. Overall, the best results were achieved by a two-class SVM fed with TFD statistical measures and topic model features, obtaining discrimination levels of 80% and 71% when tested with 11,081 and 3,000 manually labeled tweets, respectively. The best one-class performance (78% and 69% for the same test tweets) was obtained by a term frequency-inverse document frequency classifier (TF-IDFC). These models were further used to generate a Financial User Relevance rank (FUR) score, aiming to filter relevant users. The SVM and TF-IDFC FUR models obtained predictive user discrimination levels of 80% and 75%, respectively, when tested with a manually labeled test sample of 418 users. These results confirm the proposed joint TFD-FUR approach as a valuable tool for the selection of Twitter texts and users for financial social media analytics (e.g., sentiment analysis, detection of influential users).
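
The one-class idea in the abstract (learn only from in-domain news titles, then score tweets and users) can be illustrated with a small sketch. This is not the authors' TF-IDFC or FUR formula, which the abstract does not specify. It assumes a simple stand-in: a TF-IDF centroid built from news titles, cosine similarity as the tweet score, and the mean tweet score as a user score. All function names and the sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vector(tokens, idf, default_idf):
    """L2-normalised TF-IDF vector; unseen terms get default_idf so that
    out-of-domain words dilute the similarity instead of being ignored."""
    counts = Counter(tokens)
    vec = {t: c * idf.get(t, default_idf) for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def fit_centroid(titles):
    """Learn smoothed IDF weights and a TF-IDF centroid from in-domain
    news titles only (one-class training: no negative examples)."""
    docs = [tokenize(t) for t in titles]
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    default_idf = math.log(1 + n) + 1.0  # IDF of a term with df = 0
    centroid = Counter()
    for doc in docs:
        for term, w in tfidf_vector(doc, idf, default_idf).items():
            centroid[term] += w / n
    return idf, default_idf, dict(centroid)


def tweet_score(text, model):
    """Cosine similarity between a tweet and the title centroid."""
    idf, default_idf, centroid = model
    vec = tfidf_vector(tokenize(text), idf, default_idf)
    cnorm = math.sqrt(sum(w * w for w in centroid.values()))
    if not vec or not cnorm:
        return 0.0
    return sum(w * centroid.get(t, 0.0) for t, w in vec.items()) / cnorm


def user_relevance(tweets, model):
    """Hypothetical user score: mean tweet score over the user's tweets."""
    if not tweets:
        return 0.0
    return sum(tweet_score(t, model) for t in tweets) / len(tweets)


if __name__ == "__main__":
    # Invented example titles standing in for freely labeled news headlines.
    news_titles = [
        "Steel prices rise as iron ore demand grows",
        "Alloy steel prices fall on weak demand",
        "Stainless steel market sees price increase",
    ]
    model = fit_centroid(news_titles)
    for tweet in ["alloy steel prices rise this week",
                  "nerves of steel in tonight's game"]:
        print(f"{tweet_score(tweet, model):.3f}  {tweet}")
```

A threshold on `tweet_score` would turn the score into a relevant/irrelevant decision; how such a threshold is chosen is outside the scope of this sketch.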


Updated: 2020-06-05
