An investigation of solutions for handling incomplete online review datasets with missing values
Journal of Experimental & Theoretical Artificial Intelligence (IF 1.7), Pub Date: 2021-07-05, DOI: 10.1080/0952813x.2021.1948920
Ya-Han Hu, Chih-Fong Tsai

ABSTRACT

Online review helpfulness prediction is an important research issue in electronic commerce and data mining. However, the datasets collected for analysing and predicting the helpfulness of online reviews often contain missing attribute values, such as reviewer background and rating information. In the related literature, many studies either use case deletion to remove records containing missing values or impute the missing values with the mean/mode method. None of them considers handling online review datasets directly, without missing value imputation, by decision tree-related techniques. In this paper, we therefore investigate the suitability of different types of approaches for solving the incomplete dataset problem of online reviews. Specifically, for missing value imputation, several supervised learning techniques, including MICE, KNN, SVM, and CART, are examined. For the direct handling approach without missing value imputation, CART is also used. Experimental results on the TripAdvisor dataset for review helpfulness prediction show that handling incomplete online review datasets directly with CART, without imputation, significantly outperforms the other approaches, including case deletion and missing value imputation.
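To make the compared families of approaches concrete, the following is a minimal sketch of case deletion, mean imputation, and KNN imputation on a toy table. It is an illustration only, not the authors' implementation: the column meanings, the sample values, the function names, and the distance definition (root mean squared difference over the attributes observed in both rows, without feature scaling) are all assumptions. MICE, SVM, and CART-based imputation, as well as direct handling by CART, are not shown.

```python
import math
from statistics import mean

# Hypothetical toy data: [rating, reviewer_contributions, review_length].
# None marks a missing attribute value.
rows = [
    [5.0, 10.0, 120.0],
    [4.0, None, 100.0],
    [None, 2.0, 30.0],
    [1.0, 1.0, 20.0],
    [2.0, 3.0, 40.0],
]


def case_deletion(rows):
    """Drop every row that contains at least one missing value."""
    return [row for row in rows if None not in row]


def mean_imputation(rows):
    """Replace each missing value with the mean of its column's observed values."""
    n_cols = len(rows[0])
    col_means = [
        mean(row[j] for row in rows if row[j] is not None)
        for j in range(n_cols)
    ]
    return [
        [col_means[j] if value is None else value for j, value in enumerate(row)]
        for row in rows
    ]


def knn_imputation(rows, k=2):
    """Fill each missing value with the mean over the k nearest donor rows.

    Assumption: distance is the root mean squared difference over the
    attributes observed in both rows; features are left unscaled here.
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = []
    for i, row in enumerate(rows):
        new_row = list(row)
        for j, value in enumerate(row):
            if value is None:
                # Donors are the other rows that have column j observed.
                donors = sorted(
                    (distance(row, other), other[j])
                    for m, other in enumerate(rows)
                    if m != i and other[j] is not None
                )
                new_row[j] = mean(v for _, v in donors[:k])
        imputed.append(new_row)
    return imputed


print(len(case_deletion(rows)))   # rows left after case deletion
print(mean_imputation(rows)[1])   # second row with its gap filled by the column mean
print(knn_imputation(rows)[1])    # second row with its gap filled from 2 neighbours
```

Case deletion shrinks the toy table from five rows to three, which shows why it can be costly when many reviews have at least one missing attribute. The two imputation functions keep all five rows but fill the gaps with estimated values.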




Updated: 2021-07-05