Predicting numeric ratings for Google apps using text features and ensemble learning, ETRI Journal


ETRI Journal (IF 1.3), Pub Date: 2020-07-06, DOI: 10.4218/etrij.2019-0443
Muhammad Umer ₁ , Imran Ashraf ₂ , Arif Mehmood ₃ , Saleem Ullah ₁ , Gyu Sang Choi ₂


Application (app) ratings are feedback provided voluntarily by users and serve as important evaluation criteria for apps. However, these ratings can often be biased owing to insufficient or missing votes. Additionally, significant discrepancies have been observed between numeric ratings and user reviews. This study aims to predict the numeric ratings of Google apps using machine learning classifiers. It uses the numeric app ratings provided by users as training data and returns authentic mobile app ratings by analyzing user reviews. For this purpose, an ensemble learning model based on term frequency/inverse document frequency (TF/IDF) features is proposed. Three TF/IDF feature sets were used: unigrams, bigrams, and trigrams. The dataset was scraped from the Google Play store and covers 14 different app categories. Biased and unbiased user ratings were discriminated using TextBlob analysis to formulate the ground truth, against which the classifiers' prediction accuracy was then evaluated. The results demonstrate the high potential of machine learning-based classifiers to predict authentic numeric ratings from actual user reviews.
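To make the unigram, bigram, and trigram TF/IDF features concrete, here is a minimal standard-library sketch. It is not the authors' implementation: the abstract does not state which TF/IDF variant was used, so this assumes the plain form tf(t, d) × log(N / df(t)), with tf normalized by the number of n-grams in the review. The helper names `ngrams` and `tfidf` are hypothetical. Libraries such as scikit-learn's `TfidfVectorizer` (with `ngram_range=(n, n)`) compute a smoothed, L2-normalized variant by default, so their weights will differ numerically.

```python
import math
import re
from collections import Counter


def ngrams(text: str, n: int) -> list[str]:
    """Lower-case the text, split it into word tokens, and join n consecutive tokens."""
    tokens = re.findall(r"\w+", text.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs: list[str], n: int) -> list[dict[str, float]]:
    """Return one {n-gram: weight} dict per document, using tf * log(N / df)."""
    grams = [Counter(ngrams(d, n)) for d in docs]
    # Document frequency: iterating a Counter yields each distinct n-gram once per doc.
    df = Counter(g for counts in grams for g in counts)
    num_docs = len(docs)
    vectors = []
    for counts in grams:
        total = sum(counts.values()) or 1  # guard against reviews shorter than n tokens
        vectors.append({
            g: (c / total) * math.log(num_docs / df[g])
            for g, c in counts.items()
        })
    return vectors


reviews = ["great app works well", "app crashes often", "great app but crashes"]
unigrams = tfidf(reviews, 1)
bigrams = tfidf(reviews, 2)
trigrams = tfidf(reviews, 3)
print(unigrams[0]["app"])  # 0.0, because "app" occurs in every review
```

Terms that occur in every review receive zero weight, while n-grams specific to a few reviews (for example, the trigram "app crashes often") are weighted up. This is the property that makes TF/IDF useful for separating reviews by sentiment.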
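The abstract does not say how the ensemble combines its base classifiers. One common scheme is hard (majority) voting, which scikit-learn provides as `VotingClassifier(voting="hard")`. The sketch below assumes that scheme and implements it in plain Python. The three keyword-based "classifiers" are purely hypothetical stand-ins for trained models that map a review to a 1 to 5 rating.

```python
from collections import Counter
from typing import Callable, Sequence

# Hypothetical interface: a trained model that maps review text to a rating in 1..5.
Classifier = Callable[[str], int]


def majority_vote(classifiers: Sequence[Classifier], review: str) -> int:
    """Predict a rating by hard voting across the base classifiers."""
    votes = Counter(clf(review) for clf in classifiers)
    top = max(votes.values())
    # Break ties deterministically by returning the lowest tied rating.
    return min(rating for rating, count in votes.items() if count == top)


# Hypothetical toy base classifiers, standing in for real trained models.
def positive_words(review: str) -> int:
    return 5 if "great" in review else 2


def negative_words(review: str) -> int:
    return 1 if "crash" in review else 4


def length_based(review: str) -> int:
    return 4 if len(review.split()) > 3 else 2


ensemble = [positive_words, negative_words, length_based]
print(majority_vote(ensemble, "great app works well"))  # 4 (two of three votes)
```

Hard voting only needs each model's predicted label. Soft voting, which averages predicted class probabilities, is the usual alternative when the base models expose calibrated probabilities.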

