
Imputing sentiment intensity for SaaS service quality aspects using T-nearest neighbors with correlation-weighted Euclidean distance

  • Regular Paper
  • Published:
Knowledge and Information Systems Aims and scope Submit manuscript

Abstract

The rapid and increasing adoption of Software as a Service (SaaS) by businesses to deliver their products in the marketplace presents selection challenges to users. Recently, major cloud service providers such as Amazon Web Services and Microsoft Azure have introduced well-architected frameworks that assess SaaS products along different pillars (also referred to herein as features). Customers leave feedback on these features after using SaaS products; however, they do not comment on every feature of a product, which limits the usefulness of their reviews to prospective users who need to assess a product’s quality before committing to it. Our study addresses this drawback by imputing, or inferring, the intensity of a customer’s feedback on the features they do not mention in their reviews. Specifically, we propose threshold-based nearest neighbors (T-NN), an extension of the conventional k-nearest neighbor approach, to determine the missing sentiment intensity score of a feature from the values of the other features. We evaluate the proposed approach on two different systems and compare our results with seven other data imputation techniques. The results show that the proposed T-NN approach outperforms the other imputation approaches on the SaaS sentiment dataset.
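The imputation idea described above can be illustrated with a minimal sketch. Note that the paper's exact formulation is not reproduced in this excerpt, so the details below are assumptions: feature weights are taken as the absolute Pearson correlation between each observed feature and the feature being imputed, neighbors are all donor rows whose weighted Euclidean distance falls below a threshold T (rather than a fixed k), and the imputed value is the mean of the neighbors' observed scores.

```python
import numpy as np

def tnn_impute(X, row_idx, feat_idx, threshold):
    """Impute the missing value X[row_idx, feat_idx] using threshold-based
    nearest neighbors with a correlation-weighted Euclidean distance.

    This is an illustrative sketch, not the paper's exact algorithm."""
    X = np.asarray(X, dtype=float)
    target_col = X[:, feat_idx]
    donors = ~np.isnan(target_col)          # rows that observed the target feature
    donors[row_idx] = False                 # never use the query row itself

    other = [j for j in range(X.shape[1]) if j != feat_idx]

    # Weight each remaining feature by |Pearson correlation| with the target,
    # estimated on donor rows where both values are observed.
    weights = []
    for j in other:
        mask = donors & ~np.isnan(X[:, j])
        if mask.sum() > 1 and np.std(X[mask, j]) > 0 and np.std(target_col[mask]) > 0:
            weights.append(abs(np.corrcoef(X[mask, j], target_col[mask])[0, 1]))
        else:
            weights.append(0.0)
    weights = np.array(weights)

    query = X[row_idx, other]
    neighbor_vals = []
    for i in np.where(donors)[0]:
        diff = X[i, other] - query
        ok = ~np.isnan(diff)                # compare only on jointly observed features
        if not ok.any():
            continue
        d = np.sqrt(np.sum(weights[ok] * diff[ok] ** 2))
        if d <= threshold:                  # T-NN: every row within T, not a fixed k
            neighbor_vals.append(target_col[i])

    if not neighbor_vals:
        return float(np.nanmean(target_col))  # fall back to the column mean
    return float(np.mean(neighbor_vals))
```

With a threshold of 1.0 on a toy matrix where the last row is missing its third feature, only the two rows close to the query contribute, so the imputed score is their mean rather than being pulled toward distant rows, which is the practical difference from a fixed-k neighborhood.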




Acknowledgements

The first author acknowledges the financial support received from the University of Technology Sydney. This research was supported partially by the Australian Government through the Australian Research Council's Linkage Projects funding scheme (Project LP160100080).

Author information


Corresponding author

Correspondence to Omar K. Hussain.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (RAR 990 KB)


Cite this article

Raza, M., Hussain, F.K., Hussain, O.K. et al. Imputing sentiment intensity for SaaS service quality aspects using T-nearest neighbors with correlation-weighted Euclidean distance. Knowl Inf Syst 63, 2541–2584 (2021). https://doi.org/10.1007/s10115-021-01591-3

