Dissecting click farming on the Taobao platform in China via PU learning and weighted logistic regression

Electronic Commerce Research

Abstract

Click farming has become widespread and harms both online shopping platforms and consumers. To identify click farming on Taobao, the largest online shopping platform in China, we first construct several features from both goods and online shops. We then use positive-unlabeled (PU) learning to find reliable negative instances in the unlabeled set and to rank all shops by their estimated probability of click farming. Next, a weighted logit model is used to investigate the role of the extracted features in explaining click farming. The empirical findings show that the extracted features are effective for identifying and explaining click farming. The results also show that click farming does not necessarily depend on the state of the shop. Our study can help online consumers reduce the risk of being deceived and help the platform improve its capacity to regulate click farming.
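The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which PU algorithm, which features, or which weighting scheme the paper uses. The sketch assumes a simple two-step PU procedure (score the unlabeled shops with a positive-versus-unlabeled classifier, then keep the lowest-scoring fraction as reliable negatives) followed by a logit model with class-balancing weights. The function names, the `neg_fraction` parameter, and the synthetic two-feature data are all hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(beta, x):
    """Estimated probability of the positive class for one feature vector."""
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))


def fit_weighted_logit(X, y, w, lr=0.2, epochs=300):
    """Weighted logistic regression fitted by batch gradient descent.

    Returns [intercept, coef_1, ..., coef_d].
    """
    beta = [0.0] * (len(X[0]) + 1)
    total = sum(w)
    for _ in range(epochs):
        grad = [0.0] * len(beta)
        for xi, yi, wi in zip(X, y, w):
            err = wi * (predict(beta, xi) - yi)
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        beta = [b - lr * g / total for b, g in zip(beta, grad)]
    return beta


def pu_two_step(P, U, neg_fraction=0.3):
    """Two-step PU sketch (assumed variant, not taken from the paper)."""
    # Step 1: treat every unlabeled shop as negative and score the unlabeled set.
    X1 = P + U
    y1 = [1] * len(P) + [0] * len(U)
    beta1 = fit_weighted_logit(X1, y1, [1.0] * len(X1))
    first_scores = [predict(beta1, u) for u in U]
    # Keep the lowest-scoring fraction of U as "reliable negatives".
    k = max(1, int(neg_fraction * len(U)))
    lowest = sorted(range(len(U)), key=lambda i: first_scores[i])[:k]
    RN = [U[i] for i in lowest]
    # Step 2: weighted logit on positives vs. reliable negatives.
    # Class-balancing weights are an assumption made for this sketch.
    X2 = P + RN
    y2 = [1] * len(P) + [0] * len(RN)
    n = len(X2)
    w2 = [n / (2 * len(P))] * len(P) + [n / (2 * len(RN))] * len(RN)
    beta2 = fit_weighted_logit(X2, y2, w2)
    return beta2, [predict(beta2, u) for u in U]


def draw(center, n):
    """Synthetic shops with two features centred on `center` (illustrative only)."""
    return [[random.gauss(center, 1.0), random.gauss(center, 1.0)] for _ in range(n)]


random.seed(0)
P = draw(2.0, 40)                    # labeled click-farming shops
U = draw(2.0, 30) + draw(-2.0, 70)   # unlabeled: 30 hidden positives, then 70 normal shops

beta, scores = pu_two_step(P, U)
ranking = sorted(range(len(U)), key=lambda i: -scores[i])  # most suspicious first
hidden_pos_mean = sum(scores[:30]) / 30
normal_mean = sum(scores[30:]) / 70
print("coefficients:", [round(b, 3) for b in beta])
print("mean score, hidden positives:", round(hidden_pos_mean, 3))
print("mean score, normal shops:", round(normal_mean, 3))
```

In this toy setting the fitted coefficients play the role the abstract assigns to the weighted logit model: their signs and magnitudes indicate how each feature relates to the estimated probability of click farming, while `ranking` orders the unlabeled shops by that probability.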

Acknowledgements

The authors would like to thank the Editor-in-Chief, the Associate Editor, and the three anonymous referees for their helpful comments and constructive guidance. The authors gratefully acknowledge financial support from the National Natural Science Foundation of China (71671056, 91846201), the Humanity and Social Science Foundation of the Ministry of Education of China (19YJA790035), and the National Statistical Science Research Projects of China (2019LD05).

Author information

Corresponding author

Correspondence to Qifa Xu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Jiang, C., Zhu, J. & Xu, Q. Dissecting click farming on the Taobao platform in China via PU learning and weighted logistic regression. Electron Commer Res 22, 157–176 (2022). https://doi.org/10.1007/s10660-020-09418-z

  • DOI: https://doi.org/10.1007/s10660-020-09418-z
