Application of the Information Bottleneck method to discover user profiles in a Web store, Journal of Organizational Computing and Electronic Commerce



Journal of Organizational Computing and Electronic Commerce (IF 2.0), Pub Date: 2018-03-26, DOI: 10.1080/10919392.2018.1444340
Jacek Iwański ₁, Grażyna Suchacka ₁, Grzegorz Chodak ₂


ABSTRACT: The paper deals with the problem of discovering groups of Web users with similar behavioral patterns on an e-commerce site. We introduce a novel approach to the unsupervised classification of user sessions, based on session attributes related to the user's click-stream behavior, to gain insight into the characteristics of various user profiles. The approach uses the agglomerative Information Bottleneck (IB) algorithm. Based on log data from a real online store, the efficiency of the approach, in terms of its ability to differentiate between buying and non-buying sessions, was validated, indicating some possible practical applications of our method. Experiments performed on a number of session samples showed that the method is capable of separating the two types of sessions to a large extent. A detailed analysis was performed for numbers of clusters ranging from two to seven, and the results were compared with those achieved by the most common clustering algorithm, k-means. Increasing the number of clusters generally leads to better results for both algorithms. However, IB demonstrated much higher average efficiency than k-means for the corresponding number of clusters, and this superiority was especially clear for lower numbers of clusters. The IB-based division of user sessions into seven clusters gives a mean entropy value of 0.28, which corresponds to a 95% separation of the two session types. Furthermore, a major advantage of our approach is that it makes it possible to analyze the probability distribution of session attributes in individual clusters, which allows one to discover hidden knowledge about common characteristics of various user profiles and to use this knowledge to support managerial decisions.
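The link between the entropy figure and the separation percentage can be illustrated with a short calculation. The sketch below is not taken from the paper: it assumes the entropy is the binary Shannon entropy (in bits) of the buying/non-buying split within each cluster, averaged with cluster-size weights. The function names and the weighting scheme are hypothetical choices made for illustration.

```python
import math


def binary_entropy(p: float) -> float:
    """Shannon entropy in bits of a two-class distribution (p, 1 - p)."""
    if p in (0.0, 1.0):
        return 0.0  # a pure cluster carries no uncertainty
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def mean_cluster_entropy(clusters):
    """Size-weighted mean entropy over clusters (assumed definition).

    `clusters` is a list of (buying, non_buying) session counts,
    one pair per cluster.
    """
    total = sum(b + n for b, n in clusters)
    return sum(
        (b + n) / total * binary_entropy(b / (b + n))
        for b, n in clusters
        if b + n > 0
    )


# A cluster in which 95% of the sessions are of one type
print(round(binary_entropy(0.95), 3))  # prints 0.286
```

Under these assumptions, a 95%/5% split within a cluster yields roughly 0.286 bits, close to the mean entropy of 0.28 reported in the abstract; an even 50%/50% split would give 1 bit, and a perfectly pure cluster 0 bits.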


Updated: 2018-03-26
