Session stitching using sequence fingerprinting for web page visits
Decision Support Systems ( IF 6.7 ) Pub Date : 2021-04-28 , DOI: 10.1016/j.dss.2021.113579
Johannes De Smedt , Ewelina Lacka , Spyro Nita , Hans-Helmut Kohls , Ross Paton

The nature of people's web navigation has changed significantly in recent years. The advent of smartphones and other handheld devices has given rise to web users consulting websites on more than one device, or using a shared device. As a result, large volumes of seemingly disjoint data are available which, when analysed together, can support decision-making. The task of identifying web sessions by linking such data back to a specific person, however, is hard. Session stitching aims to overcome this by using machine learning inference to identify similar or identical users. Many such efforts use demographic data or device-based features to train matching algorithms. However, these variables are often not available for every dataset or are recorded differently, making a streamlined setup difficult. Moreover, they often result in vast feature spaces which are hard to use for actionable interpretation.

In this paper, we present an alternative approach based on the fingerprinting of web pages visited by users in a single session. By learning behavioural patterns from these sequences of page visits, we obtain features that can be used for matching without requiring sensitive user-agent data such as IP address, geolocation, or device details, as is common with other approaches. Using these sequential fingerprints does not rely on pre-defined features, but only requires the recording of web page visits, making our approach actionable. The approach is empirically tested on real-life web logs and compared with matching using regular user-agent features and state-of-the-art embedding techniques. Results in an e-commerce context show that sequential features can still obtain strong performance with fewer features, facilitating decision-making on session stitching and informing subsequent related activities such as marketing or customer analysis.
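
To make the idea of matching sessions on page-visit sequences concrete, here is a minimal sketch. It is not the authors' method: the abstract does not specify how the fingerprints are learned, so this example swaps in a simple stand-in (counts of consecutive page pairs, compared by cosine similarity). The function names, page labels, and session IDs are all hypothetical.

```python
from collections import Counter
from math import sqrt


def fingerprint(pages, n=2):
    """Count contiguous n-grams of page visits in one session.

    Assumption: an n-gram count vector stands in for the learned
    sequential fingerprint described in the abstract.
    """
    return Counter(tuple(pages[i:i + n]) for i in range(len(pages) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def best_match(query, candidates, n=2):
    """Return the candidate session ID most similar to the query, plus all scores."""
    fq = fingerprint(query, n)
    scores = {sid: cosine(fq, fingerprint(pages, n)) for sid, pages in candidates.items()}
    return max(scores, key=scores.get), scores


# Hypothetical sessions: only page labels are used, no IP or device data.
query = ["home", "search", "product", "cart"]
candidates = {
    "session_a": ["home", "search", "product", "reviews"],
    "session_b": ["home", "account", "orders"],
}
match, scores = best_match(query, candidates)
print(match, scores)
```

The sketch uses only the recorded page visits as input, which mirrors the abstract's point that no user-agent features are required. A real system would likely need a learned model and a decision threshold rather than a plain nearest-neighbour lookup.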

Updated: 2021-04-28