HTTP-level e-commerce data based on server access logs for an online store
Computer Networks (IF 5.6), Pub Date: 2020-10-07, DOI: 10.1016/j.comnet.2020.107589
Grzegorz Chodak, Grażyna Suchacka, Yash Chawla

Web server logs have been extensively used as a source of data on the characteristics of Web traffic and users’ navigational patterns. In particular, Web bot detection and online purchase prediction using methods from artificial intelligence (AI) are currently key areas of research. However, in reality, it is hard to obtain logs from actual online stores and there is no common dataset that can be used across different studies. Moreover, there is a lack of studies exploring Web traffic over a longer period of time, due to the unavailability of long-term data from server logs.

The need to develop reliable models of Web traffic, Web user navigation, and e-customer behaviour calls for an up-to-date, large-volume e-commerce dataset on Web traffic. Similarly, AI problems require a sufficient amount of solid, real-life data to train and validate new models and methods. Thus, to meet the demand for a publicly available long-term e-commerce dataset, we collected access log data describing the operation of an online store over a six-month period. The data were aggregated, transformed, and anonymized using a program written in C#. As a result, we release the EClog dataset in CSV format, which covers 183 days of HTTP-level e-commerce traffic. The data will be beneficial for research in many areas, including computer science, data science, management, and sociology.
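
As a rough illustration of the kind of processing described above (parsing access log entries, anonymizing client identifiers, and writing CSV), here is a minimal Python sketch. It is not the authors' C# program. The "combined" log format, the keyed-hash pseudonymization, the column names in FIELDS, and the helper names pseudonymize, parse_line, and logs_to_csv are all assumptions made for this example and do not describe the actual EClog schema.

```python
import csv
import hashlib
import hmac
import io
import re

# Assumption: the server writes the common Apache/Nginx "combined" log format.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<uri>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

# Hypothetical output columns, chosen for this sketch only.
FIELDS = ["client_id", "time", "method", "uri",
          "status", "size", "referrer", "user_agent"]


def pseudonymize(ip: str, secret: bytes) -> str:
    """Replace an IP address with a keyed hash so the raw address is not stored."""
    return hmac.new(secret, ip.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def parse_line(line: str, secret: bytes):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    row = match.groupdict()
    row["client_id"] = pseudonymize(row.pop("ip"), secret)
    row.pop("protocol")  # not kept in this sketch
    row["status"] = int(row["status"])
    row["size"] = 0 if row["size"] == "-" else int(row["size"])
    return row


def logs_to_csv(lines, secret: bytes) -> str:
    """Convert an iterable of log lines to CSV text, skipping malformed lines."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for line in lines:
        row = parse_line(line, secret)
        if row is not None:
            writer.writerow(row)
    return buffer.getvalue()
```

Calling logs_to_csv on the lines of a log file with a secret key yields CSV text in which each client appears only as a stable pseudonym, so requests from the same client can still be grouped. A keyed hash is one possible anonymization choice; the abstract does not say which method was used.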




Updated: 2020-10-30