A Reinforcement Learning Approach to Optimize Discount and Reputation Tradeoffs in E-commerce Systems, ACM Transactions on Internet Technology



ACM Transactions on Internet Technology (IF 5.3), Pub Date: 2020-10-28, DOI: 10.1145/3400024
Hong Xie, Yongkun Li, John C. S. Lui


Feedback-based reputation systems are widely deployed in E-commerce systems. Evidence shows that earning a reputable label (for sellers of such systems) may take a substantial amount of time, and this implies a reduction of profit. We propose to enhance sellers’ reputation via price discounts. However, the challenges are as follows: (1) the demands from buyers depend on both the discount and the reputation, and (2) the demands are unknown to the seller. To address these challenges, we first formulate a profit maximization problem via a semi-Markov decision process to explore the optimal tradeoffs in selecting price discounts. We prove the monotonicity of the optimal profit and the optimal discount. Based on this monotonicity, we design a Q-learning with forward projection (QLFP) algorithm, which infers the optimal discount from historical transaction data. We prove that the QLFP algorithm converges to the optimal policy. We conduct trace-driven simulations using a dataset from eBay to evaluate the QLFP algorithm. Evaluation results show that QLFP improves the profit by as much as 50% over both Q-learning and Speedy Q-learning. The QLFP algorithm also improves both the reputation and the profit by as much as a factor of two over the scheme of not providing any price discount.
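To make the setting in the abstract concrete, the following is a minimal sketch of plain tabular Q-learning (one of the baselines the abstract names) on a toy discount and reputation problem. It is not the QLFP algorithm: the abstract does not specify the forward projection step, so that step is not reproduced here. The semi-Markov timing is also simplified to fixed periods. Every number and name in the block (the discount levels, the reputation levels, the price and cost, and the `sale_probability` demand model) is a hypothetical assumption chosen for illustration and does not come from the paper or the eBay dataset.

```python
import random

# All constants below are hypothetical, chosen only for illustration.
DISCOUNTS = [0.0, 0.1, 0.2, 0.3]   # candidate price discounts (actions)
N_LEVELS = 5                        # reputation levels 0..4 (states)
PRICE, COST = 10.0, 4.0             # list price and unit cost


def sale_probability(level, discount):
    """Toy demand model: sales rise with reputation and with discount."""
    return min(1.0, 0.1 + 0.15 * level + 1.5 * discount)


def step(level, action, rng):
    """Simulate one period; return (profit, next reputation level)."""
    discount = DISCOUNTS[action]
    if rng.random() < sale_probability(level, discount):
        profit = PRICE * (1.0 - discount) - COST
        # Assumption: each sale raises reputation by one level, up to the cap.
        return profit, min(level + 1, N_LEVELS - 1)
    return 0.0, level


def q_learning(episodes=2000, horizon=30, alpha=0.1, gamma=0.95,
               epsilon=0.1, seed=0):
    """Standard tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0] * len(DISCOUNTS) for _ in range(N_LEVELS)]
    for _episode in range(episodes):
        level = 0  # every episode starts from the lowest reputation
        for _period in range(horizon):
            if rng.random() < epsilon:
                action = rng.randrange(len(DISCOUNTS))
            else:
                action = max(range(len(DISCOUNTS)), key=lambda a: q[level][a])
            profit, nxt = step(level, action, rng)
            # One-step temporal-difference update toward the bootstrapped target.
            target = profit + gamma * max(q[nxt])
            q[level][action] += alpha * (target - q[level][action])
            level = nxt
    return q


def greedy_policy(q):
    """Return the discount with the highest learned value at each level."""
    return [DISCOUNTS[max(range(len(row)), key=lambda a: row[a])] for row in q]
```

Calling `greedy_policy(q_learning())` returns one discount per reputation level under these toy assumptions. The learned values depend entirely on the made-up demand model, so the output says nothing about the results reported in the paper.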


Updated: 2020-10-28
