A Reinforcement Learning Approach to Optimize Discount and Reputation Tradeoffs in E-commerce Systems
ACM Transactions on Internet Technology (IF 5.3), Pub Date: 2020-10-28, DOI: 10.1145/3400024
Hong Xie, Yongkun Li, John C. S. Lui

Feedback-based reputation systems are widely deployed in E-commerce systems. Evidence shows that sellers in such systems may need a substantial amount of time to earn a reputable label, which reduces their profit. We propose to enhance sellers’ reputation via price discounts. However, this raises two challenges: (1) buyers’ demand depends on both the discount and the reputation, and (2) the demand is unknown to the seller. To address these challenges, we first formulate a profit maximization problem as a semi-Markov decision process to explore the optimal tradeoffs in selecting price discounts. We prove the monotonicity of the optimal profit and the optimal discount. Based on this monotonicity, we design a Q-learning with forward projection (QLFP) algorithm, which infers the optimal discount from historical transaction data. We prove that the QLFP algorithm converges to the optimal policy. We conduct trace-driven simulations using a dataset from eBay to evaluate the QLFP algorithm. Evaluation results show that QLFP improves the profit by up to 50% over both Q-learning and Speedy Q-learning. The QLFP algorithm also improves both the reputation and the profit by up to two times over the scheme of not providing any price discount.

Updated: 2020-10-28