Use of Proximal Policy Optimization for the Joint Replenishment Problem
Introduction
The recent digitization in transport provides new opportunities to improve the efficiency of current logistics networks. By sharing shipment data across companies, freight loads can be combined to increase truck fill rates. In the innovative concept of the Physical Internet, an open interconnected logistics network introduced by [27], companies collaborate by sharing freight and resources [32]. This entails a transition from individually optimized supply chains towards a more holistic and collaborative integrated “network of networks”. To facilitate smooth routing of freight across the network, digital control towers visualize shipments and support replenishment decisions based on real-time information and analytics.
As the foundation of the Physical Internet is built around multi-dimensional collaboration, it complements the emerging sharing economy, in which goods and services are shared and exchanged more easily [7]. Horizontal collaboration has been considered an effective practice for sustainable logistics and freight transport, and it has gained increased attention in recent years [33]. It has been named one of the solutions for effectively decarbonizing freight transport [23], [2].
A prerequisite for collaborative shipping, either through bundling less-than-truckload (LTL) shipments or backhauling full truckload (FTL) shipments, is that the replenishment cycles of the collaborating companies are synchronized, i.e., their replenishments occur at the same time. This requires a joint replenishment policy that takes the specifics of freight transport into account, so as to stimulate joint shipments and avoid individual transports. The joint replenishment literature is rich and provides a range of heuristics. Most joint replenishment policies, however, ignore the limited vehicle capacity and do not address the (under)utilization of vehicles. We explore how machine learning can be used to develop joint replenishment policies that can be implemented in a control tower setting to facilitate collaborative shipping.
We implement a state-of-the-art machine learning algorithm to decide on the replenishments of a group of collaborating companies, i.e., how much and when to order or ship. We focus on periodic review joint replenishment policies that take the capacity of a full truck into account. The goal is to minimize (joint) transportation, holding and backorder costs. We deploy the proximal policy optimization (PPO) algorithm, a state-of-the-art method in the domain of deep reinforcement learning (DRL). The PPO algorithm has been praised for capturing some of the stability and convergence properties of trust region policy optimization algorithms while being much simpler to implement and tune. Our numerical results confirm its stable and convergent behaviour and show that it develops well-performing policies.
The implementation of machine learning algorithms, such as the one presented in this paper, can facilitate smart automation of freight shipments in today's digital era. It complements the algorithms developed in [12] to identify collaboration partners based on their geographical compatibility, as well as gain-sharing methods such as those discussed in [9]. As such, this paper responds to various challenges faced by the current logistics industry, such as the transition towards the Physical Internet, more economically friendly ecosystems, and the digital transformation of logistics.
A review of capacitated joint replenishment policies
The joint replenishment problem (JRP) literature focuses on developing a replenishment policy for multiple items that minimizes the sum of inventory holding, backorder and ordering/transportation costs. The ordering costs typically include a major and a minor ordering cost, with the major ordering cost incurred every time an order is placed, independent of the number of items in the order, and the minor ordering cost charged for every item that is included in the order. The JRP can also be ...
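To make this cost structure concrete, here is a minimal sketch of the cost incurred in a single period of a classical, uncapacitated JRP. It assumes linear holding and backorder costs charged on end-of-period inventory. The function and parameter names (`jrp_period_cost`, `major_cost`, `minor_costs`, and so on) are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def jrp_period_cost(
    order_qty: Sequence[int],
    end_inventory: Sequence[int],
    major_cost: float,
    minor_costs: Sequence[float],
    holding_costs: Sequence[float],
    backorder_costs: Sequence[float],
) -> float:
    """Cost of one period in an uncapacitated JRP (hypothetical sketch).

    Assumes linear holding/backorder costs on end-of-period inventory;
    a negative inventory level denotes backorders.
    """
    ordered_items = [i for i, q in enumerate(order_qty) if q > 0]
    # Major ordering cost: charged once if anything is ordered,
    # independent of how many items are in the order.
    cost = major_cost if ordered_items else 0.0
    # Minor ordering cost: charged for every item included in the order.
    cost += sum(minor_costs[i] for i in ordered_items)
    # Holding cost on positive stock, backorder penalty on shortages.
    for i, inv in enumerate(end_inventory):
        cost += holding_costs[i] * max(inv, 0) + backorder_costs[i] * max(-inv, 0)
    return cost


# Two items: only item 0 is ordered, item 1 ends with a 2-unit backorder.
print(jrp_period_cost([5, 0], [3, -2], 100.0, [10.0, 15.0], [1.0, 2.0], [5.0, 8.0]))
# → 129.0
```

Here the period pays the major cost once plus one minor cost (100 + 10), holds 3 units of item 0 (3), and carries a 2-unit backorder of item 1 (16). Note that this sketch has no vehicle-capacity limit, which is the gap the paper addresses.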
Deep reinforcement learning and its application in inventory management
Reinforcement learning (RL) is the branch of machine learning that focuses on sequential decision-making problems. An RL agent aims to learn a policy that maximizes future (discounted) rewards by interacting with an environment. For an extensive introduction to RL we refer to [42]. In contrast to supervised machine learning, where, for instance, a deep neural net is used to classify images by approximating the function that maps pixels to the label of the image, RL algorithms ...
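The "future (discounted) rewards" an RL agent maximizes can be computed for a finished episode with the backward recursion G_t = r_t + γ G_{t+1}. The sketch below assumes, as is common in inventory applications, that the reward in each period is the negative of that period's cost. The function name `discounted_returns` is hypothetical, and γ = 0.5 is chosen only to keep the arithmetic easy to follow.

```python
from typing import List, Sequence


def discounted_returns(rewards: Sequence[float], gamma: float) -> List[float]:
    """Return G_t = r_t + gamma * G_{t+1} for every step of one finite episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each G_t reuses the already computed G_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Assumed convention: reward = -(period cost), so maximizing return minimizes cost.
costs = [129.0, 40.0, 75.0]
print(discounted_returns([-c for c in costs], gamma=0.5))
# → [-167.75, -77.5, -75.0]
```

With a discount factor close to 1, such as the γ = 0.99 used for the DP benchmark, costs far in the future weigh almost as much as immediate ones.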
Problem statement
We consider $n = 2$ shippers and a neutral orchestrator with full visibility of each shipper's demand and inventories, who is in charge of coordinating their replenishments, for instance through a control tower. To ensure truck efficiency, shipments are only made in FTLs with a capacity of $V$ units. The aggregated order quantity of both shippers in each period $t$ is thus a multiple of an FTL of $V$ units. Denote $q_{i,t}$ shipper $i$'s order quantity and the number of truckloads shipped in period $t$ ...
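The FTL restriction can be sketched as a feasibility check on the joint order, plus an enumeration of the feasible joint orders in one period. The cap `max_trucks` is a hypothetical parameter introduced only to keep the set finite. The function names are illustrative and do not describe how the paper encodes its action space.

```python
from itertools import product
from typing import List, Sequence, Tuple


def truckloads(order_qty: Sequence[int], capacity: int) -> int:
    """Number of full truckloads carrying the joint order of all shippers.

    Raises ValueError if the aggregate order is not a whole number of FTLs.
    """
    if any(q < 0 for q in order_qty):
        raise ValueError("order quantities must be non-negative")
    total = sum(order_qty)
    if total % capacity != 0:
        raise ValueError(f"aggregate order {total} is not a multiple of V={capacity}")
    return total // capacity


def ftl_actions(capacity: int, max_trucks: int, n_shippers: int = 2) -> List[Tuple[int, ...]]:
    """All order vectors (q_1, ..., q_n) summing to k * V for some k in 0..max_trucks."""
    limit = capacity * max_trucks
    return [
        qs
        for qs in product(range(limit + 1), repeat=n_shippers)
        if sum(qs) <= limit and sum(qs) % capacity == 0
    ]


print(truckloads([7, 5], capacity=4))  # 12 units fill exactly 3 trucks
print(ftl_actions(capacity=3, max_trucks=1))
# → [(0, 0), (0, 3), (1, 2), (2, 1), (3, 0)]
```

The enumeration shows why full truckloads couple the shippers. With V = 3, shipper 1 can only receive 1 unit if shipper 2 receives 2 units in the same truck.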
Application of the proximal policy optimization algorithm
The PPO algorithm that we implement is an actor-critic algorithm that leverages elements of trust region policy optimization methods. One neural network, known as the actor network, represents the policy $\pi(s_t;\theta)$. Another neural network, the critic network, estimates the value function when following the actor's policy. The values of the parameters $\theta$ and $\phi$ of these networks are set by training the algorithm, as we elaborate below. By clipping the ...
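The snippet breaks off at the clipping step. In the standard PPO formulation, the actor maximizes the clipped surrogate objective $L^{CLIP}(\theta) = \hat{\mathbb{E}}_t[\min(r_t(\theta)\hat{A}_t, \mathrm{clip}(r_t(\theta), 1-\epsilon, 1+\epsilon)\hat{A}_t)]$, where $r_t(\theta)$ is the probability ratio between the new and the old policy. The critic is commonly fitted by minimizing the squared error to return targets. The plain-Python sketch below computes both quantities for a batch. It does not claim to reproduce the paper's exact loss, which may include additional terms such as an entropy bonus. The value $\epsilon = 0.2$ is a commonly used default and is assumed here.

```python
import math
from typing import Sequence


def ppo_clip_objective(
    new_logp: Sequence[float],
    old_logp: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,
) -> float:
    """Batch mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A); to be maximized."""
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_theta(a|s) / pi_theta_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Taking the minimum removes any incentive to push the ratio outside
        # [1 - eps, 1 + eps] in the direction that would improve the objective.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


def critic_loss(values: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error between critic estimates V(s; phi) and return targets."""
    return sum((v - g) ** 2 for v, g in zip(values, targets)) / len(values)


# An action whose probability doubled (ratio 2) with a positive advantage
# is credited only up to ratio 1 + eps = 1.2.
print(ppo_clip_objective([math.log(2.0)], [0.0], [1.0]))
print(critic_loss([1.0, 2.0], [0.0, 0.0]))
```

The clipping is asymmetric in effect. Gains from moving the policy far from the old one are capped, but losses are not, which keeps each update conservative.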
Results
We demonstrate the performance of the PPO algorithm on our joint replenishment problem in a numerical experiment. We first conduct a small-scale experiment with limited demand support and compare the policies developed by PPO with the optimal policies, found using dynamic programming (DP). To solve the DP, we employ value iteration [36] with discount factor γ = 0.99. We evaluate the cost performance, as well as the order quantities and the steady-state distributions of the inventory levels. We then expand our study to a ...
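The value iteration benchmark can be sketched on a deliberately tiny two-shipper FTL instance. Every modelling detail below is an assumption for illustration, not the paper's experimental setting. These assumptions are: zero lead time, i.i.d. demand of 0 or 1 unit per shipper, at most one truck per period, backorders truncated at −2 to keep the state space finite, and the cost parameters. Only the discount factor γ = 0.99 matches the text.

```python
from itertools import product
from typing import Dict, Iterator, List, Tuple

State = Tuple[int, int]   # (inventory of shipper 1, inventory of shipper 2)
Action = Tuple[int, int]  # (q_1, q_2), order quantities

# Hypothetical toy instance; none of these values come from the paper.
V = 2                         # truck capacity in units
MIN_INV, MAX_INV = -2, 3      # truncated inventory range (negative = backorders)
DEMAND = {0: 0.5, 1: 0.5}     # i.i.d. per-shipper demand distribution
TRUCK_COST, HOLD, BACKORDER = 10.0, 1.0, 4.0
GAMMA = 0.99

STATES: List[State] = list(product(range(MIN_INV, MAX_INV + 1), repeat=2))
# FTL-only actions: ship nothing, or one full truck split over both shippers.
ACTIONS: List[Action] = [(0, 0)] + [(q, V - q) for q in range(V + 1)]


def feasible(s: State) -> List[Action]:
    """Actions that keep both inventories at or below MAX_INV."""
    return [a for a in ACTIONS if s[0] + a[0] <= MAX_INV and s[1] + a[1] <= MAX_INV]


def outcomes(s: State, a: Action) -> Iterator[Tuple[float, State, float]]:
    """Yield (probability, next state, period cost); zero lead time assumed."""
    trucks = (a[0] + a[1]) // V
    for (d1, p1), (d2, p2) in product(DEMAND.items(), repeat=2):
        # Backorders beyond MIN_INV are truncated (a modelling simplification).
        nxt = (max(MIN_INV, s[0] + a[0] - d1), max(MIN_INV, s[1] + a[1] - d2))
        inv_cost = sum(HOLD * max(x, 0) + BACKORDER * max(-x, 0) for x in nxt)
        yield p1 * p2, nxt, TRUCK_COST * trucks + inv_cost


def q_value(v: Dict[State, float], s: State, a: Action) -> float:
    """Expected period cost plus discounted expected cost-to-go."""
    return sum(p * (c + GAMMA * v[n]) for p, n, c in outcomes(s, a))


def value_iteration(tol: float = 1e-4) -> Tuple[Dict[State, float], Dict[State, Action]]:
    """Apply the Bellman operator until the sup-norm change drops below tol."""
    v = {s: 0.0 for s in STATES}
    while True:
        new_v = {s: min(q_value(v, s, a) for a in feasible(s)) for s in STATES}
        delta = max(abs(new_v[s] - v[s]) for s in STATES)
        v = new_v
        if delta < tol:
            break
    policy = {s: min(feasible(s), key=lambda a, s=s: q_value(v, s, a)) for s in STATES}
    return v, policy


values, policy = value_iteration()
for s in [(-1, -1), (0, 0), (2, 2)]:
    print(s, policy[s], round(values[s], 2))
```

Because the Bellman operator is a γ-contraction, the loop is guaranteed to terminate. With γ = 0.99, however, it needs on the order of a thousand sweeps. This slow convergence, together with the state space growing exponentially in the number of shippers, is what limits DP to small instances like this one.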
Conclusion
In this paper we use proximal policy optimization to solve the joint replenishment problem with full truckload shipments. Proximal policy optimization is a deep reinforcement learning algorithm that performs well without extensive hyperparameter tuning. We show in a small-scale experiment with limited demand support how proximal policy optimization develops policies that approximate the optimal policy structure and outperform the periodic review minimum order quantity and dynamic order-up-to ...
Declarations of interest
None
References (51)
- et al. Computational complexity of uncapacitated multi-echelon production planning problems. Operations Research Letters (1989).
- et al. Joint replenishment inventory control: deterministic and stochastic models. European Journal of Operational Research (1989).
- et al. A review of the joint replenishment problem literature: 1989-2005. European Journal of Operational Research (2008).
- A multi-item periodic replenishment policy with full truckloads. International Journal of Production Economics (2009).
- Multi-item inventory control with full truckloads: A comparison of aggregate and individual order triggering. European Journal of Operational Research (2010).
- et al. An analytical study of the Q(s,S) policy applied to the joint replenishment problem. European Journal of Operational Research (2005).
- et al. Collaborative shipping under different cost-sharing agreements. European Journal of Operational Research (2017).
- Multi-item inventory systems with joint ordering and transportation decisions. International Journal of Production Economics (1994).
- et al. Multi-item inventory models with co-ordinated replenishments: a survey. International Journal of Operations & Production Management (1988).
- ALICE-ETP (2019). A framework and process for the development of a roadmap towards zero emissions logistics 2050. URL:...