Robust experience replay sampling for multi-agent reinforcement learning
Introduction
Replay memory is an essential component of deep reinforcement learning because it enables algorithms to reuse observed streams of experience to improve their internal beliefs. Most algorithms draw on samples stored in the replay memory for data efficiency [14], [27], [30]. Since experience replay breaks the correlation between consecutive samples [2], [21], it significantly improves data efficiency, stabilizes training, and speeds up learning.
In contrast, training data are fairly cheap to collect in a simulation environment but costly to obtain in real-world control tasks [2]. Because of this limitation, most reinforcement learning algorithms fail to shine in real-world applications and thus become impractical. In these situations, efficient use of resources and time is crucial: deep learning-based agents can exploit the experiences collected and stored in the replay memory to learn efficiently and mitigate several of these problems [8].
Typically, a million or more samples are collected and stored in the memory buffer, so sampling techniques are needed to retrieve relevant experiences from the replay memory. Most RL algorithms randomly sample a batch at each step to update the agent's parameters. Unfortunately, not all samples should be weighted equally at a given state [2], [21]; random sampling is therefore an inadequate approach to choosing useful samples.
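For concreteness, here is a minimal sketch of the uniform random sampling described above; the class and its deque-based storage are illustrative assumptions, not the paper's implementation:

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state) transitions."""

    def __init__(self, capacity=1_000_000):
        # Oldest transitions are evicted automatically once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform random sampling: every stored transition is equally likely,
        # regardless of how useful it is to the agent at its current state.
        return random.sample(self.buffer, batch_size)
```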
This paper proposes a sample-filtering technique, since discovering samples rich in information useful to agents in particular states is challenging, and even more so in problems involving multiple agents. Our technique adopts cosine similarity to measure the similarity between two vectors, as discussed in Section 4. By computing similarity scores, we can choose which data samples are suitable for improving the agents' parameters and performance. Furthermore, this sampling technique reduces the chance of reusing the same transitions of state-action pairs and rewards too often, which increases the possibility of examining unexplored decisions. Unexplored decisions prevent the agents from repeatedly taking the same actions and always ending up in the same visited states without acquiring new experiences.
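A minimal sketch of this idea, assuming states can be flattened into vectors; the oversampling factor and the decision to keep the top-scoring candidates are our illustrative assumptions, since the paper's exact filtering rule is defined in Section 4:

```python
import numpy as np


def cosine_similarity(u, v, eps=1e-8):
    # cos(theta) = (u . v) / (||u|| * ||v||); eps guards against zero vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + eps))


def filter_batch(buffer, current_state, batch_size, oversample=4):
    """Draw a larger candidate batch, score each stored state against the
    currently observed state, and keep the batch_size best-scoring ones."""
    candidates = buffer.sample(batch_size * oversample)
    query = np.ravel(current_state)
    scores = [cosine_similarity(query, np.ravel(state))
              for (state, action, reward, next_state) in candidates]
    keep = np.argsort(scores)[-batch_size:]  # indices of the highest scores
    return [candidates[i] for i in keep]
```

Whether high- or low-similarity transitions should be retained to encourage unexplored decisions follows the rule given in Section 4; the top-k selection here is only one plausible reading.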
The contributions of our work are summarized as follows:
- Propose new algorithms for acquiring relevant experiences from the experience replay memory through filtering.
- Strengthen the exploration strategy by reducing repetitive decisions at a given state.
- Achieve performance higher than or comparable to that of the baseline algorithms.
- Achieve earlier convergence and improved policy search in several tasks compared to the baselines.
Section snippets
Related work
While single-agent reinforcement learning (SARL) has gained popularity in research as well as industrial applications, multi-agent reinforcement learning (MARL) still faces several challenges, one of which is non-stationarity. Non-stationarity in a multi-agent environment (Markov games) emerges because each agent's policy changes over time, and each agent's actions affect the state transition and reward functions of the other agents [19].
Various techniques have been
Similarities measure
In 1992, Lin [14] introduced the idea of experience replay, which has contributed significantly to many reinforcement learning algorithms. The main idea behind experience replay is to stabilize the learning process and introduce sample efficiency during training by repeatedly presenting the collected experiences, sampling the transitions stored in the replay buffer. These transitions are tuples; at each time-step $t$, a tuple contains a state $s_t$, the action taken $a_t$, the next state $s_{t+1}$, and the reward $r_t$.
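In standard notation (our reconstruction, since the symbols were lost in extraction), the stored transition and the cosine similarity used as the relevance measure can be written as:

```latex
% Transition collected at time-step t and stored in the replay buffer D
e_t = (s_t, a_t, r_t, s_{t+1}), \qquad D = \{e_1, e_2, \dots, e_N\}

% Cosine similarity between two state vectors u and v
\operatorname{sim}(u, v) = \cos\theta = \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}
```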
Proposed method
Experiments
In this section, we describe the experiments designed to uncover the potential of these algorithms. We implemented them on top of several baselines and tried several settings to evaluate the performance of our sampling technique. We used open-source implementations of Graph Convolutional Reinforcement Learning for Multi-Agent Cooperation (DGN) [11] and Permutation Invariant Critic for Multi-Agent Deep Reinforcement Learning (PIC) [16] as our baselines, denoting our variants DGN + RS-MARL and PIC + RS-MARL,
Results and discussion
High returns with fast, efficient learning accomplish the purpose of any reinforcement learning algorithm. With this in mind, we developed a simple yet powerful algorithm to find and filter highly informative samples from the collected experiences for training agents. We discuss this in light of the experiment results.
Conclusion
This paper proposed a method for sampling past experiences stored in the replay buffer so they can be exploited to train agents efficiently in a multi-agent reinforcement learning (MARL) environment. We use the currently observed state to filter the samples needed to implicitly introduce several advantages, which lead to quick convergence, as shown in the experiment results. Just as the human learning process does not solve the same problem with the same approach repeatedly in order to learn to generalize in
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (31)
- et al., Cooperative multi-agent systems using distributed reinforcement learning techniques, Procedia Comput. Sci. (2018)
- T. Bansal, J.W. Pachocki, S. Sidor, I. Sutskever, I. Mordatch, Emergent complexity via multi-agent competition, arXiv...
- M. Brittain, J. Bertram, X. Yang, P. Wei, Prioritized sequence experience replay, arXiv preprint...
- et al., Multi-agent Reinforcement Learning: An Overview, Technical Report 10,003 (2012)
- et al., Shared experience actor-critic for multi-agent reinforcement learning, Proceedings of the Conference on Neural Information Processing Systems (NeurIPS'20) (2020)
- et al., GEP-PG: decoupling exploration and exploitation in deep reinforcement learning algorithms, Proceedings of Machine Learning Research (2018)
- et al., Convolutional neural network with data augmentation for SAR target recognition, IEEE Geosci. Remote Sens. Lett. (2016)
- et al., Counterfactual multi-agent policy gradients, Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI'18) (2018)
- et al., Stabilising experience replay for deep multi-agent reinforcement learning, Proceedings of the 34th International Conference on Machine Learning (2017)
- et al., Nash Q-learning for general-sum stochastic games, J. Mach. Learn. Res. (2003)
- Graph convolutional reinforcement learning, 8th International Conference on Learning Representations (ICLR), Addis Ababa, Ethiopia
- Adam: a method for stochastic optimization
- A unified game-theoretic approach to multiagent reinforcement learning, Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS-17)
- Self-improving reactive agents based on reinforcement learning, planning and teaching, Mach. Learn.
Cited by (11)
- Distributed edge-event-triggered consensus of multi-agent system under DoS attack, Pattern Recognition Letters (2023)
- Memory-efficient distribution-guided experience sampling for policy consolidation, Pattern Recognition Letters (2022)
- Improving Deep Deterministic Policy Gradient with Compact Experience Replay, Research Square (2024)
- IPERS: Individual Prioritized Experience Replay with Subgoals for Sparse Reward Multi-Agent Reinforcement Learning, Frontiers in Artificial Intelligence and Applications (2023)
- Advances and Challenges in Learning from Experience Replay, Research Square (2023)
- MEET: A Monte Carlo Exploration-Exploitation Trade-Off for Buffer Sampling, ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings (2023)