Deep reinforcement learning for portfolio management of markets with a dynamic number of assets
Introduction
Generating financial profits by trading cryptocurrencies is challenging due to their erratic price changes. Cryptocurrencies are decentralized electronic financial assets that appeared as an alternative to fiat currencies (Nakamoto, 2008). However, according to Corbet et al. (2014), the prices of cryptocurrencies are affected by government announcements, policies and actions, despite the fact that they are decentralized assets. Additionally, cryptocurrency prices show higher volatility than those of traditional assets. For instance, in late 2017, the price of Bitcoin, the best-known cryptocurrency, reached its historical peak of approximately 19,000 USD per unit, but during the subsequent months it plunged to 3,000 USD, followed by a strong bounce to its current price of approximately 8,000 USD per unit. Owing to this price behavior, formulating cryptocurrency trading strategies is a non-trivial task.
Reinforcement learning (RL) is a suitable framework to process complex data and handle difficult decision-making processes such as asset trading. A trading process can be naturally formulated as an RL process. In this type of process, an agent takes actions over an environment based on observations of the states of that environment; rewards are received by that agent as a consequence of both the states visited and the actions taken. In the specific case of asset trading, a state of the environment is equivalent to the recent history of the assets; actions are the transactions made to get rid of some of the assets held by the agent and acquire new ones, and the rewards are scalar functions of the earnings or losses seen by the agent for taking those actions. The vector containing the information of the assets held by an agent at any moment is called the portfolio, hence this type of process is also known as portfolio management.
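The state–action–reward correspondence described above can be made concrete with a toy environment. The sketch below is illustrative only: the window length, the state normalization, and the log-return reward are our assumptions, not the paper's exact formulation.

```python
import numpy as np

class PortfolioEnv:
    """Toy trading environment illustrating the RL formulation of
    portfolio management. Illustrative only, not the paper's design."""

    def __init__(self, prices, window=8):
        self.prices = prices   # shape (T, n_assets): closing prices
        self.window = window   # length of the recent history in the state
        self.t = window

    def reset(self):
        self.t = self.window
        return self._state()

    def _state(self):
        # State: recent price history, normalized by the latest price.
        hist = self.prices[self.t - self.window:self.t]
        return hist / self.prices[self.t - 1]

    def step(self, portfolio):
        # Action: a portfolio vector (non-negative weights summing to 1).
        rel = self.prices[self.t] / self.prices[self.t - 1]  # price relatives
        reward = np.log(portfolio @ rel)  # log-return of the chosen portfolio
        self.t += 1
        done = self.t >= len(self.prices)
        return self._state(), reward, done
```

An agent interacts with this environment in the usual RL loop: observe the state, output a portfolio vector, and receive the log-return of that portfolio as the reward.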
Typically, RL algorithms have fixed state and action spaces. However, new assets are often added to cryptocurrency markets (Narayanan et al., 2016). Hence, to rapidly incorporate these assets into the process, adaptable state and action spaces are necessary. Most works on automatic asset trading assume the number of assets is static (Bu and Cho, 2018; Jiang and Liang, 2017; Liang et al., 2018; Pendharkar and Cusatis, 2018). Thereby, convolutional layers of neural networks can extract useful information about the prices of that specific set of assets. However, by doing so, a large portion of data is wasted, because only a small number of assets is used to train the algorithms while datasets contain information collected from dozens or even hundreds of assets. This is an important issue not only from a sample-efficiency point of view, but also because substantial earnings may be obtained by trading assets that are suddenly incorporated into a market. For instance, in Fig. 1, Dock (a coin) reached 2.2 times its original price during its first 20 days in the market, then fell and settled at about 1.2 times its original price in the subsequent days. This behavior has often been observed in assets recently added to markets; we are confident it can be predicted and exploited.
This work proposes an RL method using a recurrent neural network (RNN) to perform portfolio management on markets in which the number of assets may change over time. Proximal Policy Optimization (PPO) (Schulman et al., 2017) was adapted for this purpose. PPO is a popular deep RL algorithm, with an actor–critic architecture, that has been shown to perform well on difficult tasks such as video-game playing and dexterous robotic control (OpenAI et al., 2020; Schulman et al., 2017). PPO has recently been applied to portfolio management in markets with a fixed number of assets (Liang et al., 2018). However, to adapt to a dynamic number of assets, we propose a particular architecture that processes assets individually and uses the current portfolio entries for weighting. This results in a network able to effectively process assets never seen during training, without requiring extra training or memory. The proposed method was backtested using data of a cryptocurrency market alongside state-of-the-art baselines in three different setups, which correspond to episodes with lengths of one day, 30 days and 16 weeks with holding periods of 30 min, one day and one week, respectively. The performances of the methods were evaluated using two standard measures for investing and trading: total return and Sharpe ratio. Our method outperformed the baselines in all the tested setups.
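The idea of processing assets individually with shared weights can be sketched as a small policy network. This is a simplified stand-in for the proposed architecture: the GRU scorer, the hidden size, and the way the current holding is appended are assumptions for illustration, not the paper's exact network.

```python
import torch
import torch.nn as nn

class PerAssetPolicy(nn.Module):
    """Scores each asset with the same recurrent network and normalizes
    the scores into a portfolio vector. Because parameters are shared
    across assets, the network accepts any number of assets at inference
    time, including assets never seen during training."""

    def __init__(self, n_features, hidden=32):
        super().__init__()
        self.rnn = nn.GRU(n_features, hidden, batch_first=True)
        self.score = nn.Linear(hidden + 1, 1)  # +1: current portfolio entry

    def forward(self, histories, current_portfolio):
        # histories: (n_assets, window, n_features), one sequence per asset
        _, h = self.rnn(histories)               # h: (1, n_assets, hidden)
        feats = torch.cat([h.squeeze(0),
                           current_portfolio.unsqueeze(1)], dim=1)
        scores = self.score(feats).squeeze(1)    # (n_assets,)
        return torch.softmax(scores, dim=0)      # new portfolio weights
```

The same instance can be evaluated on five assets at one step and nine at the next; no parameter depends on the number of assets.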
Keeping the number of transactions as small as possible is an important issue to consider when trading assets. Markets obtain revenues from their services in the form of transaction costs: any agent that buys or sells assets gives a small percentage of those transactions to the service provider. Cryptocurrency market transaction fees are typically lower than 1%, which is among the lowest compared to fees found in other types of financial asset markets. However, these seemingly negligible fees become important when transactions are made frequently, for instance in periods of minutes or hours, because the changes in the prices of the assets acquired by the agent may not compensate for the losses due to transaction costs. Hence, the algorithm should aim to keep the number of transactions low. To cope with this issue, in our design, the current portfolio vector is given to the network in the output layers, penalizing assets not held by the agent.
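To see why seemingly negligible fees matter at high trading frequencies, consider the following back-of-the-envelope computation. The 0.25% fee rate and full-portfolio turnover at every holding period are illustrative assumptions, not the paper's experimental settings.

```python
# Compounding of a per-transaction fee under frequent rebalancing.
fee = 0.0025                 # assumed 0.25% fee per transaction
periods_per_day = 48         # 30-minute holding periods in one day

# Fraction of capital remaining if the whole portfolio is turned over
# every period and prices stay flat (fees are the only effect).
one_day = (1 - fee) ** periods_per_day
thirty_days = (1 - fee) ** (periods_per_day * 30)

print(f"after 1 day:   {one_day:.3f} of capital")     # ~0.887
print(f"after 30 days: {thirty_days:.4f} of capital")  # under 3%
```

Under these assumptions, fees alone consume over 11% of capital in a single day of 30-minute rebalancing, and nearly everything over a month, which is why the policy is designed to discourage unnecessary transactions.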
Additionally, a novel algorithm to compute the optimal transactions is given in this work. In a market where transaction costs exist, if an agent wants to obtain a portfolio vector satisfying certain specific proportions, the agent needs to perform transactions, thus giving up some amount for doing that. Ormos and Urbán (2013) proposed an iterative method to compute the values of the transactions needed to convert some portfolio into another with minimal cost. However, this method assumes transaction costs are the same for all assets, which is not always the case. We propose a different approach to this problem in which this assumption is not needed: the problem of finding the optimal transactions given some desired portfolio proportions is converted into a linear program (LP). The main contributions of this work are:
- Formulation of a trading system without the limitation of having a market with a fixed number of assets. Our method is sample-efficient, its implementation is straightforward, and during deployment it is able to integrate assets that suddenly appear in the market without the need for extra training.
- Transaction costs are considered and managed in our work. A novel algorithm to compute the optimal transactions is proposed and integrated into the system.
- Implementation of the proposed method using the dataset of a cryptocurrency market. The reliability of our method is tested under three different trading setups to show its adaptability.
The rest of this paper is organized as follows. Section 2 presents related works in the field. Section 3 describes the mathematical definition of the portfolio management problem. Section 4 describes the proposed method. Section 5 describes the transaction optimization process. Section 6 explains the experiment setups and the metrics used to evaluate them. In Section 7, the results of the experiments are discussed. Finally, conclusions and directions for future work are presented in Section 8.
Related works
Deep learning approaches for portfolio management can be divided into two groups: model-based and model-free methods. Model-based methods, as their name suggests, assume models of the asset behavior exist, and deep neural networks (DNNs) are used to approximate these models using supervised learning on price datasets. Model-based methods do not cope with asset trading directly; instead, they require secondary methods to process the predicted prices, which typically use conventional heuristics
Problem definition
This section introduces the concepts of market, investor and trading session, as well as the mathematical definition of portfolio management.
Proposed method
A trading session can be naturally formulated in the RL framework. In an RL process, an agent visits the states of an environment and takes actions in each visited state. In return, the environment gives rewards to the agent for taking those actions. After the agent executes an action, the environment evolves into a new state due to both the environmental dynamics and the action itself. An episode is the set of interactions between the agent and environment from its initialization until a final
Transaction optimization problem
The outputs of the policy network are the entries of the desired portfolio vector. In general, the desired portfolio differs from the portfolio currently held; consequently, some assets have to be sold and others purchased to satisfy the desired portfolio vector. For simplicity, the time dependency of the expressions is dropped, since it is understood by context. Let us also represent the shares acquired and sold per asset at some period by the non-negative variables
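The transaction-optimization idea of this section can be posed as a linear program with non-negative buy and sell variables. The sketch below is an illustrative formulation, not necessarily the paper's exact one: it assumes fees proportional to the cash amount traded (with per-asset rates, which need not be equal) and that all sale proceeds are reinvested.

```python
import numpy as np
from scipy.optimize import linprog

def optimal_transactions(v, w, c_buy, c_sell):
    """Find buy amounts b and sell amounts s (in cash units) that turn
    current holdings v into proportions w at minimal total fee, with
    per-asset buy/sell fee rates c_buy and c_sell. Illustrative LP."""
    n = len(v)
    # Objective: minimize total fees paid.
    cost = np.concatenate([c_buy, c_sell])
    # Cash conservation: money spent buying (incl. fees) equals money
    # received from selling (net of fees).
    A_cash = np.concatenate([1 + c_buy, -(1 - c_sell)])[None, :]
    # Final holdings must match proportions w; v_i + b_i - s_i =
    # w_i * sum_j (v_j + b_j - s_j). One constraint is redundant.
    total_coef = np.concatenate([np.ones(n), -np.ones(n)])
    A_prop, b_prop = [], []
    for i in range(n - 1):
        row = -w[i] * total_coef
        row[i] += 1.0        # + b_i
        row[n + i] -= 1.0    # - s_i
        A_prop.append(row)
        b_prop.append(w[i] * v.sum() - v[i])
    A_eq = np.vstack([A_cash] + A_prop)
    b_eq = np.concatenate([[0.0], b_prop])
    res = linprog(cost, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * (2 * n), method="highs")
    return res.x[:n], res.x[n:]
```

For example, rebalancing 100 units held entirely in one asset to a 50/50 split with 0.25% fees on both sides sells about 50.125 of the first asset and buys about 49.875 of the second, leaving equal final holdings.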
Experiments
This section describes the setups of our experiments. These include dataset features, metrics for evaluating the algorithms and implementation details. We assume the amounts traded by our agents are sufficiently small that the prices of assets are not affected by these transactions, and the available shares in the market are large enough that transactions are executed immediately.
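The two standard evaluation measures mentioned above, total return (TR) and Sharpe ratio (SR), can be computed from a session's portfolio values and per-period returns as follows (the risk-free rate in the Sharpe ratio is assumed to be zero here):

```python
import numpy as np

def total_return(values):
    """Total return of a trading session: final portfolio value
    relative to the initial value, minus one."""
    values = np.asarray(values)
    return values[-1] / values[0] - 1.0

def sharpe_ratio(returns, risk_free=0.0):
    """Sharpe ratio of per-period returns: mean excess return divided
    by the standard deviation of returns."""
    excess = np.asarray(returns) - risk_free
    return excess.mean() / excess.std()
```

Total return captures raw profitability, while the Sharpe ratio penalizes volatile return streams, so the two measures together indicate both earnings and risk.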
Results
In the first experiment, which corresponds to trading sessions of 1 day and holding periods of 30 min, DNA-S obtained the best results; these are shown in Table 4. DNA-S obtained an average TR of 0.224, more than double that of the closest competitor, CNN, which obtained 0.088. DNA-R obtained the third-best results with an average TR of 0.041. The other two approaches, DQN and TD(λ), obtained the lowest scores, with average TRs close to zero. The average SRs obtained by all the
Conclusions and future work
We introduced a method that performs portfolio management on markets with transaction costs in which the number of assets is dynamic. Our method is able to integrate new assets into the process during deployment without requiring extra training or memory, and its implementation is straightforward. It was tested on a cryptocurrency market, outperforming state-of-the-art methods under three distinct setups. Additionally, a novel algorithm to compute transactions with minimal
CRediT authorship contribution statement
Carlos Betancourt: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Data curation, Writing - original draft, Writing - review & editing, Visualization. Wen-Hui Chen: Resources, Writing - review & editing, Supervision.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (32)
- Continuous control with stacked deep dynamic recurrent reinforcement learning for portfolio optimization. Expert Systems with Applications (2020)
- An automated FX trading system using adaptive reinforcement learning. Expert Systems with Applications (2006)
- Improving financial trading decisions using deep Q-learning: Predicting the number of shares, action strategies, and transfer learning. Expert Systems with Applications (2019)
- Time-driven feature-aware jointly deep reinforcement learning for financial signal representation and algorithmic trading. Expert Systems with Applications (2020)
- An intelligent financial portfolio trading strategy using deep Q-learning. Expert Systems with Applications (2020)
- Trading financial indices with reinforcement learning agents. Expert Systems with Applications (2018)
- Stablecoins: The quest for a low-volatility cryptocurrency
- Convex Optimization (2004)
- Learning optimal Q-function using deep Boltzmann machine for reliable trading of cryptocurrency
- Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning...
- The influence of central bank monetary policy announcements on cryptocurrency return volatility
- Linear programming and extensions
- Deep residual learning for image recognition
- Deep learning for finance: Deep portfolios. Applied Stochastic Models in Business and Industry
- Cryptocurrency portfolio management with deep reinforcement learning
- Adam: A method for stochastic optimization