Deep reinforcement learning for portfolio management of markets with a dynamic number of assets

https://doi.org/10.1016/j.eswa.2020.114002

Highlights

  • Formulation of a trading method for markets with a dynamic number of assets.

  • Unseen assets are easily integrated without changing or retraining the network.

  • Optimal transactions are computed for markets with transaction costs.

  • The method outperforms the baselines on a cryptocurrency market database.

Abstract

This work proposes a novel portfolio management method using deep reinforcement learning on markets with a dynamic number of assets. This problem is especially important in cryptocurrency markets, which already support the trading of hundreds of assets, with new ones being added every month. A novel neural network architecture is proposed, which is trained using deep reinforcement learning. Our architecture considers all assets in the market and automatically adapts when new ones are suddenly introduced, making our method more general and sample-efficient than previous methods. Further, transaction cost minimization is considered when formulating the problem. For this purpose, a novel algorithm to compute optimal transactions given a desired portfolio is integrated into the architecture. The proposed method was tested on a dataset of one of the largest cryptocurrency markets in the world, outperforming state-of-the-art methods and achieving average daily returns of over 24%.

Introduction

Generating financial profits by trading cryptocurrencies is challenging due to their erratic price changes. Cryptocurrencies are decentralized electronic financial assets that appeared as an alternative to fiat currencies (Nakamoto, 2008). However, according to Corbet et al. (2014), the prices of cryptocurrencies are affected by government announcements, policies and actions, despite their decentralized nature. Additionally, cryptocurrency prices show higher volatility than those of traditional assets. For instance, in late 2017, the price of Bitcoin, the best-known cryptocurrency, reached its historical peak of approximately 19,000 USD per unit, but during the subsequent months it plunged to 3,000 USD, followed by a strong bounce to its current price of approximately 8,000 USD per unit. Owing to this price behavior, formulating cryptocurrency trading strategies is a non-trivial task.

Reinforcement learning (RL) is a suitable framework for processing complex data and handling difficult decision-making processes such as asset trading. A trading process can be naturally formulated as an RL process, in which an agent takes actions in an environment based on observations of that environment's states; the agent receives rewards as a consequence of both the states visited and the actions taken. In the specific case of asset trading, a state of the environment corresponds to the recent history of the assets; actions are the transactions made to sell some of the assets held by the agent and acquire new ones; and the rewards are scalar functions of the earnings or losses the agent sees for taking those actions. The vector containing the information of the assets held by an agent at any moment is called the portfolio, hence this type of process is also known as portfolio management.
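To make this mapping concrete, the following is a minimal sketch of a trading session in the usual Gym-style interface. Everything here (the class name PortfolioEnv, the lookback window, the log-return reward) is an illustrative assumption of ours, not the authors' implementation:

```python
import numpy as np

class PortfolioEnv:
    """Illustrative sketch: portfolio management as an RL environment.

    State:  recent price history of the assets (a lookback window).
    Action: target portfolio weights (non-negative, summing to 1).
    Reward: log growth of portfolio value over one holding period.
    """

    def __init__(self, prices: np.ndarray, window: int = 50):
        self.prices = prices  # shape (T, n_assets), one row per period
        self.window = window
        self.t = window

    def reset(self) -> np.ndarray:
        self.t = self.window
        return self.prices[self.t - self.window:self.t]

    def step(self, weights: np.ndarray):
        # Relative price change of each asset over the holding period.
        rel = self.prices[self.t] / self.prices[self.t - 1]
        reward = np.log(np.dot(weights, rel))  # log-return of the portfolio
        self.t += 1
        done = self.t >= len(self.prices)
        state = None if done else self.prices[self.t - self.window:self.t]
        return state, reward, done
```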

Typically, RL algorithms have fixed state and action spaces. However, new assets are often added to cryptocurrency markets (Narayanan et al., 2016); hence, to rapidly incorporate those assets into the process, adaptable state and action spaces are necessary. Most works on automatic asset trading assume the number of assets is static (Bu and Cho, 2018; Jiang and Liang, 2017; Liang et al., 2018; Pendharkar and Cusatis, 2018). Under this assumption, convolutional layers of neural networks can extract useful information about the prices of that specific set of assets. But doing so wastes a large portion of the data, because only a small number of assets is used to train the algorithms while datasets contain information collected from dozens and even hundreds of assets. This is an important issue, not only from a sample-efficiency point of view, but also because critical earnings may be accomplished by trading assets that are suddenly incorporated into a market. For instance, in Fig. 1, Dock (coin) reached 2.2 times its original price during its first 20 days in the market, then fell and settled at about 1.2 times its original price on the subsequent days. This behavior is often observed in assets recently added to markets; we are confident it can be predicted and exploited.

This work proposes an RL method using a recurrent neural network (RNN) to perform portfolio management on markets in which the number of assets may change over time. Proximal Policy Optimization (PPO) (Schulman et al., 2017) was adapted for this purpose. PPO is a popular deep RL algorithm with an actor–critic architecture that has been shown to perform well on difficult tasks such as video-game playing and dexterous robotic control (OpenAI et al., 2020; Schulman et al., 2017). PPO has recently been applied to portfolio management in markets with a fixed number of assets (Liang et al., 2018). To adapt to a dynamic number of assets, however, we propose an architecture that processes assets individually and uses the current portfolio entries for weighting. This results in a network able to effectively process assets that were never seen during training, without requiring extra training or memory. The proposed method was backtested on data from a cryptocurrency market alongside state-of-the-art baselines in three different setups, corresponding to episodes with lengths of one day, 30 days and 16 weeks, with holding periods of 30 min, one day and one week, respectively. The performance of each method was evaluated using two standard measures for investing and trading: total return and Sharpe ratio. Our method outperformed the baselines in all tested setups.
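In rough code, the per-asset weight-sharing idea might look like the sketch below. The class name, layer sizes, and the use of the previous portfolio as an additive bias before the softmax are our illustrative assumptions, not the paper's exact architecture (which, per the text above, is recurrent):

```python
import torch
import torch.nn as nn

class DynamicAssetPolicy(nn.Module):
    """Sketch: score every asset with the same shared encoder, so the
    network accepts any number of assets, including unseen ones."""

    def __init__(self, feat_dim: int, hidden: int = 64):
        super().__init__()
        # Shared per-asset encoder, applied identically to each asset.
        self.encoder = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, asset_feats: torch.Tensor, prev_portfolio: torch.Tensor):
        # asset_feats: (n_assets, feat_dim); n_assets may vary per call.
        # prev_portfolio: (n_assets,), used here as a bias so assets
        # already held score higher, discouraging needless transactions.
        scores = self.encoder(asset_feats).squeeze(-1) + prev_portfolio
        return torch.softmax(scores, dim=0)  # new weights, summing to 1
```

Because the encoder weights are shared across assets, a newly listed coin only requires appending its feature vector to asset_feats; no architectural change or retraining is needed.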

Keeping the number of transactions as small as possible is an important consideration in asset trading. Markets obtain revenue from their services in the form of transaction costs: any agent that buys or sells assets gives a small percentage of each transaction to the service provider. Cryptocurrency market transaction fees are typically lower than 1%, among the lowest found in any type of financial asset market. However, these seemingly negligible fees become important when transactions are made frequently, for instance every few minutes or hours, because the price changes of the acquired assets may not compensate for the losses incurred through transaction costs. Hence, the algorithm should aim to keep the number of transactions low. To cope with this issue, in our design the current portfolio vector is fed to the output layers of the network, penalizing assets not held by the agent.
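A back-of-the-envelope computation shows why this matters at the 30-minute holding period used in our first setup; the 0.25% fee rate here is an assumed figure for illustration only:

```python
# Compounded fee drag from frequent rebalancing (illustrative numbers).
fee = 0.0025             # assumed 0.25% fee per rebalance
rebalances_per_day = 48  # one rebalance every 30 min

# If the whole portfolio turns over each period, its value shrinks by a
# factor of (1 - fee) per rebalance before any market gains:
daily_drag = 1 - (1 - fee) ** rebalances_per_day
print(f"worst-case daily fee drag: {daily_drag:.1%}")  # about 11.3%
```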

Additionally, a novel algorithm to compute the optimal transactions is given in this work. In a market with transaction costs, if an agent wants to obtain a portfolio vector satisfying certain proportions, the agent needs to perform transactions, thus paying some amount in fees to do so. Ormos and Urbán (2013) proposed an iterative method to compute the values of the transactions needed to convert one portfolio into another with minimal cost. However, this method assumes transaction costs are the same for all assets, which is not always the case. We propose a different approach in which this assumption is not needed: the problem of finding the optimal transactions given desired portfolio proportions is converted into a linear program (LP). The main contributions of this work are:

  • Formulation of a trading system without the limitation of a fixed number of assets in the market. Our method is sample-efficient, its implementation is straightforward, and during deployment it is able to integrate assets that suddenly appear in the market, without the need for extra training.

  • Transaction costs are considered and managed in our work. A novel algorithm to compute the optimal transactions is proposed and integrated into the system.

  • Implementation of the proposed method using the dataset of a cryptocurrency market. The reliability of our method is tested under three different trading setups to show its adaptability.

The rest of this paper is organized as follows. Section 2 presents related works in the field. Section 3 gives the mathematical definition of the portfolio management problem. Section 4 describes the proposed method. Section 5 describes the transaction optimization process. Section 6 explains the experimental setups and the metrics used to evaluate them. In Section 7, the results of the experiments are discussed. Finally, conclusions and directions for future work are presented in Section 8.

Section snippets

Related works

Deep learning approaches for portfolio management can be divided into two groups: model-based and model-free methods. Model-based methods, as their name suggests, assume models of the asset behavior exist, and deep neural networks (DNNs) are used to approximate these models using supervised learning on price datasets. Model-based methods do not cope with asset trading directly; instead, they require secondary methods to process the predicted prices, which typically use conventional heuristics

Problem definition

This section introduces the concepts of market, investor and trading session, as well as the mathematical definition of portfolio management.

Proposed method

A trading session can be naturally formulated in the RL framework. In an RL process, an agent visits the states of an environment and takes actions in each visited state. In return, the environment gives rewards to the agent for taking those actions. After the agent executes an action, the environment evolves into a new state due to both the environmental dynamics and the action itself. An episode is the set of interactions between the agent and environment from its initialization until a final

Transaction optimization problem

The outputs of the policy network are the entries of the portfolio vector p_i[t]. In general p_i[t] ≠ p_i[t-1]; consequently, some assets have to be sold and others purchased to satisfy the desired portfolio vector. For simplicity, let us drop the time dependency of the expressions, since it is understood from context; for instance, p_i[t] and p_i[t-1] are written as p_i and p_i', respectively. Let us also represent the shares acquired and sold per asset at some period by the non-negative variables x̂_i
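The preview cuts off before the LP itself, so the following is only a toy reconstruction of what such a program can look like, under simplifying assumptions of our own (holdings measured in cash value, proportional per-asset fee rates, fees deducted from final wealth); the variable names and the scipy-based solver are ours, not the paper's:

```python
import numpy as np
from scipy.optimize import linprog

def optimal_transactions(v, p, c):
    """Toy LP: given current holdings v (cash value per asset), target
    proportions p (summing to 1) and per-asset fee rates c, find buys b
    and sells s (both >= 0) minimizing total fees, such that the
    post-trade portfolio matches p on final wealth W' = sum(v) - fees."""
    n = len(v)
    W = v.sum()
    # Objective: minimize total fees sum_j c_j * (b_j + s_j).
    cost = np.concatenate([c, c])
    # Equality constraints: v_i + b_i - s_i = p_i * W' for every asset i,
    # with W' = W - sum_j c_j * (b_j + s_j) substituted in.
    I = np.eye(n)
    A_eq = np.hstack([I + np.outer(p, c), -I + np.outer(p, c)])
    b_eq = p * W - v
    res = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * (2 * n))
    return res.x[:n], res.x[n:]  # buys, sells per asset

# Example: rebalance 50/50 holdings to 70/30 under asymmetric fees.
v = np.array([50.0, 50.0])
p = np.array([0.7, 0.3])
c = np.array([0.001, 0.0025])
buys, sells = optimal_transactions(v, p, c)
```

With asymmetric fees, the cheapest set of transactions is generally not the one obtained by assuming a uniform fee, which is precisely the limitation of the iterative method of Ormos and Urbán (2013) noted in the introduction.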

Experiments

This section describes the setups of our experiments. These include dataset features, metrics for evaluating the algorithms and implementation details. We assume the amounts traded by our agents are sufficiently small that the prices of assets are not affected by these transactions, and the available shares in the market are large enough that transactions are executed immediately.
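For reference, the two evaluation measures named earlier (total return and Sharpe ratio) can be computed per trading session as below; these are the standard textbook definitions, assumed rather than quoted from the paper:

```python
import numpy as np

def total_return(values: np.ndarray) -> float:
    """Total return (TR) of a session: final wealth over initial, minus 1."""
    return values[-1] / values[0] - 1

def sharpe_ratio(returns: np.ndarray, risk_free: float = 0.0) -> float:
    """Sharpe ratio (SR): mean excess return per holding period divided
    by the standard deviation of those excess returns."""
    excess = returns - risk_free
    return excess.mean() / excess.std()
```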

Results

In the first experiment, which corresponds to trading sessions of 1 day and holding periods of 30 min, DNA-S obtained the highest results; these are shown in Table 4. DNA-S obtained an average TR of 0.224, more than double the score of its closest competitor, CNN, which obtained 0.088. DNA-R obtained the third-best results with an average TR of 0.041. The other two approaches, DQN and TD(λ), obtained the lowest scores, with average TRs close to zero. The average SRs obtained by all the

Conclusions and future work

We introduced a method that performs portfolio management on markets with transaction costs in which the number of assets is dynamic. Our method is able to integrate new assets into the process during deployment without requiring extra training or memory, and its implementation is straightforward. It was tested on a cryptocurrency market, outperforming state-of-the-art methods under three distinct setups. Additionally, a novel algorithm to compute transactions with minimal

CRediT authorship contribution statement

Carlos Betancourt: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Data curation, Writing - original draft, Writing - review & editing, Visualization. Wen-Hui Chen: Resources, Writing - review & editing, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (32)

  • Corbet, S., et al. The influence of central bank monetary policy announcements on cryptocurrency return volatility.

  • Dantzig, G.B. Linear programming and extensions (1998).

  • He, K., et al. Deep residual learning for image recognition.

  • Heaton, J., et al. Deep learning for finance: Deep portfolios. Applied Stochastic Models in Business and Industry (2017).

  • Jiang, Z., et al. Cryptocurrency portfolio management with deep reinforcement learning.

  • Kingma, D.P., et al. Adam: A method for stochastic optimization (2014).