Deep reinforcement learning for portfolio management of markets with a dynamic number of assets

https://doi.org/10.1016/j.eswa.2020.114002

Highlights

  • Formulation of a trading method for markets with a dynamic number of assets.

  • Unseen assets are easily integrated without changing or retraining the network.

  • Optimal transactions are computed for markets with transaction costs.

  • The method outperforms the baselines on a cryptocurrency market database.

Abstract

This work proposes a novel portfolio management method using deep reinforcement learning on markets with a dynamic number of assets. This problem is especially important in cryptocurrency markets, which already support the trading of hundreds of assets, with new ones being added every month. A novel neural network architecture is proposed, which is trained using deep reinforcement learning. Our architecture considers all assets in the market and automatically adapts when new ones are suddenly introduced, making our method more general and sample-efficient than previous methods. Further, transaction cost minimization is considered when formulating the problem. For this purpose, a novel algorithm to compute optimal transactions given a desired portfolio is integrated into the architecture. The proposed method was tested on a dataset of one of the largest cryptocurrency markets in the world, outperforming state-of-the-art methods and achieving average daily returns of over 24%.

Introduction

Generating financial profits by trading cryptocurrencies is challenging due to their erratic price changes. Cryptocurrencies are decentralized electronic financial assets that appeared as an alternative to fiat currencies (Nakamoto, 2008). However, according to Corbet et al. (2014), the prices of cryptocurrencies are affected by government announcements, policies and actions, despite their decentralized nature. Additionally, cryptocurrency prices show higher volatility than those of traditional assets. For instance, in late 2017, the price of Bitcoin, the best-known cryptocurrency, reached its historical peak of approximately 19,000 USD per unit, but during the subsequent months it plunged to 3,000 USD, followed by a strong bounce to its current price of approximately 8,000 USD per unit. Owing to this price behavior, formulating cryptocurrency trading strategies is a non-trivial task.

Reinforcement learning (RL) is a suitable framework for processing complex data and handling difficult decision-making processes such as asset trading. A trading process can be naturally formulated as an RL process, in which an agent takes actions in an environment based on observations of that environment's states; the agent receives rewards as a consequence of both the states visited and the actions taken. In the specific case of asset trading, a state of the environment corresponds to the recent history of the assets; actions are the transactions made to sell some of the assets held by the agent and acquire new ones; and the rewards are scalar functions of the earnings or losses the agent sees for taking those actions. The vector containing the information of the assets held by an agent at any moment is called the portfolio, hence this type of process is also known as portfolio management.
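To make this mapping concrete, the following is a minimal sketch of a trading session in the usual Gym-style interface. Everything here (the class name PortfolioEnv, the lookback window, the log-return reward) is an illustrative assumption of ours, not the authors' implementation:

```python
import numpy as np

class PortfolioEnv:
    """Illustrative sketch: portfolio management as an RL environment.

    State:  recent price history of the assets (a lookback window).
    Action: target portfolio weights (non-negative, summing to 1).
    Reward: log growth of portfolio value over one holding period.
    """

    def __init__(self, prices: np.ndarray, window: int = 50):
        self.prices = prices  # shape (T, n_assets), one row per period
        self.window = window
        self.t = window

    def reset(self) -> np.ndarray:
        self.t = self.window
        return self.prices[self.t - self.window:self.t]

    def step(self, weights: np.ndarray):
        # Relative price change of each asset over the holding period.
        rel = self.prices[self.t] / self.prices[self.t - 1]
        reward = np.log(np.dot(weights, rel))  # log-return of the portfolio
        self.t += 1
        done = self.t >= len(self.prices)
        state = None if done else self.prices[self.t - self.window:self.t]
        return state, reward, done
```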

Typically, RL algorithms have fixed state and action spaces. However, new assets are often added to cryptocurrency markets (Narayanan et al., 2016); hence, to rapidly incorporate those assets into the process, adaptable state and action spaces are necessary. Most works on automatic asset trading assume the number of assets is static (Bu and Cho, 2018; Jiang and Liang, 2017; Liang et al., 2018; Pendharkar and Cusatis, 2018). Under this assumption, convolutional layers of neural networks can extract useful information about the prices of that specific set of assets. But doing so wastes a large portion of the data, because only a small number of assets is used to train the algorithms while datasets contain information collected from dozens and even hundreds of assets. This is an important issue, not only from a sample-efficiency point of view, but also because critical earnings may be accomplished by trading assets that are suddenly incorporated into a market. For instance, in Fig. 1, Dock (coin) reached 2.2 times its original price during its first 20 days in the market, then fell and settled at about 1.2 times its original price on the subsequent days. This behavior is often observed in assets recently added to markets; we are confident it can be predicted and exploited.

This work proposes an RL method using a recurrent neural network (RNN) to perform portfolio management on markets in which the number of assets may change over time. Proximal Policy Optimization (PPO) (Schulman et al., 2017) was adapted for this purpose. PPO is a popular deep RL algorithm with an actor–critic architecture that has been shown to perform well on difficult tasks such as video-game playing and dexterous robotic control (OpenAI et al., 2020; Schulman et al., 2017). PPO has recently been applied to portfolio management in markets with a fixed number of assets (Liang et al., 2018). To adapt to a dynamic number of assets, however, we propose an architecture that processes assets individually and uses the current portfolio entries for weighting. This results in a network able to effectively process assets that were never seen during training, without requiring extra training or memory. The proposed method was backtested on data from a cryptocurrency market alongside state-of-the-art baselines in three different setups, corresponding to episodes with lengths of one day, 30 days and 16 weeks, with holding periods of 30 min, one day and one week, respectively. The performance of each method was evaluated using two standard measures for investing and trading: total return and Sharpe ratio. Our method outperformed the baselines in all tested setups.
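In rough code, the per-asset weight-sharing idea might look like the sketch below. The class name, layer sizes, and the use of the previous portfolio as an additive bias before the softmax are our illustrative assumptions, not the paper's exact architecture (which, per the text above, is recurrent):

```python
import torch
import torch.nn as nn

class DynamicAssetPolicy(nn.Module):
    """Sketch: score every asset with the same shared encoder, so the
    network accepts any number of assets, including unseen ones."""

    def __init__(self, feat_dim: int, hidden: int = 64):
        super().__init__()
        # Shared per-asset encoder, applied identically to each asset.
        self.encoder = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, asset_feats: torch.Tensor, prev_portfolio: torch.Tensor):
        # asset_feats: (n_assets, feat_dim); n_assets may vary per call.
        # prev_portfolio: (n_assets,), used here as a bias so assets
        # already held score higher, discouraging needless transactions.
        scores = self.encoder(asset_feats).squeeze(-1) + prev_portfolio
        return torch.softmax(scores, dim=0)  # new weights, summing to 1
```

Because the encoder weights are shared across assets, a newly listed coin only requires appending its feature vector to asset_feats; no architectural change or retraining is needed.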

Keeping the number of transactions as small as possible is an important consideration in asset trading. Markets obtain revenue from their services in the form of transaction costs: any agent that buys or sells assets gives a small percentage of each transaction to the service provider. Cryptocurrency market transaction fees are typically lower than 1%, among the lowest found in any type of financial asset market. However, these seemingly negligible fees become important when transactions are made frequently, for instance every few minutes or hours, because the price changes of the acquired assets may not compensate for the losses incurred through transaction costs. Hence, the algorithm should aim to keep the number of transactions low. To cope with this issue, in our design the current portfolio vector is fed to the output layers of the network, penalizing assets not held by the agent.
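A back-of-the-envelope computation shows why this matters at the 30-minute holding period used in our first setup; the 0.25% fee rate here is an assumed figure for illustration only:

```python
# Compounded fee drag from frequent rebalancing (illustrative numbers).
fee = 0.0025             # assumed 0.25% fee per rebalance
rebalances_per_day = 48  # one rebalance every 30 min

# If the whole portfolio turns over each period, its value shrinks by a
# factor of (1 - fee) per rebalance before any market gains:
daily_drag = 1 - (1 - fee) ** rebalances_per_day
print(f"worst-case daily fee drag: {daily_drag:.1%}")  # about 11.3%
```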

Additionally, a novel algorithm to compute the optimal transactions is given in this work. In a market with transaction costs, if an agent wants to obtain a portfolio vector satisfying certain proportions, the agent needs to perform transactions, thus paying some amount in fees to do so. Ormos and Urbán (2013) proposed an iterative method to compute the values of the transactions needed to convert one portfolio into another with minimal cost. However, this method assumes transaction costs are the same for all assets, which is not always the case. We propose a different approach in which this assumption is not needed: the problem of finding the optimal transactions given desired portfolio proportions is converted into a linear program (LP). The main contributions of this work are:

  • Formulation of a trading system without the limitation of a fixed number of assets in the market. Our method is sample-efficient, its implementation is straightforward, and during deployment it is able to integrate assets that suddenly appear in the market, without the need for extra training.

  • Transaction costs are considered and managed in our work. A novel algorithm to compute the optimal transactions is proposed and integrated into the system.

  • Implementation of the proposed method using the dataset of a cryptocurrency market. The reliability of our method is tested under three different trading setups to show its adaptability.

The rest of this paper is organized as follows. Section 2 presents related works in the field. Section 3 gives the mathematical definition of the portfolio management problem. Section 4 describes the proposed method. Section 5 describes the transaction optimization process. Section 6 explains the experimental setups and the metrics used to evaluate them. In Section 7, the results of the experiments are discussed. Finally, conclusions and directions for future work are presented in Section 8.

Section snippets

Related works

Deep learning approaches for portfolio management can be divided into two groups: model-based and model-free methods. Model-based methods, as their name suggests, assume models of the asset behavior exist, and deep neural networks (DNNs) are used to approximate these models using supervised learning on price datasets. Model-based methods do not cope with asset trading directly; instead, they require secondary methods to process the predicted prices, which typically use conventional heuristics

Problem definition

This section introduces the concepts of market, investor and trading session, as well as the mathematical definition of portfolio management.

Proposed method

A trading session can be naturally formulated in the RL framework. In an RL process, an agent visits the states of an environment and takes actions in each visited state. In return, the environment gives rewards to the agent for taking those actions. After the agent executes an action, the environment evolves into a new state due to both the environmental dynamics and the action itself. An episode is the set of interactions between the agent and environment from its initialization until a final

Transaction optimization problem

The outputs of the policy network are the entries of the portfolio vector p_i[t]. In general p_i[t] ≠ p_i[t-1]; consequently, some assets have to be sold and others purchased to satisfy the desired portfolio vector. For simplicity, let us drop the time dependency of the expressions, since it is understood from context; for instance, p_i[t] and p_i[t-1] are written as p_i and p_i', respectively. Let us also represent the shares acquired and sold per asset at some period by the non-negative variables x̂_i
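The preview cuts off before the LP itself, so the following is only a toy reconstruction of what such a program can look like, under simplifying assumptions of our own (holdings measured in cash value, proportional per-asset fee rates, fees deducted from final wealth); the variable names and the scipy-based solver are ours, not the paper's:

```python
import numpy as np
from scipy.optimize import linprog

def optimal_transactions(v, p, c):
    """Toy LP: given current holdings v (cash value per asset), target
    proportions p (summing to 1) and per-asset fee rates c, find buys b
    and sells s (both >= 0) minimizing total fees, such that the
    post-trade portfolio matches p on final wealth W' = sum(v) - fees."""
    n = len(v)
    W = v.sum()
    # Objective: minimize total fees sum_j c_j * (b_j + s_j).
    cost = np.concatenate([c, c])
    # Equality constraints: v_i + b_i - s_i = p_i * W' for every asset i,
    # with W' = W - sum_j c_j * (b_j + s_j) substituted in.
    I = np.eye(n)
    A_eq = np.hstack([I + np.outer(p, c), -I + np.outer(p, c)])
    b_eq = p * W - v
    res = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * (2 * n))
    return res.x[:n], res.x[n:]  # buys, sells per asset

# Example: rebalance 50/50 holdings to 70/30 under asymmetric fees.
v = np.array([50.0, 50.0])
p = np.array([0.7, 0.3])
c = np.array([0.001, 0.0025])
buys, sells = optimal_transactions(v, p, c)
```

With asymmetric fees, the cheapest set of transactions is generally not the one obtained by assuming a uniform fee, which is precisely the limitation of the iterative method of Ormos and Urbán (2013) noted in the introduction.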

Experiments

This section describes the setups of our experiments. These include dataset features, metrics for evaluating the algorithms and implementation details. We assume the amounts traded by our agents are sufficiently small that the prices of assets are not affected by these transactions, and the available shares in the market are large enough that transactions are executed immediately.
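For reference, the two evaluation measures named earlier (total return and Sharpe ratio) can be computed per trading session as below; these are the standard textbook definitions, assumed rather than quoted from the paper:

```python
import numpy as np

def total_return(values: np.ndarray) -> float:
    """Total return (TR) of a session: final wealth over initial, minus 1."""
    return values[-1] / values[0] - 1

def sharpe_ratio(returns: np.ndarray, risk_free: float = 0.0) -> float:
    """Sharpe ratio (SR): mean excess return per holding period divided
    by the standard deviation of those excess returns."""
    excess = returns - risk_free
    return excess.mean() / excess.std()
```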

Results

In the first experiment, which corresponds to trading sessions of 1 day and holding periods of 30 min, DNA-S obtained the highest results; these are shown in Table 4. DNA-S obtained an average TR of 0.224, more than double the score of its closest competitor, CNN, which obtained 0.088. DNA-R obtained the third-best results with an average TR of 0.041. The other two approaches, DQN and TD(λ), obtained the lowest scores, with average TRs close to zero. The average SRs obtained by all the

Conclusions and future work

We introduced a method that performs portfolio management on markets with transaction costs in which the number of assets is dynamic. Our method is able to integrate new assets into the process during deployment without requiring extra training or memory, and its implementation is straightforward. It was tested on a cryptocurrency market, outperforming state-of-the-art methods under three distinct setups. Additionally, a novel algorithm to compute transactions with minimal

CRediT authorship contribution statement

Carlos Betancourt: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Data curation, Writing - original draft, Writing - review & editing, Visualization. Wen-Hui Chen: Resources, Writing - review & editing, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (32)

  • Corbet, S., et al. The influence of central bank monetary policy announcements on cryptocurrency return volatility.

  • Dantzig, G.B. Linear programming and extensions (1998).

  • He, K., et al. Deep residual learning for image recognition.

  • Heaton, J., et al. Deep learning for finance: Deep portfolios. Applied Stochastic Models in Business and Industry (2017).

  • Jiang, Z., et al. Cryptocurrency portfolio management with deep reinforcement learning.

  • Kingma, D.P., et al. Adam: A method for stochastic optimization (2014).