QLBS: Q-Learner in the Black-Scholes(-Merton) Worlds
The Journal of Derivatives (IF 0.4), Pub Date: 2020-04-25, DOI: 10.3905/jod.2020.1.108
Igor Halperin

This article presents a discrete-time option pricing model that is rooted in reinforcement learning (RL), and more specifically in the famous Q-Learning method of RL. We construct a risk-adjusted Markov Decision Process for a discrete-time version of the classical Black-Scholes-Merton (BSM) model, in which the option price is an optimal Q-function and the optimal hedge is the second argument of this optimal Q-function, so that the price and the hedge are parts of the same formula. Pricing is done by learning to dynamically optimize risk-adjusted returns of an option-replicating portfolio, as in Markowitz portfolio theory. Using Q-Learning and related methods, the model, once created in a parametric setting, can go model-free and learn to price and hedge an option directly from data, without an explicit model of the world. This suggests that RL may provide efficient data-driven and model-free methods for the optimal pricing and hedging of options once we depart from the academic continuous-time limit; conversely, option pricing methods developed in Mathematical Finance may be viewed as special cases of model-based reinforcement learning. Further, because the model is simple and tractable, requiring only basic linear algebra (plus Monte Carlo simulation if we work with synthetic data), and closely related to the original BSM model, we suggest that it could be used for benchmarking different RL algorithms for financial trading applications.

TOPICS: Derivatives, options

Key Findings
• Reinforcement learning (RL) is the most natural approach to pricing and hedging options that relies directly on data rather than on a specific model of asset pricing.
• The discrete-time RL approach to option pricing generalizes classical continuous-time methods; it enables tracking of mis-hedging risk, which disappears in the formal continuous-time limit, and provides a consistent framework for using options for both hedging and speculation.
• A simple quadratic reward function, which amounts to a minimal extension of the classical Black-Scholes framework when combined with the Q-learning method of RL, gives rise to a particularly simple computational scheme in which option pricing and hedging are semianalytical, amounting to repeated use of a conventional least-squares regression.
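To make the semianalytical scheme concrete, here is a minimal Monte Carlo sketch in Python of the kind of backward recursion the abstract describes: simulated GBM paths, a least-squares regression on a few polynomial basis functions to fit the hedge at each time step, and a risk-adjusted price obtained by adding a variance penalty with risk-aversion parameter lam. All parameter values, the basis choice, and the simplified cross-sectional regressions are illustrative assumptions for this sketch, not the article's exact algorithm (which uses fitted Q-iteration with conditional moments).

```python
import numpy as np

# Illustrative QLBS-style sketch (not the paper's exact algorithm):
# simulate GBM paths, step backward in time, fit the hedge a_t(S_t) by
# least squares, roll back the replicating portfolio, and quote a
# risk-adjusted price = replication cost + lam * accumulated variance.

np.random.seed(42)
S0, K, mu, sigma, r, T = 100.0, 100.0, 0.05, 0.15, 0.03, 1.0  # assumed inputs
n_steps, n_paths, lam = 24, 50_000, 0.001                     # lam: risk aversion
dt = T / n_steps
gamma = np.exp(-r * dt)                                       # one-period discount

# Simulate stock paths under the real-world (data-generating) measure.
z = np.random.standard_normal((n_paths, n_steps))
S = np.empty((n_paths, n_steps + 1))
S[:, 0] = S0
S[:, 1:] = S0 * np.exp(np.cumsum((mu - 0.5 * sigma**2) * dt
                                 + sigma * np.sqrt(dt) * z, axis=1))

def basis(s):
    """Polynomial features of the normalized price used in the regressions."""
    x = s / S0
    return np.column_stack([np.ones_like(x), x, x * x, x**3])

# Terminal condition: the hedge portfolio must cover the call payoff.
Pi = np.maximum(S[:, -1] - K, 0.0)
risk = 0.0  # accumulated discounted variance of the hedged portfolio

for t in range(n_steps - 1, -1, -1):
    dS = S[:, t + 1] - S[:, t] / gamma      # dS_t = S_{t+1} - exp(r*dt) * S_t
    A = basis(S[:, t])
    # Fit a_t(S_t) so that a_t * dS_t tracks the next-step portfolio in a
    # cross-sectional least-squares sense; this approximately minimizes the
    # one-step hedging variance (a simplification of the article's scheme).
    w, *_ = np.linalg.lstsq(A * dS[:, None], Pi, rcond=None)
    a_t = A @ w
    Pi = gamma * (Pi - a_t * dS)            # roll the portfolio back one step
    # Cross-path variance as a crude stand-in for the conditional variance
    # entering the risk-adjusted return.
    risk += gamma**t * Pi.var()

print(f"hedge-only replication cost : {Pi.mean():.3f}")
print(f"risk-adjusted (ask) price   : {Pi.mean() + lam * risk:.3f}")
```

With lam = 0 the quoted price collapses to the pure replication cost, which for small time steps should sit near the continuous-time BSM value; increasing lam raises the ask price in proportion to the residual discrete-time hedging risk, illustrating the mis-hedging risk that the Key Findings note disappears only in the formal continuous-time limit.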

Updated: 2020-04-25