Paper Title

DeepScalper: A Risk-Aware Reinforcement Learning Framework to Capture Fleeting Intraday Trading Opportunities

Paper Authors

Shuo Sun, Wanqi Xue, Rundong Wang, Xu He, Junlei Zhu, Jian Li, Bo An

Paper Abstract

Reinforcement learning (RL) techniques have shown great success in many challenging quantitative trading tasks, such as portfolio management and algorithmic trading. In particular, intraday trading is one of the most profitable and risky tasks because the intraday behavior of the financial market reflects billions in rapidly fluctuating capital. However, the vast majority of existing RL methods focus on relatively low-frequency trading scenarios (e.g., day-level) and fail to capture fleeting intraday investment opportunities due to two major challenges: 1) how to effectively train profitable RL agents for intraday investment decision-making, which involves a high-dimensional, fine-grained action space; 2) how to learn a meaningful multi-modality market representation to understand the intraday behavior of the financial market at tick level. Motivated by the efficient workflow of professional human intraday traders, we propose DeepScalper, a deep reinforcement learning framework for intraday trading that tackles the above challenges. Specifically, DeepScalper includes four components: 1) a dueling Q-network with action branching to deal with the large action space of intraday trading for efficient RL optimization; 2) a novel reward function with a hindsight bonus to encourage RL agents to make trading decisions with a long-term horizon over the entire trading day; 3) an encoder-decoder architecture to learn a multi-modality temporal market embedding that incorporates both macro-level and micro-level market information; 4) a risk-aware auxiliary task to strike a balance between maximizing profit and minimizing risk. Through extensive experiments on real-world market data spanning over three years on six financial futures, we demonstrate that DeepScalper significantly outperforms many state-of-the-art baselines in terms of four financial criteria.
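
As a rough illustration of the first component described in the abstract, the sketch below shows a dueling Q-network with action branching in PyTorch. The layer sizes, the two branches (trade direction and discretized order size), and all names are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch of a dueling Q-network with action branching
# (hypothetical layer sizes and branch definitions; not the paper's code).
import torch
import torch.nn as nn


class BranchingDuelingQNet(nn.Module):
    def __init__(self, state_dim: int, branch_sizes=(3, 11), hidden: int = 128):
        """branch_sizes: e.g. 3 trade directions (long/flat/short) and
        11 discretized order sizes -- illustrative values only."""
        super().__init__()
        self.shared = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Single state-value stream shared by all branches.
        self.value = nn.Linear(hidden, 1)
        # One advantage stream per action branch.
        self.advantages = nn.ModuleList(
            [nn.Linear(hidden, n) for n in branch_sizes]
        )

    def forward(self, state: torch.Tensor):
        h = self.shared(state)
        v = self.value(h)                      # (batch, 1)
        q_per_branch = []
        for adv_head in self.advantages:
            a = adv_head(h)                    # (batch, n_actions_in_branch)
            # Dueling aggregation within each branch.
            q_per_branch.append(v + a - a.mean(dim=-1, keepdim=True))
        return q_per_branch                    # list of per-branch Q-values


# Usage: greedy action per branch, e.g. (direction_idx, size_idx).
net = BranchingDuelingQNet(state_dim=64)
q_branches = net(torch.randn(1, 64))
action = [q.argmax(dim=-1).item() for q in q_branches]
```

Factoring the action space into independent branches keeps the number of Q-value outputs linear in the number of branches rather than multiplicative, which is what makes the fine-grained intraday action space tractable.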

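The second component, the reward with a hindsight bonus, can be pictured as an immediate profit-and-loss term plus a weighted longer-horizon price move. The sketch below is a hedged approximation: the horizon `h`, weight `w`, and function name are placeholders, and the exact formulation in the paper is not reproduced here.

```python
# Hedged sketch of a per-step reward with a hindsight bonus: the immediate
# trading profit is augmented with the price move over a longer future
# horizon, nudging the agent toward decisions that pay off later in the
# trading day. Horizon `h` and weight `w` are illustrative placeholders.
from typing import Sequence


def reward_with_hindsight(prices: Sequence[float], t: int, position: float,
                          h: int = 30, w: float = 0.1) -> float:
    """Reward at step t for holding `position` (signed units) of the asset."""
    step_pnl = position * (prices[t + 1] - prices[t])        # immediate P&L
    end = min(t + h, len(prices) - 1)
    hindsight_bonus = position * (prices[end] - prices[t])   # longer-horizon move
    return step_pnl + w * hindsight_bonus


# Example: a long position of 1 unit on a toy price path.
path = [100.0, 100.2, 100.1, 100.6, 101.0]
print(reward_with_hindsight(path, t=0, position=1.0, h=3))
```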