Embedding and learning with signatures
Computational Statistics & Data Analysis (IF 1.8), Pub Date: 2021-05-01, DOI: 10.1016/j.csda.2020.107148
Adeline Fermanian

Sequential and temporal data arise in many fields of research, such as quantitative finance, medicine, or computer vision. The present article is concerned with a novel approach for sequential learning, called the signature method and rooted in rough path theory. Its basic principle is to represent multidimensional paths by a graded feature set of their iterated integrals, called the signature. This approach relies critically on an embedding principle, which consists of representing discretely sampled data as paths, i.e., functions from [0,1] to ℝ^d. After a survey of machine learning methodologies for signatures, we investigate the influence of embeddings on prediction accuracy with an in-depth study of three recent and challenging datasets. We show that a specific embedding, called lead-lag, systematically performs better, whatever the dataset or algorithm used. Moreover, we emphasize through an empirical study that computing signatures over the whole path domain does not lead to a loss of local information. We conclude that, with a good embedding, the signature combined with a simple algorithm achieves results competitive with state-of-the-art, domain-specific approaches.
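To make the two ingredients of the abstract concrete (an embedding that turns a discrete stream into a path, and the signature of that path), here is a minimal pure-Python sketch. It is not the author's code. It assumes one common lead-lag convention (the lead component moves first, then the lag catches up; conventions vary in the literature), treats the embedded points as a piecewise-linear path, and truncates the signature at level 2. The function names `lead_lag` and `signature_level2` are hypothetical. Level 2 is accumulated segment by segment with Chen's identity: a linear segment with increment δ has level-1 term δ and level-2 term δ⊗δ/2, and concatenating X with a segment adds S¹(X)⊗δ + δ⊗δ/2 to level 2.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def lead_lag(stream: Sequence[Point]) -> List[List[float]]:
    """Hypothetical lead-lag embedding: each output point is (lead, lag).

    Assumed convention: (x0, x0), (x1, x0), (x1, x1), (x2, x1), ...
    i.e. the lead coordinate jumps first, then the lag coordinate follows.
    """
    if not stream:
        return []
    path = [list(stream[0]) + list(stream[0])]
    for prev, cur in zip(stream, stream[1:]):
        path.append(list(cur) + list(prev))  # lead moves, lag stays behind
        path.append(list(cur) + list(cur))   # lag catches up
    return path


def signature_level2(path: Sequence[Point]) -> Tuple[List[float], List[List[float]]]:
    """Levels 1 and 2 of the signature of the piecewise-linear path through `path`."""
    d = len(path[0])
    level1 = [0.0] * d
    level2 = [[0.0] * d for _ in range(d)]
    for a, b in zip(path, path[1:]):
        delta = [bj - aj for aj, bj in zip(a, b)]
        # Chen's identity: old level 1 (tensor) new increment, plus the segment's own delta⊗delta/2.
        # level2 must be updated before level1 so that the *old* level 1 is used.
        for i in range(d):
            for j in range(d):
                level2[i][j] += level1[i] * delta[j] + 0.5 * delta[i] * delta[j]
        for i in range(d):
            level1[i] += delta[i]
    return level1, level2


if __name__ == "__main__":
    path = lead_lag([[0.0], [1.0], [3.0]])
    s1, s2 = signature_level2(path)
    print(path)  # [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [3.0, 1.0], [3.0, 3.0]]
    print(s1)    # [3.0, 3.0]
    print(s2)    # [[4.5, 7.0], [2.0, 4.5]]
```

In this small example the antisymmetric part of level 2, s2[0][1] - s2[1][0] = 5, equals the sum of squared increments of the original stream (1² + 2²). This illustrates why lead-lag is attractive: the signature of the embedded path picks up the quadratic variation, which the signature of the plain interpolated stream does not. The flattened entries of `s1` and `s2` could then be fed to a simple learner, in the spirit of the abstract's conclusion.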

Updated: 2021-05-01