Perturbation confusion in forward automatic differentiation of higher-order functions, Journal of Functional Programming

Perturbation confusion in forward automatic differentiation of higher-order functions
Journal of Functional Programming (IF 1.1), Pub Date: 2019-09-16, DOI: 10.1017/s095679681900008x
OLEKSANDR MANZYUK, BARAK A. PEARLMUTTER, ALEXEY ANDREYEVICH RADUL, DAVID R. RUSH, JEFFREY MARK SISKIND

Automatic differentiation (AD) is a technique for augmenting computer programs to compute derivatives. The essence of AD in its forward accumulation mode is to attach perturbations to each number, and propagate these through the computation by overloading the arithmetic operators. When derivatives are nested, the distinct derivative calculations, and their associated perturbations, must be distinguished. This is typically accomplished by creating a unique tag for each derivative calculation and tagging the perturbations. We exhibit a subtle bug, present in fielded implementations which support derivatives of higher-order functions, in which perturbations are confused despite the tagging machinery, leading to incorrect results. The essence of the bug is as follows: a unique tag is needed for each derivative calculation, but in existing implementations unique tags are created when taking the derivative of a function at a point. When taking derivatives of higher-order functions, these need not correspond! We exhibit a simple example: a higher-order function f whose derivative at a point x, namely f′(x), is itself a function which calculates a derivative. This situation arises naturally when taking derivatives of curried functions. Two potential solutions are presented, and their deficiencies discussed. One uses eta expansion to delay the creation of fresh tags in order to put them into one-to-one correspondence with derivative calculations. The other wraps outputs of derivative operators with tag substitution machinery. Both solutions seem very difficult to implement without violating the desirable complexity guarantees of forward AD.
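
The mechanism described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code or any fielded implementation: the names `Dual`, `extract`, `D`, `s`, `Dhat` and `cube` are hypothetical, only `+` and `*` are overloaded, and the curried function `s` is an assumed example of a function whose derivative at a point is itself a function that calculates a derivative. The sketch creates one fresh tag each time a derivative is taken at a point, which handles ordinary nesting correctly but reuses a single tag when the derivative returns a function.

```python
import itertools

_tags = itertools.count()  # source of fresh tags (illustrative)


class Dual:
    """A value primal + tangent * eps_tag; components may be nested Duals."""

    def __init__(self, tag, primal, tangent):
        self.tag, self.primal, self.tangent = tag, primal, tangent

    def __add__(self, other):
        return _add(self, other)

    __radd__ = __add__  # addition is commutative

    def __mul__(self, other):
        return _mul(self, other)

    __rmul__ = __mul__  # multiplication is commutative


def _top(a, b):
    # Newest (largest) tag among the operands; at least one is a Dual.
    return max(v.tag for v in (a, b) if isinstance(v, Dual))


def _primal(tag, v):
    return v.primal if isinstance(v, Dual) and v.tag == tag else v


def _tangent(tag, v):
    return v.tangent if isinstance(v, Dual) and v.tag == tag else 0.0


def _add(a, b):
    t = _top(a, b)
    return Dual(t, _primal(t, a) + _primal(t, b), _tangent(t, a) + _tangent(t, b))


def _mul(a, b):
    t = _top(a, b)
    pa, pb = _primal(t, a), _primal(t, b)
    return Dual(t, pa * pb, pa * _tangent(t, b) + _tangent(t, a) * pb)


def extract(tag, v):
    """Coefficient of eps_tag in v; applied to results if v is a function."""
    if callable(v):
        return lambda *args: extract(tag, v(*args))
    if isinstance(v, Dual):
        if v.tag == tag:
            return v.tangent
        return Dual(v.tag, extract(tag, v.primal), extract(tag, v.tangent))
    return 0.0


def D(f):
    """Derivative operator: one fresh tag per 'derivative at a point'."""
    def derivative(x):
        tag = next(_tags)
        return extract(tag, f(Dual(tag, x, 1.0)))
    return derivative


def cube(x):
    return x * x * x


# Ordinary nesting: the tags keep the two perturbations apart.
nested_ok = D(lambda x: x * D(lambda y: x + y)(1.0))(1.0)  # 1.0
second_ok = D(D(cube))(3.0)                                # 18.0 == 6 * 3

# Assumed curried example: s(u)(f)(x) = f(x + u), so (D s)(0) maps f to f'.
s = lambda u: lambda f: lambda x: f(x + u)
Dhat = D(s)(0.0)               # the single tag is created here, once

first = Dhat(cube)(3.0)        # 27.0 == 3 * 3**2, correct
second = Dhat(Dhat(cube))(3.0) # 0.0, but the second derivative is 18.0
```

In this sketch the two nested uses of `Dhat` share the tag created by the single call `D(s)(0.0)`, so the inner extraction consumes both perturbations and the outer one sees a plain number. That is the mismatch between "derivative calculations" and "derivatives taken at a point" that the abstract describes.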

Updated: 2019-09-16
