Easy Reverse Mode Automatic Differentiation in C#

Category: IT Technology · Published: 4 years ago


Continuing from my last post on implementing forward-mode automatic differentiation (AD) using C# operator overloading, this is just a quick follow-up showing how easy reverse mode is to achieve, and why it's important.

Why Reverse Mode Automatic Differentiation?

As explained in the last post, the vector representation of forward-mode AD can compute the derivatives of all parameters simultaneously, but it does so at considerable space cost: each operation creates a vector holding the derivative with respect to each parameter. So N parameters with M operations would allocate O(N*M) space. It turns out this is unnecessary!

Reverse mode AD allocates only O(N+M) space to compute the derivatives of N parameters across M operations. In general, forward mode AD is best suited to differentiating functions of type:

R → R^N

That is, functions of 1 parameter that compute multiple outputs. Reverse mode AD is suited to the dual scenario:

R^N → R

That is, functions of many parameters that return a single real number. A lot of problems are better suited to reverse mode AD, and some modern machine learning frameworks now employ reverse mode AD internally (thousands of parameters, single output that's compared to a goal).

How does Reverse Mode Work?

The identities I described in the other article still apply since they're simply the chain rule, but reverse mode computes derivatives backwards. Forward-mode AD is easy to implement using dual numbers, in which the evaluation order matches C#'s normal evaluation order: just compute a second number corresponding to the derivative alongside the normal computation. Since reverse mode runs backwards, we have to do the computational dual: build a (restricted) continuation!

You can see a rough sketch of both forward mode and reverse mode here. Forward-mode AD using dual numbers will look something like this:

public readonly struct Fwd
{
    public readonly double Magnitude;
    public readonly double Derivative;

    public Fwd(double mag, double deriv)
    {
        this.Magnitude = mag;
        this.Derivative = deriv;
    }

    public Fwd Pow(int k) =>
        new Fwd(Math.Pow(Magnitude, k), k * Math.Pow(Magnitude, k - 1) * Derivative);

    public static Fwd operator +(Fwd lhs, Fwd rhs) =>
        new Fwd(lhs.Magnitude + rhs.Magnitude, lhs.Derivative + rhs.Derivative);

    public static Fwd operator *(Fwd lhs, Fwd rhs) =>
        new Fwd(lhs.Magnitude * rhs.Magnitude, lhs.Derivative * rhs.Magnitude + rhs.Derivative * lhs.Magnitude);

    public static Func<double, Fwd> Differentiate(Func<Fwd, Fwd> f) =>
        x => f(new Fwd(x, 1));

    public static Func<double, double, Fwd> DifferentiateX0(Func<Fwd, Fwd, Fwd> f) =>
        (x0, x1) => f(new Fwd(x0, 1), new Fwd(x1, 0));

    public static Func<double, double, Fwd> DifferentiateX1(Func<Fwd, Fwd, Fwd> f) =>
        (x0, x1) => f(new Fwd(x0, 0), new Fwd(x1, 1));
}
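As a quick sanity check, here is what using this struct looks like (assuming the Fwd struct above is in scope): differentiating f(x) = x³ + x, whose derivative is 3x² + 1, at x = 2.

```csharp
// Assuming the Fwd struct above is in scope.
// f(x) = x^3 + x, so f'(x) = 3x^2 + 1
var df = Fwd.Differentiate(x => x.Pow(3) + x);
var result = df(2);
Console.WriteLine(result.Magnitude);   // 10
Console.WriteLine(result.Derivative);  // 13
```

Note that the derivative is carried along with the magnitude in a single forward pass, which is why forward mode matches C#'s evaluation order so naturally.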

Translating this into reverse mode entails replacing Fwd.Derivative with a continuation like so:

public readonly struct Rev
{
    public readonly double Magnitude;
    readonly Action<double> Derivative;

    public Rev(double y, Action<double> dy)
    {
        this.Magnitude = y;
        this.Derivative = dy;
    }

    public Rev Pow(int e)
    {
        var x = Magnitude;
        var k = Derivative;
        return new Rev(Math.Pow(Magnitude, e), dx => k(e * Math.Pow(x, e - 1) * dx));
    }

    public static Rev operator +(Rev lhs, Rev rhs) =>
        new Rev(lhs.Magnitude + rhs.Magnitude, dx =>
        {
            lhs.Derivative(dx);
            rhs.Derivative(dx);
        });

    public static Rev operator *(Rev lhs, Rev rhs) =>
        new Rev(lhs.Magnitude * rhs.Magnitude,
                dx =>
                {
                    lhs.Derivative(dx * rhs.Magnitude);
                    rhs.Derivative(dx * lhs.Magnitude);
                });

    public static Func<double, (double, double)> Differentiate(Func<Rev, Rev> f) =>
        x =>
        {
            double dx = 0;
            // Accumulate with += : if x is used more than once, each use
            // contributes a term to the derivative.
            var y = f(new Rev(x, dy => dx += dy));
            y.Derivative(1);
            return (y.Magnitude, dx);
        };

    public static Func<double, double, (double, double, double)> Differentiate(Func<Rev, Rev, Rev> f) =>
        (x0, x1) =>
        {
            double dx0 = 0, dx1 = 0;
            var y = f(new Rev(x0, dy => dx0 += dy), new Rev(x1, dy => dx1 += dy));
            y.Derivative(1);
            return (y.Magnitude, dx0, dx1);
        };

    public static Func<double, double, double, (double, double, double, double)> Differentiate(Func<Rev, Rev, Rev, Rev> f) =>
        (x0, x1, x2) =>
        {
            double dx0 = 0, dx1 = 0, dx2 = 0;
            var y = f(new Rev(x0, dy => dx0 += dy),
                      new Rev(x1, dy => dx1 += dy),
                      new Rev(x2, dy => dx2 += dy));
            y.Derivative(1);
            return (y.Magnitude, dx0, dx1, dx2);
        };
}
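Usage mirrors the forward-mode version, except each call now runs the continuation backwards from the output (again assuming the Rev struct above is in scope):

```csharp
// Assuming the Rev struct above is in scope.
// f(x) = x^3, so f'(x) = 3x^2
var (y, dx) = Rev.Differentiate(x => x.Pow(3))(2);
Console.WriteLine(y);   // 8
Console.WriteLine(dx);  // 12

// f(x0, x1) = x0 * x1, so df/dx0 = x1 and df/dx1 = x0
var (z, dx0, dx1) = Rev.Differentiate((x0, x1) => x0 * x1)(3, 4);
Console.WriteLine(z);    // 12
Console.WriteLine(dx0);  // 4
Console.WriteLine(dx1);  // 3
```

Notice that the two-parameter gradient comes from a single call, rather than one pass per parameter as forward mode would require.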

As I mentioned in my last post, my goal here isn't the most efficient implementation for reverse mode AD, but to distill its essence to make it direct and understandable. This representation builds a whole new continuation on every invocation of the function being differentiated. More efficient representations would only compute this continuation once for any number of invocations, and there are plenty of other optimizations that can be applied to both forward and reverse mode representations.

