The Cost of Indirection

栏目: IT技术 · 发布时间: 4年前

内容简介:Providing zero cost abstractions is a goal of the Rust programming language. In this post we’ll explore the perforamance costs of various methods of indirection.I wanted a benchmark which emphasized the cost of indirection, so I needed to minimize the amou

Providing zero cost abstractions is a goal of the Rust programming language. In this post we’ll explore the perforamance costs of various methods of indirection.

The benchmark

I wanted a benchmark which emphasized the cost of indirection, so I needed to minimize the amount of work done in any single call of a function. I settled on solving the nth odd number problem (a play on the nth prime problem), using a shockingly inefficient algorithm.

use std::time::Instant;

fn main() {
    let start_time = Instant::now();
    let result = find_nth_odd(1_000_000_000);
    let elapsed_time = start_time.elapsed().as_millis();

    println!("{} in {} ms", result, elapsed_time);
}

fn find_nth_odd(n: u64) -> u64 {
    let mut i = 0;
    let mut odd_count = 0;

    while odd_count != n {
        i += 1;
        if is_odd(i) {
            odd_count += 1;
        }
    }

    i
}

fn is_odd(n: u64) -> bool {
    n % 2 == 1
}

This finds the billionth odd number in 1042 ms. All performance numbers reported in this post are the average of 5 runs, using the 1.41 compiler.

Struct impl

The simplest indirection is moving the is_odd function into a struct impl block.

use std::time::Instant;

fn main() {
    let finder = OddFinder;

    let start_time = Instant::now();
    let result = find_nth_odd(finder, 1_000_000_000);
    let elapsed_time = start_time.elapsed().as_millis();

    println!("{} in {} ms", result, elapsed_time);
}

fn find_nth_odd(odd_finder: OddFinder, n: u64) -> u64 {
    let mut i = 0;
    let mut odd_count = 0;

    while odd_count != n {
        i += 1;
        if odd_finder.is_odd(i) {
            odd_count += 1;
        }
    }

    i
}

struct OddFinder;

impl OddFinder {
    fn is_odd(&self, n: u64) -> bool {
        n % 2 == 1
    }
}

This is a great demonstration of Rust’s zero costs abstractions, finding the billionth odd number in 1032 ms.

Trait

What happens if the is_odd method is part of a trait, rather than directly on the struct?

use std::time::Instant;

fn main() {
    let finder = OddFinderOne;

    let start_time = Instant::now();
    let result = find_nth_odd(finder, 1_000_000_000);
    let elapsed_time = start_time.elapsed().as_millis();

    println!("{} in {} ms", result, elapsed_time);
}

fn find_nth_odd(odd_finder: OddFinderOne, n: u64) -> u64 {
    let mut i = 0;
    let mut odd_count = 0;

    while odd_count != n {
        i += 1;
        if odd_finder.is_odd(i) {
            odd_count += 1;
        }
    }

    i
}

trait OddFinder {
    fn is_odd(&self, n: u64) -> bool;
}

struct OddFinderOne;

impl OddFinder for OddFinderOne {
    fn is_odd(&self, n: u64) -> bool {
        n % 2 == 1
    }
}

Surprisingly moving this method behind a trait is not a zero cost abstraction, even though the find_nth_odd function still takes the specific OddFinderOne implementation. This version finds the billionth prime in 1340 ms.

Generics

If multiple structs exist which are capable of finding odd numbers, we can use generics to allow our find_nth_odd method to use any of them.

use std::time::Instant;

fn main() {
    let finder = OddFinderOne;

    let start_time = Instant::now();
    let result = find_nth_odd(finder, 1_000_000_000);
    let elapsed_time = start_time.elapsed().as_millis();

    println!("{} in {} ms", result, elapsed_time);
}

fn find_nth_odd<T: OddFinder>(odd_finder: T, n: u64) -> u64 {
    let mut i = 0;
    let mut odd_count = 0;

    while odd_count != n {
        i += 1;
        if odd_finder.is_odd(i) {
            odd_count += 1;
        }
    }

    i
}

trait OddFinder {
    fn is_odd(&self, n: u64) -> bool;
}

struct OddFinderOne;

impl OddFinder for OddFinderOne {
    fn is_odd(&self, n: u64) -> bool {
        n % 2 == 1
    }
}

struct OddFinderTwo;

impl OddFinder for OddFinderTwo {
    fn is_odd(&self, _n: u64) -> bool {
        unimplemented!()
    }
}

This is another positive example of zero cost abstractions. It runs at essentially the same speed as the initial non-generic trait-based version of the program (above), finding the billionth prime in 1322 ms.

Trait Object

Rather than making our find_nth_odd function generic, an alternate approach to allow it to accept any OddFinder implementation is to use trait objects.

use std::time::Instant;

fn main() {
    let finder = OddFinderOne;

    let start_time = Instant::now();
    let result = find_nth_odd(&finder, 1_000_000_000);
    let elapsed_time = start_time.elapsed().as_millis();

    println!("{} in {} ms", result, elapsed_time);
}


fn find_nth_odd(odd_finder: & dyn OddFinder, n: u64) -> u64 {
    let mut i = 0;
    let mut odd_count = 0;

    while odd_count != n {
        i += 1;
        if odd_finder.is_odd(i) {
            odd_count += 1;
        }
    }

    i
}

trait OddFinder {
    fn is_odd(&self, n: u64) -> bool;
}

struct OddFinderOne;

impl OddFinder for OddFinderOne {
    fn is_odd(&self, n: u64) -> bool {
        n % 2 == 1
    }
}

struct OddFinderTwo;

impl OddFinder for OddFinderTwo {
    fn is_odd(&self, _n: u64) -> bool {
        unimplemented!()
    }
}

I expected this to perform worse than the generic implementation, since trait objects should require dynamic dispatch (the program will have to look up the location of the correct is_odd implementation at run time), but it seems the compiler was able to see that finder is always OddFinderOne and perform some additional optimizations, because this calculated the billionth odd number in 1039 ms (average of 5 runs). This is not only better than the generic implementation but it is actually on par with the baseline, which was a very surprising outcome to me.

Boxed Trait Object

The other way to use trait objects is by passing a Box<dyn Trait> (as opposed to & dyn Trait shown above).

use std::time::Instant;

fn main() {
    let finder: Box<dyn OddFinder> = Box::new(OddFinderOne);

    let start_time = Instant::now();
    let result = find_nth_odd(finder, 1_000_000_000);
    let elapsed_time = start_time.elapsed().as_millis();

    println!("{} in {} ms", result, elapsed_time);
}


fn find_nth_odd(odd_finder: Box<dyn OddFinder>, n: u64) -> u64 {
    let mut i = 0;
    let mut odd_count = 0;

    while odd_count != n {
        i += 1;
        if odd_finder.is_odd(i) {
            odd_count += 1;
        }
    }

    i
}

trait OddFinder {
    fn is_odd(&self, n: u64) -> bool;
}

struct OddFinderOne;

impl OddFinder for OddFinderOne {
    fn is_odd(&self, n: u64) -> bool {
        n % 2 == 1
    }
}

struct OddFinderTwo;

impl OddFinder for OddFinderTwo {
    fn is_odd(&self, _n: u64) -> bool {
        unimplemented!()
    }
}

This calculated the billionth odd number in 1314 ms.

Runtime dynamism

To this point we haven’t given the compiler any abstraction it couldn’t see through at compile time. Even though we had two OddFinder implementations, we were always hard coding the program to use OddFinderOne . We’ll change that here, to see what happens when the compiler cannot make this type of optimization.

use std::{env, time::Instant};

fn main() {
    // I don't intend to ever set this environment variable. The purpose
    // of this is only to ensure the compiler won't be able to know
    // which of the two OddFinder implementations we will use.
    let finder: Box<dyn OddFinder> = match env::var("USE_FINDER_TWO") {
        Ok(use_finder_two) if use_finder_two == String::from("True")
            => Box::new(OddFinderTwo),
        _ => Box::new(OddFinderOne),
    };

    let start_time = Instant::now();
    let result = find_nth_odd(finder, 1_000_000_000);
    let elapsed_time = start_time.elapsed().as_millis();

    println!("{} in {} ms", result, elapsed_time);
}


fn find_nth_odd(odd_finder: Box<dyn OddFinder>, n: u64) -> u64 {
    let mut i = 0;
    let mut odd_count = 0;

    while odd_count != n {
        i += 1;
        if odd_finder.is_odd(i) {
            odd_count += 1;
        }
    }

    i
}

trait OddFinder {
    fn is_odd(&self, n: u64) -> bool;
}

struct OddFinderOne;

impl OddFinder for OddFinderOne {
    fn is_odd(&self, n: u64) -> bool {
        n % 2 == 1
    }
}

struct OddFinderTwo;

impl OddFinder for OddFinderTwo {
    fn is_odd(&self, _n: u64) -> bool {
        unimplemented!()
    }
}

As expected this version is much slower, as we can be sure dynamic dispatch is required here to determine which is_odd function to call, based on whichever OddFinder implementation is being used. This version found the billionth prime in 3424 ms.

Enum

Rather than taking in a trait object, we can modify our find_nth_odd function to take an enum containing any possible OddFinder implementation.

use std::{env, time::Instant};

fn main() {
    // I don't intend to ever set this environment variable. The purpose
    // of this is only to ensure the compiler won't be able to know
    // which of the two OddFinder implementations we will use.
    let finder: OddFinders = match env::var("USE_FINDER_TWO") {
        Ok(use_finder_two) if use_finder_two == String::from("True")
            => OddFinders::Two(OddFinderTwo),
        _ => OddFinders::One(OddFinderOne),
    };

    let start_time = Instant::now();
    let result = find_nth_odd(finder, 1_000_000_000);
    let elapsed_time = start_time.elapsed().as_millis();

    println!("{} in {} ms", result, elapsed_time);
}


fn find_nth_odd(odd_finder: OddFinders, n: u64) -> u64 {
    let mut i = 0;
    let mut odd_count = 0;

    while odd_count != n {
        i += 1;
        if odd_finder.is_odd(i) {
            odd_count += 1;
        }
    }

    i
}

trait OddFinder {
    fn is_odd(&self, n: u64) -> bool;
}

struct OddFinderOne;

impl OddFinder for OddFinderOne {
    fn is_odd(&self, n: u64) -> bool {
        n % 2 == 1
    }
}

struct OddFinderTwo;

impl OddFinder for OddFinderTwo {
    fn is_odd(&self, _n: u64) -> bool {
        unimplemented!()
    }
}

enum OddFinders {
    One(OddFinderOne),
    Two(OddFinderTwo),
}

impl OddFinder for OddFinders {
    fn is_odd(&self, n: u64) -> bool {
        match self {
            OddFinders::One(finder) => finder.is_odd(n),
            OddFinders::Two(finder) => finder.is_odd(n),
        }
    }
}

Relative to the boxed trait object, here we are trading dynamic dispatch for a runtime enum match statement. This turns out to be a big performance win, allowing this version to find the billionth prime in 1366 ms.

Inline never

For fun, I re-ran the baseline with #[inline(never)] on the is_odd function. This increased the time required to calculate the billionth odd number to 3378 ms.

Conclusion

Making performance predications in the presence of an optimizing compiler is difficult. Extrapolating performance results for one workload to predict the performance against a different workload is also challenging.

Those disclaimers aside I think there are some interesting findings here. The first is that moving a method behind a trait increases the runtime cost, even when called directly on the struct (as opposed to through a trait object). The second is that, at least in this benchmark, creating an enum of all possible trait implementors and passing that into a function is significantly (roughly 3x here) faster than using a boxed trait object.

All code is available here . If you discover issues with any of these benchmarks feel free to open issues/pull requests on GitHub.


以上所述就是小编给大家介绍的《The Cost of Indirection》,希望对大家有所帮助,如果大家有任何疑问请给我留言,小编会及时回复大家的。在此也非常感谢大家对 码农网 的支持!

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

ES6标准入门(第3版)

ES6标准入门(第3版)

阮一峰 / 电子工业出版社 / 2017-9 / 99.00

ES6是下一代JavaScript语言标准的统称,每年6月发布一次修订版,迄今为止已经发布了3个版本,分别是ES2015、ES2016、ES2017。本书根据ES2017标准,详尽介绍了所有新增的语法,对基本概念、设计目的和用法进行了清晰的讲解,给出了大量简单易懂的示例。本书为中级难度,适合那些已经对JavaScript语言有一定了解的读者,可以作为学习这门语言最新进展的工具书,也可以作为参考手册......一起来看看 《ES6标准入门(第3版)》 这本书的介绍吧!

Markdown 在线编辑器
Markdown 在线编辑器

Markdown 在线编辑器

正则表达式在线测试
正则表达式在线测试

正则表达式在线测试

HSV CMYK 转换工具
HSV CMYK 转换工具

HSV CMYK互换工具